(6) iLQR Regularization and Line Search
1. iLQR Regularization
In iLQR, we typically linearize the system dynamics and take a second-order approximation of the objective function. In the backward pass we need to invert $Q_{uu}$ (the second-derivative matrix of the objective with respect to the control) in order to compute the control update. In some cases, however, $Q_{uu}$ can be singular or nearly singular, which makes the inversion numerically unstable. To avoid this, a regularization term can be added so that $Q_{uu}$ becomes better conditioned.
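For reference, a brief sketch of the quantities involved (standard iLQR notation, consistent with the demo code in section 3; $V_x'$ and $V_{xx}'$ denote the value-function derivatives at the next time step):

$$Q_u = l_u + f_u^\top V_x', \qquad Q_{uu} = l_{uu} + f_u^\top V_{xx}' f_u, \qquad \delta u_k = -Q_{uu}^{-1} Q_u$$

The last expression is where the inverse of $Q_{uu}$ enters, and where ill-conditioning causes trouble.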
Method 1: Gradient-descent-style regularization
A common regularization approach is to add a diagonal term to $Q_{uu}$, which plays a role similar to the learning rate in gradient descent.
Let the regularization matrix be $\mu I$, where $I$ is the identity matrix and $\mu$ is the regularization coefficient. We then replace $Q_{uu}$ with
$$\tilde{Q}_{uu} = Q_{uu} + \mu I.$$
In this way, even when $Q_{uu}$ is not positive definite or is close to singular, we can still invert $\tilde{Q}_{uu}$ to compute the control update.
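A minimal sketch of this idea in NumPy (the function name and the example matrices are illustrative, not part of the demo code below):

import numpy as np

def regularized_control_update(Q_u, Q_uu, mu=1e-3):
    # Add mu * I so the matrix stays invertible even when Q_uu is
    # nearly singular or not positive definite.
    Q_uu_reg = Q_uu + mu * np.eye(Q_uu.shape[0])
    # Feedforward control increment: du = -Q_uu_reg^{-1} * Q_u
    return -np.linalg.solve(Q_uu_reg, Q_u)

# Example with a nearly singular Q_uu
Q_uu = np.array([[1.0, 1.0], [1.0, 1.0 + 1e-9]])
Q_u = np.array([0.5, -0.2])
print(regularized_control_update(Q_u, Q_uu))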
Method 2: Trust-Region Regularization
Another regularization approach borrows the idea of trust-region optimization: when computing the control increment, we bound its magnitude. Concretely, the control increment is restricted to a fixed range, e.g. $\|\delta u_k\| \le \delta_{\max}$.
By limiting the update step, this approach ensures that no single update is too large, which improves the robustness of the optimization.
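As a minimal sketch, this is the same clipping used in the demo code in section 3 (delta_max plays the role of max_step_size there):

import numpy as np

def trust_region_step(Q_u, Q_uu, delta_max=1.0):
    # Unconstrained Newton-style step for a scalar control
    step = -Q_u / Q_uu
    # Clip the control increment to the trust region [-delta_max, delta_max]
    return np.clip(step, -delta_max, delta_max)

print(trust_region_step(Q_u=2.0, Q_uu=0.5, delta_max=1.0))  # raw step -4.0, clipped to -1.0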
Impact of regularization on the algorithm
Convergence: Appropriate regularization improves the stability of DDP convergence, but an overly large regularization parameter can slow the algorithm down.
Step-size control: In some algorithms, the regularization coefficient $\mu$ is adjusted dynamically according to how well the current iteration performs: if the iteration improves the cost, $\mu$ is decreased to speed up convergence; if it does not, $\mu$ is increased to maintain stability. A sketch of such a schedule is given below.
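A minimal sketch of such an adaptive schedule (a Levenberg-Marquardt-style heuristic; the factor of 10 and the bounds are assumptions, not values taken from the demo code):

def update_regularization(mu, cost_decreased, factor=10.0, mu_min=1e-6, mu_max=1e10):
    # If the last iteration reduced the cost, trust the quadratic model more
    # and shrink mu; otherwise grow mu to take more conservative steps.
    if cost_decreased:
        return max(mu / factor, mu_min)
    return min(mu * factor, mu_max)

# Example: shrink after a good step, grow again after a bad one
mu = 1e-3
mu = update_regularization(mu, cost_decreased=True)   # -> 1e-4
mu = update_regularization(mu, cost_decreased=False)  # -> 1e-3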
2. Line Search
In iLQR, a line search is a common technique used when updating the control sequence to further improve numerical stability and convergence. In the backward pass of iLQR, the new control increments are computed from a locally linearized system model and a quadratic approximation of the objective. Because these approximations are not exact, directly applying the resulting control update can increase the objective value, especially when the system is highly nonlinear. To prevent the objective from getting worse, the line search controls the step size so that the new control inputs yield an actual improvement.
Basic steps of the line search
The core idea of a line search is to shrink the step size along a given direction until a step is found that actually improves the objective. The concrete steps are as follows (a self-contained code sketch follows the list):
- Initial control increment: The backward pass yields a control increment $\delta u_k$, and the control is updated as $u_k^{\text{new}} = u_k + \alpha\,\delta u_k$, where $\alpha$ is the step-size coefficient (initially set to 1).
- Improvement check: For each candidate $\alpha$, run a forward pass to compute the new objective value and compare it with the value before the update. If the objective decreases, accept the current step size and update the control sequence.
- Shrink the step size: If the new control increment makes the objective worse, i.e. $J(u^{\text{new}}) \ge J(u)$, reduce the step size, for example by halving it (or using some other ratio), i.e. $\alpha \leftarrow \beta\,\alpha$ with $\beta \in (0, 1)$, and re-evaluate the objective until a suitable step size is found.
- Repeat: Repeat the forward simulation and objective comparison until a suitable $\alpha$ is found or the step size falls below a threshold.
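Putting these steps together, here is a minimal self-contained sketch of the backtracking line search (it reuses the toy dynamics and cost of the demo code in section 3; the helper names forward_pass, total_cost, and line_search are illustrative):

import numpy as np

def forward_pass(x0, u):
    # Roll out the toy dynamics x_{k+1} = x_k + sin(u_k)
    x = [x0]
    for uk in u:
        x.append(x[-1] + np.sin(uk))
    return np.array(x)

def total_cost(x, u):
    # Running cost 0.5*(x^2 + u^2) plus terminal cost 0.5*x_N^2
    return 0.5 * np.sum(x[:-1]**2 + u**2) + 0.5 * x[-1]**2

def line_search(x0, u, du, prev_cost, beta=0.5, max_steps=10):
    alpha = 1.0
    for _ in range(max_steps):
        u_new = u + alpha * du            # candidate controls
        x_new = forward_pass(x0, u_new)   # forward simulation
        cost = total_cost(x_new, u_new)
        if cost < prev_cost:              # accept the first improving step
            return u_new, x_new, cost, alpha
        alpha *= beta                     # otherwise shrink the step
    return u, forward_pass(x0, u), prev_cost, 0.0  # no improvement found

# Example usage with an arbitrary descent direction
u = np.zeros(3)
du = np.array([-0.6, -0.3, -0.1])
x = forward_pass(1.0, u)
print(line_search(1.0, u, du, total_cost(x, u)))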
3. Python Code Demo
import numpy as np

# System dynamics (nonlinear)
def system_dynamics(x, u):
    return x + np.sin(u)

# Cost function for a single step
def cost_function(x, u):
    return 0.5 * (x**2 + u**2)

# Derivative of the cost function w.r.t. control input u (l_u)
def cost_u(u):
    return u

# Derivative of the cost function w.r.t. state x (l_x)
def cost_x(x):
    return x

# Second derivative of the cost function w.r.t. control input u (l_uu)
def cost_uu():
    return 1

# Second derivative of the cost function w.r.t. state x (l_xx)
def cost_xx():
    return 1

# Terminal cost for the final state
def terminal_cost(x):
    return 0.5 * x**2

# Derivative of terminal cost w.r.t. state (l_x at final step)
def terminal_cost_x(x):
    return x

# Function to calculate the initial state trajectory based on control sequence
def compute_initial_trajectory(x0, u):
    x = np.zeros(N+1)
    x[0] = x0
    for k in range(N):
        x[k+1] = system_dynamics(x[k], u[k])
    return x

# iLQR algorithm with gradient descent, trust-region regularization, and line search
def ilqr_with_regularization_and_linesearch(x0, u, iterations, epsilon_u, epsilon_J, epsilon_x,
                                            max_step_size, line_search_steps=10):
    x = compute_initial_trajectory(x0, u)  # Compute initial trajectory
    prev_cost = np.inf
    regularization = 1e-3  # Regularization term for gradient descent (small to avoid overfitting)

    for i in range(iterations):
        # Backward pass
        V_x = np.zeros(N+1)
        V_xx = np.zeros(N+1)
        V_x[-1] = terminal_cost_x(x[-1])  # Terminal value for V_x
        V_xx[-1] = 1                      # Terminal value for V_xx (quadratic cost on terminal state)
        du = np.zeros(N)                  # Control updates

        # Backward pass: compute Q function and control update
        for k in range(N-1, -1, -1):
            # Compute Q-function terms
            f_u = np.cos(u[k])  # Derivative of system dynamics w.r.t. u
            Q_u = cost_u(u[k]) + f_u * V_x[k+1]                     # Q_u = l_u + f_u^T * V_x(k+1)
            Q_uu = cost_uu() + f_u**2 * V_xx[k+1] + regularization  # Add regularization to avoid singularity
            Q_x = cost_x(x[k]) + V_x[k+1]                           # Q_x = l_x + f_x^T * V_x(k+1)
            Q_xx = cost_xx() + V_xx[k+1]                            # Q_xx = l_xx + f_x^T * V_xx(k+1) * f_x

            # Update control input with trust-region regularization
            step = -Q_u / Q_uu                                    # Compute step size for control update
            step = np.clip(step, -max_step_size, max_step_size)  # Apply trust-region constraint (step size regularization)
            du[k] = step
            V_x[k] = Q_x + Q_uu * du[k]  # Update value function gradient
            V_xx[k] = Q_xx               # Update value function Hessian (V_xx)

        # Forward pass: update trajectory using the new control inputs and line search
        x_new = np.zeros(N+1)
        u_new = np.zeros(N)
        x_new[0] = x0
        alpha = 1.0  # Starting step size for line search
        for search_step in range(line_search_steps):
            for k in range(N):
                u_new[k] = u[k] + alpha * du[k]                   # Update control with line search step
                x_new[k+1] = system_dynamics(x_new[k], u_new[k])  # Update state
            # Compute the total cost for the current trajectory
            current_cost = np.sum([cost_function(x_new[k], u_new[k]) for k in range(N)]) + terminal_cost(x_new[-1])
            # Check if this step size improves the cost
            if current_cost < prev_cost:
                break
            alpha *= 0.5  # Reduce step size if no improvement

        # 1. Stop based on control input change
        if np.max(np.abs(du)) < epsilon_u:
            print(f"Stopped due to control input convergence at iteration {i}")
            break
        # 2. Stop based on cost function change (relative)
        if np.abs(prev_cost - current_cost) / np.abs(prev_cost) < epsilon_J:
            print(f"Stopped due to cost function convergence at iteration {i}")
            break
        # 3. Stop based on state trajectory change
        if np.max(np.abs(x_new - x)) < epsilon_x:
            print(f"Stopped due to state trajectory convergence at iteration {i}")
            break

        # Update for next iteration
        x = x_new
        u = u_new
        prev_cost = current_cost
    else:
        print("Reached maximum number of iterations")

    return x, u, i


if __name__ == "__main__":
    # iLQR parameters
    N = 3                # Number of time steps
    x0 = 1               # Initial state
    iterations = 50      # Maximum number of iterations
    epsilon_u = 1e-3     # Tolerance for control input changes
    epsilon_J = 1e-4     # Tolerance for cost function change (relative)
    epsilon_x = 1e-4     # Tolerance for state trajectory change
    max_step_size = 1.0  # Trust region step size for control updates

    # Initialize control sequence and state trajectory
    u = np.zeros(N)      # Initial control sequence
    x = np.zeros(N+1)    # State trajectory
    x[0] = x0

    # Compute initial trajectory
    x_initial = compute_initial_trajectory(x0, u)

    # Run iLQR with stopping conditions
    x_final, u_final, num_iterations = ilqr_with_regularization_and_linesearch(
        x0, u, iterations, epsilon_u, epsilon_J, epsilon_x, max_step_size)

    # Output the final results and number of iterations
    print("Final state trajectory: ", x_final)
    print("Final control inputs: ", u_final)
    print("Total iterations: ", num_iterations)
The output is as follows:
Stopped due to cost function convergence at iteration 8
Final state trajectory: [1. 0.44370578 0.17737126 0.0834065 ]
Final control inputs: [-0.58991961 -0.26958819 -0.09410359]
Total iterations: 8