Differential dynamic programming

      Differential dynamic programming (DDP) is an optimal control algorithm of the trajectory optimization class. The algorithm was introduced in 1966 by Mayne and subsequently analysed in Jacobson and Mayne's eponymous book. The algorithm uses locally-quadratic models of the dynamics and cost functions, and displays quadratic convergence. It is closely related to Pantoja's step-wise Newton's method.


      Finite-horizon discrete-time problems


The dynamics

$$\mathbf{x}_{i+1} = \mathbf{f}(\mathbf{x}_i, \mathbf{u}_i) \qquad (1)$$

describe the evolution of the state $\mathbf{x}$ given the control $\mathbf{u}$ from time $i$ to time $i+1$. The total cost $J_0$ is the sum of running costs $\ell$ and final cost $\ell_f$, incurred when starting from state $\mathbf{x}$ and applying the control sequence $\mathbf{U} \equiv \{\mathbf{u}_0, \mathbf{u}_1, \dots, \mathbf{u}_{N-1}\}$ until the horizon is reached:

$$J_0(\mathbf{x}, \mathbf{U}) = \sum_{i=0}^{N-1} \ell(\mathbf{x}_i, \mathbf{u}_i) + \ell_f(\mathbf{x}_N),$$

where $\mathbf{x}_0 \equiv \mathbf{x}$, and the $\mathbf{x}_i$ for $i > 0$ are given by Eq. 1. The solution of the optimal control problem is the minimizing control sequence

$$\mathbf{U}^*(\mathbf{x}) \equiv \operatorname{argmin}_{\mathbf{U}} J_0(\mathbf{x}, \mathbf{U}).$$

Trajectory optimization means finding $\mathbf{U}^*(\mathbf{x})$ for a particular $\mathbf{x}_0$, rather than for all possible initial states.
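The total cost above can be evaluated by simply rolling the dynamics forward from the initial state and accumulating the running costs. The following is a minimal NumPy sketch, assuming user-supplied callables `f`, `running_cost`, and `final_cost` (these names and the double-integrator example are illustrative, not from the article):

```python
import numpy as np

def total_cost(x0, U, f, running_cost, final_cost):
    """Evaluate J_0(x, U): roll the dynamics forward from x0 under the
    control sequence U and sum the running costs plus the final cost."""
    x, J = np.asarray(x0, dtype=float), 0.0
    for u in U:                      # u_0, ..., u_{N-1}
        J += running_cost(x, u)      # accumulate l(x_i, u_i)
        x = f(x, u)                  # x_{i+1} = f(x_i, u_i)  (Eq. 1)
    return J + final_cost(x)         # add l_f(x_N)

# Illustrative example: discretized double integrator with quadratic costs.
dt = 0.1
f = lambda x, u: np.array([x[0] + dt * x[1], x[1] + dt * u[0]])
running_cost = lambda x, u: 0.5 * (x @ x + 0.1 * u @ u)
final_cost = lambda x: 10.0 * (x @ x)

U = [np.zeros(1) for _ in range(50)]          # N = 50 zero controls
print(total_cost(np.array([1.0, 0.0]), U, f, running_cost, final_cost))
```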


      Dynamic programming


Let $\mathbf{U}_i \equiv \{\mathbf{u}_i, \mathbf{u}_{i+1}, \dots, \mathbf{u}_{N-1}\}$ be the partial control sequence and define the cost-to-go $J_i$ as the partial sum of costs from $i$ to $N$:

$$J_i(\mathbf{x}, \mathbf{U}_i) = \sum_{j=i}^{N-1} \ell(\mathbf{x}_j, \mathbf{u}_j) + \ell_f(\mathbf{x}_N).$$

The optimal cost-to-go or value function at time $i$ is the cost-to-go given the minimizing control sequence:

$$V(\mathbf{x}, i) \equiv \min_{\mathbf{U}_i} J_i(\mathbf{x}, \mathbf{U}_i).$$

Setting $V(\mathbf{x}, N) \equiv \ell_f(\mathbf{x}_N)$, the dynamic programming principle reduces the minimization over an entire sequence of controls to a sequence of minimizations over a single control, proceeding backwards in time:

$$V(\mathbf{x}, i) = \min_{\mathbf{u}} \bigl[\, \ell(\mathbf{x}, \mathbf{u}) + V(\mathbf{f}(\mathbf{x}, \mathbf{u}), i+1) \,\bigr]. \qquad (2)$$

This is the Bellman equation.
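For problems whose states and controls range over a small discrete set, the Bellman recursion (2) can be evaluated exactly by tabulating $V$ backwards from the horizon. The sketch below is only meant to illustrate that recursive structure; the table names `states` and `controls` and the callables `f`, `l`, `lf` are hypothetical and not part of the article:

```python
import numpy as np

def value_tables(states, controls, f, l, lf, N):
    """Tabulate V(x, i) for all discrete states by backward induction (Eq. 2).

    states, controls : lists of admissible discrete (hashable) values
    f(x, u)          : dynamics, must return an element of `states`
    l(x, u), lf(x)   : running and final cost
    """
    V = {(x, N): lf(x) for x in states}               # V(x, N) = l_f(x)
    policy = {}
    for i in reversed(range(N)):                      # i = N-1, ..., 0
        for x in states:
            costs = [l(x, u) + V[(f(x, u), i + 1)] for u in controls]
            best = int(np.argmin(costs))
            V[(x, i)] = costs[best]                   # Bellman backup
            policy[(x, i)] = controls[best]
    return V, policy
```

DDP avoids this exhaustive tabulation, which is infeasible for continuous states, by maintaining only a local quadratic model of $V$ along a nominal trajectory, as described next.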


      Differential dynamic programming


DDP proceeds by iteratively performing a backward pass on the nominal trajectory to generate a new control sequence, and then a forward pass to compute and evaluate a new nominal trajectory. We begin with the backward pass. If

$$\ell(\mathbf{x}, \mathbf{u}) + V(\mathbf{f}(\mathbf{x}, \mathbf{u}), i+1)$$

is the argument of the $\min[\cdot]$ operator in Eq. 2, let $Q$ be the variation of this quantity around the $i$-th $(\mathbf{x}, \mathbf{u})$ pair:

$$\begin{aligned} Q(\delta\mathbf{x}, \delta\mathbf{u}) \equiv\; & \ell(\mathbf{x}+\delta\mathbf{x},\, \mathbf{u}+\delta\mathbf{u}) + V(\mathbf{f}(\mathbf{x}+\delta\mathbf{x},\, \mathbf{u}+\delta\mathbf{u}), i+1) \\ -\; & \ell(\mathbf{x}, \mathbf{u}) - V(\mathbf{f}(\mathbf{x}, \mathbf{u}), i+1) \end{aligned}$$

and expand to second order:

$$Q(\delta\mathbf{x}, \delta\mathbf{u}) \approx \frac{1}{2} \begin{bmatrix} 1 \\ \delta\mathbf{x} \\ \delta\mathbf{u} \end{bmatrix}^{\mathsf{T}} \begin{bmatrix} 0 & Q_{\mathbf{x}}^{\mathsf{T}} & Q_{\mathbf{u}}^{\mathsf{T}} \\ Q_{\mathbf{x}} & Q_{\mathbf{x}\mathbf{x}} & Q_{\mathbf{x}\mathbf{u}} \\ Q_{\mathbf{u}} & Q_{\mathbf{u}\mathbf{x}} & Q_{\mathbf{u}\mathbf{u}} \end{bmatrix} \begin{bmatrix} 1 \\ \delta\mathbf{x} \\ \delta\mathbf{u} \end{bmatrix}. \qquad (3)$$

The $Q$ notation used here is a variant of the notation of Morimoto, where subscripts denote differentiation in denominator layout. Dropping the index $i$ for readability and using primes to denote the next time-step, $V' \equiv V(i+1)$, the expansion coefficients are

$$\begin{aligned} Q_{\mathbf{x}} &= \ell_{\mathbf{x}} + \mathbf{f}_{\mathbf{x}}^{\mathsf{T}} V'_{\mathbf{x}} \\ Q_{\mathbf{u}} &= \ell_{\mathbf{u}} + \mathbf{f}_{\mathbf{u}}^{\mathsf{T}} V'_{\mathbf{x}} \\ Q_{\mathbf{x}\mathbf{x}} &= \ell_{\mathbf{x}\mathbf{x}} + \mathbf{f}_{\mathbf{x}}^{\mathsf{T}} V'_{\mathbf{x}\mathbf{x}} \mathbf{f}_{\mathbf{x}} + V'_{\mathbf{x}} \cdot \mathbf{f}_{\mathbf{x}\mathbf{x}} \\ Q_{\mathbf{u}\mathbf{u}} &= \ell_{\mathbf{u}\mathbf{u}} + \mathbf{f}_{\mathbf{u}}^{\mathsf{T}} V'_{\mathbf{x}\mathbf{x}} \mathbf{f}_{\mathbf{u}} + V'_{\mathbf{x}} \cdot \mathbf{f}_{\mathbf{u}\mathbf{u}} \\ Q_{\mathbf{u}\mathbf{x}} &= \ell_{\mathbf{u}\mathbf{x}} + \mathbf{f}_{\mathbf{u}}^{\mathsf{T}} V'_{\mathbf{x}\mathbf{x}} \mathbf{f}_{\mathbf{x}} + V'_{\mathbf{x}} \cdot \mathbf{f}_{\mathbf{u}\mathbf{x}}. \end{aligned}$$

The last terms in the last three equations denote contraction of a vector with a tensor. Minimizing the quadratic approximation (3) with respect to $\delta\mathbf{u}$, we have

$$\delta\mathbf{u}^{*} = \operatorname{argmin}_{\delta\mathbf{u}} Q(\delta\mathbf{x}, \delta\mathbf{u}) = -Q_{\mathbf{u}\mathbf{u}}^{-1}\bigl(Q_{\mathbf{u}} + Q_{\mathbf{u}\mathbf{x}}\,\delta\mathbf{x}\bigr), \qquad (4)$$

giving an open-loop term $\mathbf{k} = -Q_{\mathbf{u}\mathbf{u}}^{-1} Q_{\mathbf{u}}$ and a feedback gain term $\mathbf{K} = -Q_{\mathbf{u}\mathbf{u}}^{-1} Q_{\mathbf{u}\mathbf{x}}$. Plugging the result back into (3), we now have a quadratic model of the value at time $i$:

$$\begin{aligned} \Delta V(i) &= -\tfrac{1}{2}\, Q_{\mathbf{u}}^{\mathsf{T}} Q_{\mathbf{u}\mathbf{u}}^{-1} Q_{\mathbf{u}} \\ V_{\mathbf{x}}(i) &= Q_{\mathbf{x}} - Q_{\mathbf{x}\mathbf{u}} Q_{\mathbf{u}\mathbf{u}}^{-1} Q_{\mathbf{u}} \\ V_{\mathbf{x}\mathbf{x}}(i) &= Q_{\mathbf{x}\mathbf{x}} - Q_{\mathbf{x}\mathbf{u}} Q_{\mathbf{u}\mathbf{u}}^{-1} Q_{\mathbf{u}\mathbf{x}}. \end{aligned}$$
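A single backward-pass step can be written directly from these formulas. The NumPy sketch below assumes the derivatives of $\ell$ and $\mathbf{f}$ at the current $(\mathbf{x}, \mathbf{u})$ pair are supplied as arrays; the argument names (`l_x`, `f_x`, `f_xx`, ...) are illustrative, not from the article, and the second-order dynamics tensors are optional, as in the iLQR simplification that drops them:

```python
import numpy as np

def backward_step(l_x, l_u, l_xx, l_uu, l_ux,
                  f_x, f_u, V_x, V_xx,
                  f_xx=None, f_uu=None, f_ux=None):
    """One step of the DDP backward pass.

    Returns the gains k, K and the updated value model (dV, V_x, V_xx)
    at the current time step, given the value model at the next step.
    """
    # Expansion coefficients of Q (tensor terms included only if provided).
    Q_x = l_x + f_x.T @ V_x
    Q_u = l_u + f_u.T @ V_x
    Q_xx = l_xx + f_x.T @ V_xx @ f_x
    Q_uu = l_uu + f_u.T @ V_xx @ f_u
    Q_ux = l_ux + f_u.T @ V_xx @ f_x
    if f_xx is not None and f_uu is not None and f_ux is not None:
        Q_xx = Q_xx + np.tensordot(V_x, f_xx, axes=1)   # V'_x . f_xx
        Q_uu = Q_uu + np.tensordot(V_x, f_uu, axes=1)   # V'_x . f_uu
        Q_ux = Q_ux + np.tensordot(V_x, f_ux, axes=1)   # V'_x . f_ux

    # Gains: open-loop term k and feedback gain K.
    Q_uu_inv = np.linalg.inv(Q_uu)
    k = -Q_uu_inv @ Q_u
    K = -Q_uu_inv @ Q_ux

    # Quadratic model of the value at the current time step.
    dV = -0.5 * Q_u @ Q_uu_inv @ Q_u
    V_x_new = Q_x + K.T @ Q_u                  # = Q_x  - Q_xu Q_uu^{-1} Q_u
    V_xx_new = Q_xx + K.T @ Q_ux               # = Q_xx - Q_xu Q_uu^{-1} Q_ux
    return k, K, dV, V_x_new, V_xx_new
```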


Recursively computing the local quadratic models of $V(i)$ and the control modifications $\{\mathbf{k}(i), \mathbf{K}(i)\}$, from $i = N-1$ down to $i = 1$, constitutes the backward pass. As above, the value is initialized with $V(\mathbf{x}, N) \equiv \ell_f(\mathbf{x}_N)$. Once the backward pass is completed, a forward pass computes a new trajectory:

$$\begin{aligned} \hat{\mathbf{x}}(1) &= \mathbf{x}(1) \\ \hat{\mathbf{u}}(i) &= \mathbf{u}(i) + \mathbf{k}(i) + \mathbf{K}(i)\,\bigl(\hat{\mathbf{x}}(i) - \mathbf{x}(i)\bigr) \\ \hat{\mathbf{x}}(i+1) &= \mathbf{f}\bigl(\hat{\mathbf{x}}(i), \hat{\mathbf{u}}(i)\bigr) \end{aligned}$$

The backward passes and forward passes are iterated until convergence.
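The forward pass is a straightforward rollout of the updated controls around the nominal trajectory. A minimal sketch, continuing the NumPy conventions above (the helper `backward_step`, the gain lists `k`, `K`, and the dynamics callable `f` are the illustrative names introduced earlier, not part of the article; indices are zero-based):

```python
import numpy as np

def forward_pass(x_nom, u_nom, k, K, f):
    """Roll out the new trajectory using the open-loop and feedback terms."""
    N = len(u_nom)
    x_new = [np.array(x_nom[0])]             # x_hat(0) = x(0)
    u_new = []
    for i in range(N):
        du = k[i] + K[i] @ (x_new[i] - x_nom[i])
        u_new.append(u_nom[i] + du)          # u_hat(i) = u(i) + k(i) + K(i)(x_hat(i) - x(i))
        x_new.append(f(x_new[i], u_new[i]))  # x_hat(i+1) = f(x_hat(i), u_hat(i))
    return x_new, u_new

# Outer DDP loop (schematic): alternate backward and forward passes
# until the decrease in total cost falls below a tolerance.
#
# while not converged:
#     k, K = run_backward_pass(...)                       # calls backward_step per time step
#     x_nom, u_nom = forward_pass(x_nom, u_nom, k, K, f)  # forward rollout
```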


      Regularization and line-search


Differential dynamic programming is a second-order algorithm like Newton's method. It therefore takes large steps toward the minimum and often requires regularization and/or line-search to achieve convergence. Regularization in the DDP context means ensuring that the $Q_{\mathbf{u}\mathbf{u}}$ matrix in Eq. 4 is positive definite. Line-search in DDP amounts to scaling the open-loop control modification $\mathbf{k}$ by some $0 < \alpha < 1$.
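Both safeguards are small modifications to the passes above. The sketch below keeps the same illustrative names; it adds a Levenberg-Marquardt-style diagonal term $\mu I$ to $Q_{\mathbf{u}\mathbf{u}}$ before inversion (one common way to enforce positive definiteness, offered here as an assumption rather than the article's prescription) and scales the open-loop term $\mathbf{k}$ by $\alpha$ when forming the control update:

```python
import numpy as np

def regularized_gains(Q_u, Q_ux, Q_uu, mu=1e-6):
    """Compute k, K after adding a diagonal regularizer to Q_uu.

    mu is increased until the regularized Q_uu is positive definite,
    checked here via a Cholesky factorization."""
    while True:
        try:
            L = np.linalg.cholesky(Q_uu + mu * np.eye(Q_uu.shape[0]))
            break
        except np.linalg.LinAlgError:
            mu *= 10.0                                   # not positive definite yet
    Q_uu_reg_inv = np.linalg.inv(L.T) @ np.linalg.inv(L)  # (L L^T)^{-1}
    k = -Q_uu_reg_inv @ Q_u
    K = -Q_uu_reg_inv @ Q_ux
    return k, K

def line_searched_control(u_nom, k, K, dx, alpha=0.5):
    """Control update with the open-loop term scaled by 0 < alpha < 1."""
    return u_nom + alpha * k + K @ dx
```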


      Monte Carlo version


      Sampled differential dynamic programming (SaDDP) is a Monte Carlo variant of differential dynamic programming. It is based on treating the quadratic cost of differential dynamic programming as the energy of a Boltzmann distribution. This way the quantities of DDP can be matched to the statistics of a multidimensional normal distribution. The statistics can be recomputed from sampled trajectories without differentiation.
      Sampled differential dynamic programming has been extended to Path Integral Policy Improvement with Differential Dynamic Programming. This creates a link between differential dynamic programming and path integral control, which is a framework of stochastic optimal control.


      Constrained problems


Interior-point differential dynamic programming (IPDDP) is an interior-point method generalization of DDP that can address optimal control problems with nonlinear state and input constraints.


      See also


      Optimal control


      References




      External links


      A Python implementation of DDP
      A MATLAB implementation of DDP
