• Source: Bregman method
    • The Bregman method is an iterative algorithm to solve certain convex optimization problems involving regularization. The original version is due to Lev M. Bregman, who published it in 1967.
      The algorithm is a row-action method, accessing the constraint functions one by one, and it is particularly suited to large optimization problems whose constraints can be efficiently enumerated. The algorithm works particularly well for regularizers such as the $\ell_1$ norm, where it converges very quickly because of an error-cancellation effect.


      Algorithm


      In order to use the Bregman method, one must frame the problem of interest as finding $\min_{u} J(u) + f(u)$, where $J$ is a regularizing function such as $\ell_1$.
      The Bregman distance is defined as $D^{p}(u,v) := J(u) - (J(v) + \langle p, u - v\rangle)$, where $p$ belongs to the subdifferential of $J$ at $v$ (which we denote $\partial J(v)$). One performs the iteration $u_{k+1} := \min_{u}(\alpha D(u, u_{k}) + f(u))$, with $\alpha$ a constant to be chosen by the user (and the minimization performed by an ordinary convex optimization algorithm), or $u_{k+1} := \min_{u}(D^{p_{k}}(u, u_{k}) + f(u))$, with $p_{k}$ chosen each time to be a member of $\partial J(u_{k})$.
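      As a concrete illustration of this iteration, here is a minimal numerical sketch. The choice of $J$ and $f$ below is an assumption made for the example, not from the source: with $J(u) = \tfrac{1}{2}\|u\|_{2}^{2}$ the Bregman distance becomes the squared Euclidean distance $\tfrac{1}{2}\|u - u_{k}\|_{2}^{2}$ (so the update reduces to a proximal-point step), and with $f(u) = \tfrac{1}{2}\|Au - b\|_{2}^{2}$ each subproblem is a linear solve:

      ```python
      import numpy as np

      def bregman_prox_point(A, b, iters=100):
          """Bregman iteration with J(u) = 0.5*||u||^2 and f(u) = 0.5*||Au - b||^2.

          For this J the Bregman distance D(u, u_k) is 0.5*||u - u_k||^2, so each
          step solves min_u 0.5*||u - u_k||^2 + 0.5*||Au - b||^2, i.e. the linear
          system (I + A^T A) u = u_k + A^T b.
          """
          n = A.shape[1]
          u = np.zeros(n)
          M = np.eye(n) + A.T @ A
          for _ in range(iters):
              u = np.linalg.solve(M, u + A.T @ b)
          return u
      ```

      A fixed point $u$ of this sketch satisfies $A^{t}Au = A^{t}b$, the least-squares normal equations, which illustrates how the iteration drives $f$ toward its minimum.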
      The algorithm starts with a pair of primal and dual variables. Then, for each constraint, a generalized projection onto its feasible set is performed, updating both the constraint's dual variable and all primal variables for which there are non-zero coefficients in the constraint function's gradient. If the objective is strictly convex and all constraint functions are convex, the limit of this iterative projection converges to the optimal primal-dual pair.
      In the case of a basis pursuit-type problem $\min_{x:Ax=b}(|x|_{1} + {\frac{1}{2\alpha}}|x|_{2}^{2})$, the Bregman method is equivalent to ordinary gradient descent on the dual problem $\min_{y}(-b^{t}y + {\frac{\alpha}{2}}|A^{t}y - {\text{Proj}}_{[-1,1]^{n}}(A^{t}y)|^{2})$. An exact regularization-type effect also occurs in this case: if $\alpha$ exceeds a certain threshold, the optimum value of $x$ is precisely the optimum solution of $\min_{x:Ax=b}|x|_{1}$.
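      This dual gradient descent can be sketched numerically. The residual $A^{t}y - \text{Proj}_{[-1,1]^{n}}(A^{t}y)$ is exactly the soft-thresholding of $A^{t}y$ at level 1, so the dual gradient is $-b + \alpha A\,\text{shrink}(A^{t}y, 1)$; the recovery formula $x = \alpha\,\text{shrink}(A^{t}y, 1)$ used below is our derivation from the optimality conditions, and the step size and problem instance are illustrative assumptions:

      ```python
      import numpy as np

      def shrink(y, a):
          # component-wise residual of projecting y onto [-a, a] (soft-thresholding)
          return np.sign(y) * np.maximum(np.abs(y) - a, 0.0)

      def dual_gradient_descent(A, b, alpha, step, iters=500):
          """Gradient descent on min_y -b^t y + (alpha/2) * ||shrink(A^t y, 1)||^2."""
          y = np.zeros(A.shape[0])
          for _ in range(iters):
              grad = -b + alpha * A @ shrink(A.T @ y, 1.0)
              y = y - step * grad
          # recover the primal variable from the dual optimum
          return alpha * shrink(A.T @ y, 1.0)
      ```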


      Applications


      The Bregman method or its generalizations can be applied to:

      Image deblurring or denoising (including total variation denoising)
      MR image reconstruction
      Magnetic resonance imaging
      Radar
      Hyperspectral imaging
      Compressed sensing
      Least absolute deviations or $\ell_1$-regularized linear regression
      Covariance selection (learning a sparse covariance matrix)
      Matrix completion
      Structural risk minimization


      Generalizations and drawbacks


      The method has links to the method of multipliers and dual ascent method (through the so-called Bregman alternating direction method of multipliers, generalizing the alternating direction method of multipliers) and multiple generalizations exist.
      One drawback of the method is that it is only provably convergent if the objective function is strictly convex. When this cannot be ensured, as for linear programs or non-strictly convex quadratic programs, additional methods such as proximal gradient methods have been developed. In the case of the Rudin-Osher-Fatemi model of image denoising, the Bregman method provably converges.
      Some generalizations of the Bregman method include:

      Inverse scale space method
      Linearized Bregman
      Logistic Bregman
      Split Bregman


      = Linearized Bregman =
      In the Linearized Bregman method, one linearizes the intermediate objective functions $D^{p}(u, u_{k}) + f(u)$ by replacing the second term with $f(u_{k}) + \langle f'(u_{k}), u - u_{k}\rangle$ (which approximates the second term near $u_{k}$) and adding the penalty term ${\frac{1}{2\delta}}|u - u_{k}|_{2}^{2}$ for a constant $\delta$. The result is much more computationally tractable, especially in basis pursuit-type problems. In the case of a generic basis pursuit problem $\min \mu|u|_{1} + {\frac{1}{2}}|Au - f|_{2}^{2}$, one can express the iteration as $v_{k+1} := v_{k} + A^{t}(f - Au_{k})$, $u_{k+1,i} := \delta~{\text{shrink}}(v_{k,i}, \mu)$ for each component $i$, where we define

      $${\text{shrink}}(y, a) := {\begin{cases} y - a, & y \in (a, \infty) \\ 0, & y \in [-a, a] \\ y + a, & y \in (-\infty, -a) \end{cases}}$$
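      The two-line iteration above translates almost directly into code. This is a minimal sketch (the problem instance in the usage below is an illustrative assumption, and we apply the shrink to the freshly updated $v$, as is common in practice):

      ```python
      import numpy as np

      def shrink(y, a):
          # the piecewise soft-thresholding operator defined above
          return np.sign(y) * np.maximum(np.abs(y) - a, 0.0)

      def linearized_bregman(A, f, mu, delta, iters=100):
          """Linearized Bregman iteration for min mu*|u|_1 + 0.5*||Au - f||^2:
          v <- v + A^t (f - A u),  u <- delta * shrink(v, mu)."""
          u = np.zeros(A.shape[1])
          v = np.zeros(A.shape[1])
          for _ in range(iters):
              v = v + A.T @ (f - A @ u)
              u = delta * shrink(v, mu)
          return u
      ```

      Note that the only operations per iteration are matrix-vector products and a component-wise threshold, which is why the method scales well on basis pursuit-type problems.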
      Sometimes, when running the Linearized Bregman method, there are periods of "stagnation" where the residual is almost constant. To alleviate this issue, one can use the Linearized Bregman method with kicking, where one essentially detects the beginning of a stagnation period, then predicts and skips to the end of it.
      Since Linearized Bregman is mathematically equivalent to gradient descent, it can be accelerated with methods that accelerate gradient descent, such as line search, L-BFGS, Barzilai-Borwein steps, or the Nesterov method; the last has been proposed as the accelerated linearized Bregman method.


      = Split Bregman =
      The Split Bregman method solves problems of the form $\min_{u}|\Phi(u)|_{1} + H(u)$, where $\Phi$ and $H$ are both convex, particularly problems of the form $\min_{u}|\Phi u|_{1} + |Ku - f|^{2}$. We start by rewriting it as the constrained optimization problem $\min_{u:d=\Phi(u)}|d|_{1} + H(u)$, then relax it into $\min_{u,d}|d|_{1} + H(u) + {\frac{\lambda}{2}}|d - \Phi(u)|_{2}^{2}$, where $\lambda$ is a constant. By defining $J(u,d) := |d|_{1} + H(u)$, one reduces the problem to one that can be solved with the ordinary Bregman algorithm.
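      In practice this splitting is typically solved by alternating a minimization over $u$, a decoupled shrink step over $d$, and a Bregman update of the residual. A minimal sketch, under the assumptions that $\Phi$ is linear and $H(u) = \tfrac{1}{2}\|Ku - f\|_{2}^{2}$ so that the $u$-step is a linear solve (the concrete alternating scheme is the commonly used one, not spelled out in the text above):

      ```python
      import numpy as np

      def shrink(y, a):
          # component-wise soft-thresholding
          return np.sign(y) * np.maximum(np.abs(y) - a, 0.0)

      def split_bregman(K, Phi, f, lam, iters=200):
          """Alternating scheme for min_u |Phi u|_1 + 0.5*||K u - f||^2."""
          u = np.zeros(K.shape[1])
          d = np.zeros(Phi.shape[0])
          b = np.zeros(Phi.shape[0])  # Bregman variable accumulating the residual
          M = K.T @ K + lam * Phi.T @ Phi
          for _ in range(iters):
              # u-step: differentiable quadratic minimization, a linear solve
              u = np.linalg.solve(M, K.T @ f + lam * Phi.T @ (d - b))
              # d-step: decoupled component-wise shrinkage
              d = shrink(Phi @ u + b, 1.0 / lam)
              # Bregman update of the residual
              b = b + Phi @ u - d
          return u
      ```

      The point of the splitting is visible here: the non-smooth $\ell_1$ term is handled entirely by the cheap shrink step, while the smooth term is handled by a standard linear solve.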
      The Split Bregman method has been generalized to optimization over complex numbers using Wirtinger derivatives.

