- Source: Mirror descent
- Pemelajaran mesin daring
- Perang Dunia I
- Sunny Leone
- Kuil Kamo
- Ras manusia
- Daftar episode Bleach
- Nu metal
- Daftar film Amerika tahun 2005
- Titan (satelit)
- Aktris Terbaik (Saturn Award)
- Mirror descent
- Gradient descent
- The Descent
- Kullback–Leibler divergence
- Information geometry
- Bias–variance tradeoff
- Online machine learning
- Mathematical optimization
- Mirror test
- Descent (video game)
In mathematics, mirror descent is an iterative optimization algorithm for finding a local minimum of a differentiable function.
It generalizes algorithms such as gradient descent and multiplicative weights.
History
Mirror descent was originally proposed by Nemirovski and Yudin in 1983.
Motivation
In gradient descent with the sequence of learning rates
(
η
n
)
n
≥
0
{\displaystyle (\eta _{n})_{n\geq 0}}
applied to a differentiable function
F
{\displaystyle F}
, one starts with a guess
x
0
{\displaystyle \mathbf {x} _{0}}
for a local minimum of
F
,
{\displaystyle F,}
and considers the sequence
x
0
,
x
1
,
x
2
,
…
{\displaystyle \mathbf {x} _{0},\mathbf {x} _{1},\mathbf {x} _{2},\ldots }
such that
x
n
+
1
=
x
n
−
η
n
∇
F
(
x
n
)
,
n
≥
0.
{\displaystyle \mathbf {x} _{n+1}=\mathbf {x} _{n}-\eta _{n}\nabla F(\mathbf {x} _{n}),\ n\geq 0.}
This can be reformulated by noting that
x
n
+
1
=
arg
min
x
(
F
(
x
n
)
+
∇
F
(
x
n
)
T
(
x
−
x
n
)
+
1
2
η
n
‖
x
−
x
n
‖
2
)
{\displaystyle \mathbf {x} _{n+1}=\arg \min _{\mathbf {x} }\left(F(\mathbf {x} _{n})+\nabla F(\mathbf {x} _{n})^{T}(\mathbf {x} -\mathbf {x} _{n})+{\frac {1}{2\eta _{n}}}\|\mathbf {x} -\mathbf {x} _{n}\|^{2}\right)}
In other words,
x
n
+
1
{\displaystyle \mathbf {x} _{n+1}}
minimizes the first-order approximation to
F
{\displaystyle F}
at
x
n
{\displaystyle \mathbf {x} _{n}}
with added proximity term
‖
x
−
x
n
‖
2
{\displaystyle \|\mathbf {x} -\mathbf {x} _{n}\|^{2}}
.
This squared Euclidean distance term is a particular example of a Bregman distance. Using other Bregman distances will yield other algorithms such as Hedge which may be more suited to optimization over particular geometries.
Formulation
We are given convex function
f
{\displaystyle f}
to optimize over a convex set
K
⊂
R
n
{\displaystyle K\subset \mathbb {R} ^{n}}
, and given some norm
‖
⋅
‖
{\displaystyle \|\cdot \|}
on
R
n
{\displaystyle \mathbb {R} ^{n}}
.
We are also given differentiable convex function
h
:
R
n
→
R
{\displaystyle h\colon \mathbb {R} ^{n}\to \mathbb {R} }
,
α
{\displaystyle \alpha }
-strongly convex with respect to the given norm. This is called the distance-generating function, and its gradient
∇
h
:
R
n
→
R
n
{\displaystyle \nabla h\colon \mathbb {R} ^{n}\to \mathbb {R} ^{n}}
is known as the mirror map.
Starting from initial
x
0
∈
K
{\displaystyle x_{0}\in K}
, in each iteration of Mirror Descent:
Map to the dual space:
θ
t
←
∇
h
(
x
t
)
{\displaystyle \theta _{t}\leftarrow \nabla h(x_{t})}
Update in the dual space using a gradient step:
θ
t
+
1
←
θ
t
−
η
t
∇
f
(
x
t
)
{\displaystyle \theta _{t+1}\leftarrow \theta _{t}-\eta _{t}\nabla f(x_{t})}
Map back to the primal space:
x
t
+
1
′
←
(
∇
h
)
−
1
(
θ
t
+
1
)
{\displaystyle x'_{t+1}\leftarrow (\nabla h)^{-1}(\theta _{t+1})}
Project back to the feasible region
K
{\displaystyle K}
:
x
t
+
1
←
a
r
g
min
x
∈
K
D
h
(
x
|
|
x
t
+
1
′
)
{\displaystyle x_{t+1}\leftarrow \mathrm {arg} \min _{x\in K}D_{h}(x||x'_{t+1})}
, where
D
h
{\displaystyle D_{h}}
is the Bregman divergence.
Extensions
Mirror descent in the online optimization setting is known as Online Mirror Descent (OMD).
See also
Gradient descent
Multiplicative weight update method
Hedge algorithm
Bregman divergence