- Source: LogSumExp
The LogSumExp (LSE) (also called RealSoftMax or multivariable softplus) function is a smooth maximum – a smooth approximation to the maximum function, mainly used by machine learning algorithms. It is defined as the logarithm of the sum of the exponentials of the arguments:
{\displaystyle \mathrm {LSE} (x_{1},\dots ,x_{n})=\log \left(\exp(x_{1})+\cdots +\exp(x_{n})\right).}
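A direct transcription of the definition, as a minimal sketch (the function name lse and the use of NumPy are our choices, not from the source):

```python
import numpy as np

def lse(x):
    """LogSumExp by direct transcription of the definition.

    Note: exp() can overflow for large inputs; see the
    log-sum-exp trick below for a numerically stable version.
    """
    x = np.asarray(x, dtype=float)
    return np.log(np.sum(np.exp(x)))

print(lse([1.0, 2.0, 3.0]))   # ~3.4076, slightly above max = 3
```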
Properties
The LogSumExp function domain is {\displaystyle \mathbb {R} ^{n}}, the real coordinate space, and its codomain is {\displaystyle \mathbb {R} }, the real line.
It is an approximation to the maximum {\displaystyle \max _{i}x_{i}} with the following bounds:
{\displaystyle \max {\{x_{1},\dots ,x_{n}\}}\leq \mathrm {LSE} (x_{1},\dots ,x_{n})\leq \max {\{x_{1},\dots ,x_{n}\}}+\log(n).}
The first inequality is strict unless {\displaystyle n=1}. The second inequality is strict unless all arguments are equal.
(Proof: Let {\displaystyle m=\max _{i}x_{i}}. Then {\displaystyle \exp(m)\leq \sum _{i=1}^{n}\exp(x_{i})\leq n\exp(m)}. Applying the logarithm, which is monotonic, to the inequality gives the result.)
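A quick numerical check of these bounds (an illustrative sketch; the random inputs are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)          # n = 10 random arguments
n = x.size

lse = np.log(np.sum(np.exp(x)))
# max <= LSE <= max + log(n) holds for any input vector.
assert x.max() <= lse <= x.max() + np.log(n)
print(x.max(), lse, x.max() + np.log(n))
```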
In addition, we can scale the function to make the bounds tighter. Consider the function {\displaystyle {\frac {1}{t}}\mathrm {LSE} (tx_{1},\dots ,tx_{n})} for {\displaystyle t>0}. Then
{\displaystyle \max {\{x_{1},\dots ,x_{n}\}}<{\frac {1}{t}}\mathrm {LSE} (tx_{1},\dots ,tx_{n})\leq \max {\{x_{1},\dots ,x_{n}\}}+{\frac {\log(n)}{t}}.}
(Proof: Replace each {\displaystyle x_{i}} with {\displaystyle tx_{i}} for some {\displaystyle t>0} in the inequalities above, to give
{\displaystyle \max {\{tx_{1},\dots ,tx_{n}\}}<\mathrm {LSE} (tx_{1},\dots ,tx_{n})\leq \max {\{tx_{1},\dots ,tx_{n}\}}+\log(n).}
Since {\displaystyle t>0}, the factor of {\displaystyle t} can be pulled out of the maximum:
{\displaystyle t\max {\{x_{1},\dots ,x_{n}\}}<\mathrm {LSE} (tx_{1},\dots ,tx_{n})\leq t\max {\{x_{1},\dots ,x_{n}\}}+\log(n).}
Finally, dividing by {\displaystyle t} gives the result.)
Also, if we multiply by a negative number instead, we find a comparison to the {\displaystyle \min } function:
{\displaystyle \min {\{x_{1},\dots ,x_{n}\}}-{\frac {\log(n)}{t}}\leq {\frac {1}{-t}}\mathrm {LSE} (-tx)<\min {\{x_{1},\dots ,x_{n}\}}.}
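A sketch illustrating both comparisons: as the scale parameter t grows, {\displaystyle {\frac {1}{t}}\mathrm {LSE} (tx)} tightens toward the maximum and {\displaystyle {\frac {1}{-t}}\mathrm {LSE} (-tx)} toward the minimum (the stable lse helper anticipates the log-sum-exp trick described below):

```python
import numpy as np

def lse(x):
    # Stable evaluation via the shift by max(x); see the
    # log-sum-exp trick section below.
    x = np.asarray(x, dtype=float)
    m = x.max()
    return m + np.log(np.sum(np.exp(x - m)))

x = np.array([0.5, 1.0, 3.0])
for t in (1.0, 10.0, 100.0):
    soft_max = lse(t * x) / t      # in (max, max + log(n)/t]
    soft_min = -lse(-t * x) / t    # in [min - log(n)/t, min)
    print(t, soft_max, soft_min)
# soft_max -> 3.0 and soft_min -> 0.5 as t grows
```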
The LogSumExp function is convex, and is strictly increasing in each argument everywhere in its domain. It is not strictly convex, since it is affine (linear plus a constant) on the diagonal and on lines parallel to it:
{\displaystyle \mathrm {LSE} (x_{1}+c,\dots ,x_{n}+c)=\mathrm {LSE} (x_{1},\dots ,x_{n})+c.}
Other than this direction, it is strictly convex (the Hessian has rank {\displaystyle n-1}), so for example restricting to a hyperplane that is transverse to the diagonal results in a strictly convex function. See {\displaystyle \mathrm {LSE} _{0}^{+}}, below.
Writing {\displaystyle \mathbf {x} =(x_{1},\dots ,x_{n}),} the partial derivatives are:
{\displaystyle {\frac {\partial }{\partial x_{i}}}{\mathrm {LSE} (\mathbf {x} )}={\frac {\exp x_{i}}{\sum _{j}\exp {x_{j}}}},}
which means the gradient of LogSumExp is the softmax function.
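A sketch checking this gradient identity against central finite differences (the lse and softmax helpers are our own stable implementations, not from the source):

```python
import numpy as np

def lse(x):
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def softmax(x):
    e = np.exp(x - np.max(x))   # shift for stability; softmax is shift-invariant
    return e / e.sum()

x = np.array([0.1, -2.0, 1.5])
eps = 1e-6
# Central finite-difference approximation of the gradient of LSE.
grad_fd = np.array([
    (lse(x + eps * np.eye(3)[i]) - lse(x - eps * np.eye(3)[i])) / (2 * eps)
    for i in range(3)
])
print(np.allclose(grad_fd, softmax(x), atol=1e-6))   # True
```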
The convex conjugate of LogSumExp is the negative entropy.
Log-sum-exp trick for log-domain calculations
The LSE function is often encountered when the usual arithmetic computations are performed on a logarithmic scale, as in log probability.
Just as multiplication in the linear scale becomes simple addition in the log scale, an addition in the linear scale becomes the LSE in the log scale:
{\displaystyle \mathrm {LSE} (\log(x_{1}),\dots ,\log(x_{n}))=\log(x_{1}+\cdots +x_{n}).}
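For example, two probabilities stored as log probabilities can be added without leaving the log domain (a minimal sketch; np.logaddexp is NumPy's built-in two-argument LSE):

```python
import numpy as np

p, q = 1e-30, 2e-30          # tiny probabilities
log_p, log_q = np.log(p), np.log(q)

# log(p + q) computed entirely in the log domain via LSE:
log_sum = np.logaddexp(log_p, log_q)
print(np.isclose(log_sum, np.log(p + q)))   # True
```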
A common purpose of using log-domain computations is to increase accuracy and avoid underflow and overflow problems
when very small or very large numbers are represented directly (i.e. in a linear domain) using limited-precision
floating point numbers.
Unfortunately, using LSE directly in this case can again cause overflow/underflow problems. Therefore, the following equivalent formula must be used instead (especially when the accuracy of the above 'max' approximation is not sufficient):
{\displaystyle \mathrm {LSE} (x_{1},\dots ,x_{n})=x^{*}+\log \left(\exp(x_{1}-x^{*})+\cdots +\exp(x_{n}-x^{*})\right)}
where {\displaystyle x^{*}=\max {\{x_{1},\dots ,x_{n}\}}}.
Many math libraries, such as IT++, provide a default routine for LSE and use this formula internally.
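A sketch of the shifted formula in Python, mirroring what such library routines do internally (our implementation, not IT++'s actual code):

```python
import numpy as np

def logsumexp(x):
    """Numerically stable LSE via the shift by x* = max(x).

    Each exp(x_i - x*) is at most 1, so nothing overflows, and
    the largest term contributes exp(0) = 1, so the sum never
    underflows to 0.
    """
    x = np.asarray(x, dtype=float)
    x_star = np.max(x)
    return x_star + np.log(np.sum(np.exp(x - x_star)))

# Naive evaluation overflows; the shifted form does not:
x = np.array([1000.0, 1000.0])
print(np.log(np.sum(np.exp(x))))   # inf (overflow, with a RuntimeWarning)
print(logsumexp(x))                # 1000.6931... = 1000 + log(2)
```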
A strictly convex log-sum-exp type function
LSE is convex but not strictly convex.
We can define a strictly convex log-sum-exp type function by adding an extra argument set to zero:
{\displaystyle \mathrm {LSE} _{0}^{+}(x_{1},\dots ,x_{n})=\mathrm {LSE} (0,x_{1},\dots ,x_{n}).}
This function is a proper Bregman generator (strictly convex and differentiable).
It is encountered in machine learning, for example, as the cumulant of the multinomial/binomial family.
In tropical analysis, this is the sum in the log semiring.
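A sketch of this variant, reusing the stable routine from above (the function name lse0_plus is ours); for a single argument it reduces to the softplus function {\displaystyle \log(1+\exp(x))}, consistent with the "multivariable softplus" name mentioned in the lead:

```python
import numpy as np

def logsumexp(x):
    x = np.asarray(x, dtype=float)
    m = np.max(x)
    return m + np.log(np.sum(np.exp(x - m)))

def lse0_plus(x):
    """Strictly convex variant: LSE with an extra argument fixed at 0."""
    return logsumexp(np.concatenate(([0.0], np.asarray(x, dtype=float))))

# For n = 1 this is the softplus function log(1 + exp(x)):
x = 2.0
print(np.isclose(lse0_plus([x]), np.log1p(np.exp(x))))   # True
```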
See also
Logarithmic mean
Log semiring
Smooth maximum
Softmax function