- Source: Matrix sign function
In mathematics, the matrix sign function is a matrix function on square matrices analogous to the complex sign function.
It was introduced by J. D. Roberts in 1971, in a Cambridge University technical report, as a tool for model reduction and for solving Lyapunov and algebraic Riccati equations; the report was later published in a journal in 1980.
Definition
The matrix sign function is a generalization of the complex signum function

$$\operatorname{csgn}(z) = \begin{cases} 1 & \text{if } \operatorname{Re}(z) > 0, \\ -1 & \text{if } \operatorname{Re}(z) < 0, \end{cases}$$

to the matrix-valued analogue $\operatorname{csgn}(A)$. Although the sign function is not analytic, the matrix function is well defined for all matrices that have no eigenvalue on the imaginary axis; see, for example, the Jordan-form-based definition (where the derivatives are all zero).
Properties
Theorem: Let $A \in \mathbb{C}^{n \times n}$. Then $\operatorname{csgn}(A)^2 = I$.
Theorem: Let $A \in \mathbb{C}^{n \times n}$. Then $\operatorname{csgn}(A)$ is diagonalizable and has eigenvalues $\pm 1$.
Theorem: Let $A \in \mathbb{C}^{n \times n}$. Then $(I + \operatorname{csgn}(A))/2$ is a projector onto the invariant subspace associated with the eigenvalues in the right half-plane, and analogously $(I - \operatorname{csgn}(A))/2$ for the left half-plane.
Theorem: Let $A \in \mathbb{C}^{n \times n}$, and let

$$A = P \begin{bmatrix} J_+ & 0 \\ 0 & J_- \end{bmatrix} P^{-1}$$

be a Jordan decomposition such that $J_+$ corresponds to eigenvalues with positive real part and $J_-$ to eigenvalues with negative real part. Then

$$\operatorname{csgn}(A) = P \begin{bmatrix} I_+ & 0 \\ 0 & -I_- \end{bmatrix} P^{-1},$$

where $I_+$ and $I_-$ are identity matrices of sizes corresponding to $J_+$ and $J_-$, respectively.
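
The last theorem suggests a direct way to evaluate the function when $A$ is diagonalizable: map each eigenvalue to the sign of its real part and transform back. Below is a minimal sketch in Python/NumPy along these lines; the function name `csgn_via_eig` is our own, and the approach assumes $A$ is diagonalizable with no eigenvalues on the imaginary axis (a Schur-based method would be preferable for robustness).

```python
import numpy as np

def csgn_via_eig(A):
    """Matrix sign via eigendecomposition.

    Assumes A is diagonalizable with no eigenvalues on the imaginary axis.
    """
    w, P = np.linalg.eig(A)
    # Scale each eigenvector column by the sign of its eigenvalue's real part,
    # i.e. form P @ diag(sign(Re(w))) @ inv(P).
    return (P * np.sign(w.real)) @ np.linalg.inv(P)
```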
Computational methods
The function can be computed with generic methods for matrix functions, but there are also specialized methods.
Newton iteration

The Newton iteration can be derived by observing that $\operatorname{csgn}(x) = \sqrt{x^2}/x$, which in terms of matrices can be written as $\operatorname{csgn}(A) = A^{-1}\sqrt{A^2}$, where we use the matrix square root. If we apply the Babylonian method to compute the square root of the matrix $A^2$, that is, the iteration $X_{k+1} = \frac{1}{2}\left(X_k + A^2 X_k^{-1}\right)$, and define the new iterate $Z_k = A^{-1} X_k$, we arrive at the iteration

$$Z_{k+1} = \frac{1}{2}\left(Z_k + Z_k^{-1}\right),$$

where typically $Z_0 = A$. Convergence is global, and locally it is quadratic.

The Newton iteration uses the explicit inverse of the iterates $Z_k$.
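
As a concrete illustration, here is a minimal NumPy sketch of the Newton iteration; the function name `matrix_sign_newton` and the relative-change stopping rule are our own choices, not part of the original method.

```python
import numpy as np

def matrix_sign_newton(A, tol=1e-12, max_iter=100):
    """Matrix sign function via the Newton iteration Z_{k+1} = (Z_k + Z_k^{-1}) / 2.

    Assumes A has no eigenvalues on the imaginary axis.
    """
    Z = np.asarray(A, dtype=complex)
    for _ in range(max_iter):
        Z_next = 0.5 * (Z + np.linalg.inv(Z))
        # Stop once the iterates stagnate (relative Frobenius-norm change).
        if np.linalg.norm(Z_next - Z, "fro") <= tol * np.linalg.norm(Z_next, "fro"):
            return Z_next
        Z = Z_next
    return Z
```

As a quick sanity check, the computed $S = \operatorname{csgn}(A)$ should satisfy $S^2 \approx I$.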
Newton–Schulz iteration

To avoid the need for the explicit inverse used in the Newton iteration, the inverse can be approximated with one step of the Newton iteration for the inverse,

$$Z_k^{-1} \approx Z_k\left(2I - Z_k^2\right),$$

derived by Schulz in 1933. Substituting this approximation into the previous method, the new method becomes

$$Z_{k+1} = \frac{1}{2} Z_k\left(3I - Z_k^2\right).$$

Convergence is (still) quadratic, but only local (guaranteed for $\|I - A^2\| < 1$).
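
The inverse-free variant needs only matrix products. The following sketch mirrors `matrix_sign_newton` above; the name `matrix_sign_newton_schulz` is ours, and the local-convergence caveat applies.

```python
import numpy as np

def matrix_sign_newton_schulz(A, tol=1e-12, max_iter=100):
    """Inverse-free Newton-Schulz iteration Z_{k+1} = Z_k (3I - Z_k^2) / 2.

    Converges only locally, e.g. when ||I - A^2|| < 1.
    """
    Z = np.asarray(A, dtype=complex)
    I = np.eye(Z.shape[0])
    for _ in range(max_iter):
        Z_next = 0.5 * Z @ (3 * I - Z @ Z)
        if np.linalg.norm(Z_next - Z, "fro") <= tol * np.linalg.norm(Z_next, "fro"):
            return Z_next
        Z = Z_next
    return Z
```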
Applications
Solutions of Sylvester equations

Theorem: Let $A, B, C \in \mathbb{R}^{n \times n}$ and assume that $A$ and $B$ are stable. Then the unique solution to the Sylvester equation $AX + XB = C$ is the $X$ such that

$$\begin{bmatrix} -I & 2X \\ 0 & I \end{bmatrix} = \operatorname{csgn}\left(\begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix}\right).$$
Proof sketch: The result follows from the similarity transform

$$\begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix} = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \begin{bmatrix} A & 0 \\ 0 & -B \end{bmatrix} \begin{bmatrix} I & X \\ 0 & I \end{bmatrix}^{-1},$$

since

$$\operatorname{csgn}\left(\begin{bmatrix} A & -C \\ 0 & -B \end{bmatrix}\right) = \begin{bmatrix} I & X \\ 0 & I \end{bmatrix} \begin{bmatrix} -I & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} I & -X \\ 0 & I \end{bmatrix},$$

due to the stability of $A$ and $B$ (so that $\operatorname{csgn}(A) = -I$ and $\operatorname{csgn}(-B) = I$).
The theorem is, naturally, also applicable to the Lyapunov equation ($B = A^T$). However, due to the structure, the Newton iteration simplifies to involve only inverses of $A$ and $A^T$.
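
To make the theorem concrete, here is a sketch that reads $X$ off the sign of the block matrix, reusing the `matrix_sign_newton` routine sketched above; the helper name `solve_sylvester_sign` is our own.

```python
import numpy as np

def solve_sylvester_sign(A, B, C, tol=1e-12, max_iter=100):
    """Solve AX + XB = C for stable A and B via the matrix sign function."""
    n = A.shape[0]
    M = np.block([[A, -C],
                  [np.zeros((n, n)), -B]])
    S = matrix_sign_newton(M, tol, max_iter)
    # By the theorem, the top-right block of csgn(M) equals 2X.
    return 0.5 * S[:n, n:]
```

The residual $\|AX + XB - C\|$ gives a quick correctness check.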
Solutions of algebraic Riccati equations

There is a similar result applicable to the algebraic Riccati equation

$$A^H P + P A - P F P + Q = 0.$$

Define $V, W \in \mathbb{C}^{2n \times n}$ as

$$\begin{bmatrix} V & W \end{bmatrix} = \operatorname{csgn}\left(\begin{bmatrix} A^H & Q \\ F & -A \end{bmatrix}\right) - \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}.$$

Under the assumption that $F, Q \in \mathbb{C}^{n \times n}$ are Hermitian and there exists a unique stabilizing solution, in the sense that $A - FP$ is stable, that solution is given by the over-determined but consistent linear system

$$VP = -W.$$
Proof sketch: The similarity transform

$$\begin{bmatrix} A^H & Q \\ F & -A \end{bmatrix} = \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} \begin{bmatrix} -(A - FP) & -F \\ 0 & (A - FP)^H \end{bmatrix} \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix}^{-1},$$

and the stability of $A - FP$ imply that

$$\left(\operatorname{csgn}\left(\begin{bmatrix} A^H & Q \\ F & -A \end{bmatrix}\right) - \begin{bmatrix} I & 0 \\ 0 & I \end{bmatrix}\right) \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} = \begin{bmatrix} P & -I \\ I & 0 \end{bmatrix} \begin{bmatrix} 0 & Y \\ 0 & -2I \end{bmatrix},$$

for some matrix $Y \in \mathbb{C}^{n \times n}$. Reading off the first block column gives $VP + W = 0$, that is, $VP = -W$.
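
The construction translates directly into code. The sketch below again reuses `matrix_sign_newton` and solves the over-determined system $VP = -W$ in the least-squares sense; the name `solve_riccati_sign` is ours.

```python
import numpy as np

def solve_riccati_sign(A, F, Q, tol=1e-12, max_iter=100):
    """Solve A^H P + P A - P F P + Q = 0 (F, Q Hermitian) via the sign function."""
    n = A.shape[0]
    H = np.block([[A.conj().T, Q],
                  [F, -A]])
    S = matrix_sign_newton(H, tol, max_iter) - np.eye(2 * n)
    V, W = S[:, :n], S[:, n:]
    # Over-determined but consistent: solve V P = -W in the least-squares sense.
    P, *_ = np.linalg.lstsq(V, -W, rcond=None)
    return P
```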
Computation of the matrix square root

The Denman–Beavers iteration for the square root of a matrix can be derived from the Newton iteration for the matrix sign function by noticing that $A - PIP = 0$ is a degenerate algebraic Riccati equation, and that by definition a solution $P$ is a square root of $A$.
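
For completeness, here is a minimal sketch of the Denman–Beavers coupled iteration (the function name is ours; the iteration itself is standard): $Y_k$ converges to $\sqrt{A}$ and $Z_k$ to $\sqrt{A}^{-1}$, assuming $A$ has no eigenvalues on the closed negative real axis.

```python
import numpy as np

def sqrtm_denman_beavers(A, tol=1e-12, max_iter=100):
    """Denman-Beavers iteration: Y_k -> sqrt(A), Z_k -> inv(sqrt(A)).

    Assumes A has no eigenvalues on the closed negative real axis.
    """
    Y = np.asarray(A, dtype=complex)
    Z = np.eye(Y.shape[0], dtype=complex)
    for _ in range(max_iter):
        Y_next = 0.5 * (Y + np.linalg.inv(Z))
        Z_next = 0.5 * (Z + np.linalg.inv(Y))
        if np.linalg.norm(Y_next - Y, "fro") <= tol * np.linalg.norm(Y_next, "fro"):
            return Y_next
        Y, Z = Y_next, Z_next
    return Y
```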