- Source: Residual sum of squares
In statistics, the residual sum of squares (RSS), also known as the sum of squared residuals (SSR) or the sum of squared estimate of errors (SSE), is the sum of the squares of the residuals (the deviations of the predicted values from the actual empirical values of the data). It measures the discrepancy between the data and an estimation model, such as a linear regression. A small RSS indicates a tight fit of the model to the data. It is used as an optimality criterion in parameter selection and model selection.
In general, total sum of squares = explained sum of squares + residual sum of squares. For a proof of this in the multivariate ordinary least squares (OLS) case, see partitioning in the general OLS model.
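The decomposition can be checked numerically. The following is a minimal sketch, assuming a least-squares line with an intercept fitted to a single regressor; the function names and the four data points are illustrative and do not come from the source.

```python
def fit_line(xs, ys):
    """Closed-form least-squares (intercept, slope) for one regressor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope


def sums_of_squares(xs, ys):
    """Return (total, explained, residual) sums of squares for the fitted line."""
    intercept, slope = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    fitted = [intercept + slope * x for x in xs]
    tss = sum((y - y_bar) ** 2 for y in ys)               # total
    ess = sum((f - y_bar) ** 2 for f in fitted)           # explained
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))   # residual
    return tss, ess, rss


# Illustrative data (an assumption, not taken from the article).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0]
tss, ess, rss = sums_of_squares(xs, ys)
print(tss, ess + rss)  # the two values agree up to floating-point rounding
```

The identity relies on the fit being a least-squares fit that includes an intercept; for an arbitrary prediction function the cross term need not vanish.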
One explanatory variable
In a model with a single explanatory variable, RSS is given by:
$$\operatorname{RSS} = \sum_{i=1}^{n} (y_i - f(x_i))^2$$
where $y_i$ is the $i$th value of the variable to be predicted, $x_i$ is the $i$th value of the explanatory variable, and $f(x_i)$ is the predicted value of $y_i$ (also termed $\hat{y}_i$).
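As a small illustration of this definition, the sketch below computes the RSS for any prediction function. The function name `residual_sum_of_squares`, the example model, and the data are assumptions made for the example only.

```python
from typing import Callable, Sequence


def residual_sum_of_squares(
    xs: Sequence[float],
    ys: Sequence[float],
    f: Callable[[float], float],
) -> float:
    """Sum over i of (y_i - f(x_i))^2."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    return sum((y - f(x)) ** 2 for x, y in zip(xs, ys))


# Illustrative data and a hypothetical model f(x) = 1.1 * x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0]
print(residual_sum_of_squares(xs, ys, lambda x: 1.1 * x))  # about 2.7
```

Nothing in the function assumes a linear model; any callable that maps an explanatory value to a prediction can be passed as `f`.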
In a standard simple linear regression model, $y_i = \alpha + \beta x_i + \varepsilon_i$, where $\alpha$ and $\beta$ are coefficients, $y$ and $x$ are the regressand and the regressor, respectively, and $\varepsilon$ is the error term. The sum of squares of residuals is the sum of squares of $\hat{\varepsilon}_i$; that is,
$$\operatorname{RSS} = \sum_{i=1}^{n} (\hat{\varepsilon}_i)^2 = \sum_{i=1}^{n} (y_i - (\hat{\alpha} + \hat{\beta} x_i))^2$$
where $\hat{\alpha}$ is the estimated value of the constant term $\alpha$ and $\hat{\beta}$ is the estimated value of the slope coefficient $\beta$.
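Because ordinary least squares chooses the estimates that minimise this quantity, moving either estimate away from its fitted value should increase the RSS. The sketch below checks this on illustrative data; the helper names and the perturbation size of 0.1 are assumptions made for the example.

```python
def rss(alpha, beta, xs, ys):
    """RSS of the line alpha + beta * x on the data."""
    return sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))


def ols_estimates(xs, ys):
    """Closed-form least-squares estimates (alpha_hat, beta_hat)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    beta_hat = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    return y_bar - beta_hat * x_bar, beta_hat


# Illustrative data (an assumption, not taken from the article).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0]

alpha_hat, beta_hat = ols_estimates(xs, ys)
best = rss(alpha_hat, beta_hat, xs, ys)

# Perturb one estimate at a time and compare against the fitted RSS.
for d_alpha, d_beta in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    perturbed = rss(alpha_hat + d_alpha, beta_hat + d_beta, xs, ys)
    print(f"shift=({d_alpha:+.1f}, {d_beta:+.1f})  RSS={perturbed:.4f}  larger={perturbed > best}")
```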
Matrix expression for the OLS residual sum of squares
The general regression model with n observations and k explanators, the first of which is a constant unit vector whose coefficient is the regression intercept, is
$$y = X\beta + e$$
where $y$ is an $n \times 1$ vector of dependent variable observations, each column of the $n \times k$ matrix $X$ is a vector of observations on one of the $k$ explanators, $\beta$ is a $k \times 1$ vector of true coefficients, and $e$ is an $n \times 1$ vector of the true underlying errors. The ordinary least squares estimator for $\beta$ is
given by the normal equations, which have a unique solution when $X^{\operatorname{T}}X$ is invertible:
$$X^{\operatorname{T}}X\hat{\beta} = X^{\operatorname{T}}y \iff \hat{\beta} = (X^{\operatorname{T}}X)^{-1}X^{\operatorname{T}}y.$$
The residual vector is $\hat{e} = y - X\hat{\beta} = y - X(X^{\operatorname{T}}X)^{-1}X^{\operatorname{T}}y$, so the residual sum of squares is:
$$\operatorname{RSS} = \hat{e}^{\operatorname{T}}\hat{e} = \|\hat{e}\|^2$$
(equivalent to the square of the norm of the residuals). In full:
$$\operatorname{RSS} = y^{\operatorname{T}}y - y^{\operatorname{T}}X(X^{\operatorname{T}}X)^{-1}X^{\operatorname{T}}y = y^{\operatorname{T}}[I - X(X^{\operatorname{T}}X)^{-1}X^{\operatorname{T}}]y = y^{\operatorname{T}}[I - H]y,$$
where H is the hat matrix, or the projection matrix in linear regression.
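The two matrix routes to the RSS can be compared on a small example. The sketch below is restricted, as an assumption for brevity, to a design matrix with two columns (a constant and one regressor) so that the inverse of the 2 × 2 matrix can be written out by hand; all helper names are illustrative.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def rss_via_residuals(X, y):
    """e_hat^T e_hat with beta_hat = (X^T X)^{-1} X^T y."""
    Xt = transpose(X)
    y_col = [[v] for v in y]
    beta_hat = matmul(inverse_2x2(matmul(Xt, X)), matmul(Xt, y_col))
    fitted = matmul(X, beta_hat)
    return sum((yi - fi[0]) ** 2 for yi, fi in zip(y, fitted))


def rss_via_hat_matrix(X, y):
    """y^T (I - H) y with H = X (X^T X)^{-1} X^T."""
    n = len(y)
    Xt = transpose(X)
    H = matmul(matmul(X, inverse_2x2(matmul(Xt, X))), Xt)
    My = [sum(((1.0 if i == j else 0.0) - H[i][j]) * y[j] for j in range(n)) for i in range(n)]
    return sum(yi * mi for yi, mi in zip(y, My))


# Illustrative data: first column is the constant, second is the regressor.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0]]
y = [1.0, 3.0, 2.0, 5.0]
print(rss_via_residuals(X, y), rss_via_hat_matrix(X, y))  # both about 2.7
```

Forming $H$ explicitly costs on the order of $n^2$ storage, so it is shown here only to mirror the formula; in practice a numerical library's least-squares solver would normally be preferred.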
Relation with Pearson's product-moment correlation
The least-squares regression line is given by $y = ax + b$, where $b = \bar{y} - a\bar{x}$ and $a = \frac{S_{xy}}{S_{xx}}$, with
$$S_{xy} = \sum_{i=1}^{n} (\bar{x} - x_i)(\bar{y} - y_i) \quad\text{and}\quad S_{xx} = \sum_{i=1}^{n} (\bar{x} - x_i)^2.$$
Therefore,
$$\begin{aligned}\operatorname{RSS} &= \sum_{i=1}^{n} (y_i - f(x_i))^2 = \sum_{i=1}^{n} (y_i - (ax_i + b))^2 = \sum_{i=1}^{n} (y_i - ax_i - \bar{y} + a\bar{x})^2 \\[5pt] &= \sum_{i=1}^{n} (a(\bar{x} - x_i) - (\bar{y} - y_i))^2 = a^2 S_{xx} - 2aS_{xy} + S_{yy} = S_{yy} - aS_{xy} = S_{yy}\left(1 - \frac{S_{xy}^2}{S_{xx}S_{yy}}\right)\end{aligned}$$
where
$$S_{yy} = \sum_{i=1}^{n} (\bar{y} - y_i)^2,$$
and the step from $a^2 S_{xx} - 2aS_{xy} + S_{yy}$ to $S_{yy} - aS_{xy}$ uses $aS_{xx} = S_{xy}$.
The Pearson product-moment correlation is given by
$$r = \frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}};$$
therefore,
$$\operatorname{RSS} = S_{yy}(1 - r^2).$$
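This identity can be verified numerically. The sketch below computes the RSS once from the fitted line $y = ax + b$ and once from the correlation coefficient; the function names and data are assumptions made for the example.

```python
import math


def centered_sums(xs, ys):
    """Return (S_xx, S_yy, S_xy) about the sample means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x_bar - x) ** 2 for x in xs)
    s_yy = sum((y_bar - y) ** 2 for y in ys)
    s_xy = sum((x_bar - x) * (y_bar - y) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy


def rss_from_fit(xs, ys):
    """RSS computed directly from the least-squares line y = a*x + b."""
    s_xx, _, s_xy = centered_sums(xs, ys)
    a = s_xy / s_xx
    b = sum(ys) / len(ys) - a * sum(xs) / len(xs)
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))


def rss_from_correlation(xs, ys):
    """RSS computed as S_yy * (1 - r^2)."""
    s_xx, s_yy, s_xy = centered_sums(xs, ys)
    r = s_xy / math.sqrt(s_xx * s_yy)
    return s_yy * (1.0 - r ** 2)


# Illustrative data (an assumption, not taken from the article).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 2.0, 5.0]
print(rss_from_fit(xs, ys), rss_from_correlation(xs, ys))  # both about 2.7
```

Both functions fail with a division by zero if all `xs` are equal (so $S_{xx} = 0$), and the correlation route also fails if all `ys` are equal, which matches the cases where the slope or $r$ is undefined.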
See also
Akaike information criterion § Comparison with least squares
Chi-squared distribution § Applications
Degrees of freedom (statistics) § Sum of squares and degrees of freedom
Errors and residuals in statistics
Lack-of-fit sum of squares
Mean squared error
Reduced chi-squared statistic, RSS per degree of freedom
Squared deviations
Sum of squares (statistics)