Generalization error
For supervised learning applications in machine learning and statistical learning theory, generalization error (also known as the out-of-sample error or the risk) is a measure of how accurately an algorithm is able to predict outcomes for previously unseen data. As learning algorithms are evaluated on finite samples, the evaluation of a learning algorithm may be sensitive to sampling error. As a result, measurements of prediction error on the current data may not provide much information about the algorithm's predictive ability on new, unseen data. The generalization error can be minimized by avoiding overfitting in the learning algorithm. The performance of machine learning algorithms is commonly visualized by learning curve plots that show estimates of the generalization error throughout the learning process.
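The sketch below shows one common way to produce such a learning-curve plot; it assumes scikit-learn and matplotlib, and the dataset and model are synthetic placeholders rather than anything prescribed by the theory:

```python
# A minimal learning-curve sketch (assumes scikit-learn and matplotlib).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=2000, random_state=0)  # synthetic stand-in data

sizes, train_scores, val_scores = learning_curve(
    LogisticRegression(max_iter=1000), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8), cv=5,
)

# Error = 1 - accuracy; the cross-validated curve estimates the generalization error.
plt.plot(sizes, 1 - train_scores.mean(axis=1), label="training error")
plt.plot(sizes, 1 - val_scores.mean(axis=1), label="estimated generalization error")
plt.xlabel("training set size"); plt.ylabel("error"); plt.legend(); plt.show()
```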
Definition
In a learning problem, the goal is to develop a function $f_n(\vec{x})$ that predicts output values $y$ for each input datum $\vec{x}$. The subscript $n$ indicates that the function $f_n$ is developed based on a data set of $n$ data points. The generalization error or expected loss or risk $I[f]$ of a particular function $f$ over all possible values of $\vec{x}$ and $y$ is the expected value of the loss function $V(f)$:

$$I[f] = \int_{X \times Y} V(f(\vec{x}), y)\, \rho(\vec{x}, y)\, d\vec{x}\, dy,$$

where $\rho(\vec{x}, y)$ is the unknown joint probability distribution for $\vec{x}$ and $y$.
Without knowing the joint probability distribution $\rho$, it is impossible to compute $I[f]$. Instead, we can compute the error on sample data, which is called the empirical error (or empirical risk). Given $n$ data points, the empirical error of a candidate function $f$ is:

$$I_n[f] = \frac{1}{n} \sum_{i=1}^{n} V(f(\vec{x}_i), y_i)$$
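Once a loss function is chosen, the empirical error is directly computable. A minimal sketch, assuming squared loss for $V$ and NumPy; the predictor and data are toy placeholders:

```python
import numpy as np

def squared_loss(y_pred, y_true):
    """V(f(x), y): one common choice of loss function."""
    return (y_pred - y_true) ** 2

def empirical_error(f, X, y, V=squared_loss):
    """I_n[f] = (1/n) * sum_i V(f(x_i), y_i)."""
    return np.mean([V(f(x_i), y_i) for x_i, y_i in zip(X, y)])

# Usage with a toy predictor f(x) = 2x on a small sample:
X = np.array([0.0, 1.0, 2.0])
y = np.array([0.1, 2.2, 3.8])
print(empirical_error(lambda x: 2 * x, X, y))
```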
An algorithm is said to generalize if:

$$\lim_{n \to \infty} I[f] - I_n[f] = 0$$
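For a fixed function $f$ chosen independently of the data, this convergence is simply the law of large numbers. A quick simulation sketch (the linear model, Gaussian noise, and squared loss are all assumptions of this example, not of the definition) shows the gap $|I[f] - I_n[f]|$ shrinking:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fixed (not data-dependent) predictor f(x) = x, squared loss, y = x + noise.
f = lambda x: x
true_risk = 1.0  # noise ~ N(0, 1), so I[f] = E[(f(x) - y)^2] = Var(noise) = 1

for n in (10, 100, 10_000, 1_000_000):
    x = rng.normal(size=n)
    y = x + rng.normal(size=n)
    I_n = np.mean((f(x) - y) ** 2)   # empirical error I_n[f]
    print(n, abs(true_risk - I_n))   # gap shrinks as n grows
```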
Of particular importance is the generalization error $I[f_n]$ of the data-dependent function $f_n$ that is found by a learning algorithm based on the sample. Again, for an unknown probability distribution, $I[f_n]$ cannot be computed. Instead, the aim of many problems in statistical learning theory is to bound or characterize the difference of the generalization error and the empirical error in probability:

$$P_G = P(I[f_n] - I_n[f_n] \leq \epsilon) \geq 1 - \delta_n$$

That is, the goal is to characterize the probability $1 - \delta_n$ that the generalization error is less than the empirical error plus some error bound $\epsilon$ (generally dependent on $\delta$ and $n$).
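To give the flavor of such a bound: if $f$ is a single fixed function (not the data-dependent $f_n$) and the loss is bounded in $[0, 1]$, Hoeffding's inequality yields $\epsilon = \sqrt{\ln(2/\delta)/(2n)}$; controlling $f_n$ itself requires uniform or stability-based arguments such as those discussed below. A small sketch of the resulting bound widths:

```python
import math

def hoeffding_epsilon(n, delta):
    """For a single fixed f with loss in [0, 1], Hoeffding's inequality gives
    |I[f] - I_n[f]| <= sqrt(ln(2/delta) / (2n)) with probability >= 1 - delta."""
    return math.sqrt(math.log(2 / delta) / (2 * n))

for n in (100, 1_000, 10_000):
    print(n, round(hoeffding_epsilon(n, delta=0.05), 4))  # shrinks like 1/sqrt(n)
```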
For many types of algorithms, it has been shown that an algorithm has generalization bounds if it meets certain stability criteria. Specifically, if an algorithm is symmetric (the order of inputs does not affect the result), has bounded loss, and meets two stability conditions, it will generalize. The first stability condition, leave-one-out cross-validation stability, says that to be stable, the prediction error for each data point when leave-one-out cross-validation is used must converge to zero as $n \to \infty$. The second condition, expected-leave-one-out error stability (also known as hypothesis stability if operating in the $L_1$ norm), is met if the prediction on a left-out data point does not change when a single data point is removed from the training data set.
These conditions can be formalized as:
Leave-one-out cross-validation stability
An algorithm $L$ has $CVloo$ stability if for each $n$, there exists a $\beta_{CV}^{(n)}$ and $\delta_{CV}^{(n)}$ such that:

$$\forall i \in \{1, \ldots, n\}, \quad \mathbb{P}_S\{|V(f_{S^i}, z_i) - V(f_S, z_i)| \leq \beta_{CV}^{(n)}\} \geq 1 - \delta_{CV}^{(n)}$$

and $\beta_{CV}^{(n)}$ and $\delta_{CV}^{(n)}$ go to zero as $n$ goes to infinity.
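The quantity inside this bound can be measured empirically (though measuring it proves nothing) by refitting with each point left out. The sketch below does so for regularized least squares, one algorithm known to be stable; scikit-learn, squared loss, and the synthetic data are assumptions of the example:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

def cvloo_deviations(X, y, make_model):
    """|V(f_{S^i}, z_i) - V(f_S, z_i)| for each i, using squared loss."""
    f_S = make_model().fit(X, y)                   # trained on the full sample S
    devs = []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        f_Si = make_model().fit(X[mask], y[mask])  # trained on S^i (S with z_i removed)
        v_loo = (f_Si.predict(X[i:i + 1])[0] - y[i]) ** 2
        v_full = (f_S.predict(X[i:i + 1])[0] - y[i]) ** 2
        devs.append(abs(v_loo - v_full))
    return np.array(devs)

for n in (50, 500):
    X = rng.normal(size=(n, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n)
    print(n, cvloo_deviations(X, y, lambda: Ridge(alpha=1.0)).max())  # shrinks with n
```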
Expected-leave-one-out error stability
An algorithm $L$ has $Eloo_{err}$ stability if for each $n$ there exists a $\beta_{EL}^{(n)}$ and a $\delta_{EL}^{(n)}$ such that:

$$\forall i \in \{1, \ldots, n\}, \quad \mathbb{P}_S\left\{\left|I[f_S] - \frac{1}{n}\sum_{i=1}^{n} V(f_{S^i}, z_i)\right| \leq \beta_{EL}^{(n)}\right\} \geq 1 - \delta_{EL}^{(n)}$$

with $\beta_{EL}^{(n)}$ and $\delta_{EL}^{(n)}$ going to zero as $n \to \infty$.
For leave-one-out stability in the $L_1$ norm, this is the same as hypothesis stability:

$$\mathbb{E}_{S,z}\left[|V(f_S, z) - V(f_{S^i}, z)|\right] \leq \beta_H^{(n)}$$

with $\beta_H^{(n)}$ going to zero as $n$ goes to infinity.
Algorithms with proven stability
A number of algorithms have been proven to be stable and, as a result, have bounds on their generalization error. A list of these algorithms and the papers that proved stability is available here.
Relation to overfitting
The concepts of generalization error and overfitting are closely related. Overfitting occurs when the learned function $f_S$ becomes sensitive to the noise in the sample. As a result, the function will perform well on the training set but not perform well on other data from the joint probability distribution of $x$ and $y$. Thus, the more overfitting occurs, the larger the generalization error.
The amount of overfitting can be tested using cross-validation methods, which split the sample into simulated training samples and testing samples. The model is then trained on a training sample and evaluated on the testing sample. The testing sample is previously unseen by the algorithm and so represents a random sample from the joint probability distribution of $x$ and $y$. This test sample allows us to approximate the expected error and, as a result, approximate a particular form of the generalization error.
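A minimal sketch of this procedure, assuming scikit-learn and a synthetic stand-in dataset; the held-out error approximates the generalization error, while the training error typically understates it:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; any (X, y) sample from the joint distribution works.
X, y = make_regression(n_samples=500, noise=10.0, random_state=0)

# Hold out a test sample the algorithm never sees during training.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_tr, y_tr)

print("training error:", mean_squared_error(y_tr, model.predict(X_tr)))
print("estimated generalization error:", mean_squared_error(y_te, model.predict(X_te)))
```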
Many algorithms exist to prevent overfitting. The minimization algorithm can penalize more complex functions (known as Tikhonov regularization), or the hypothesis space can be constrained, either explicitly in the form of the functions or by adding constraints to the minimization function (Ivanov regularization).
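In its simplest setting, Tikhonov regularization of least squares penalizes the squared norm of the weights and admits a closed form; the sketch below (NumPy and synthetic data are assumptions of the example) implements $w = (X^{\top}X + \lambda I)^{-1}X^{\top}y$:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Tikhonov-regularized least squares:
    argmin_w ||Xw - y||^2 + lam * ||w||^2  =>  w = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = rng.normal(size=5)
y = X @ w_true + rng.normal(scale=0.1, size=100)
print(ridge_fit(X, y, lam=1.0))  # close to w_true; larger lam shrinks w toward 0
```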
The approach to finding a function that does not overfit is at odds with the goal of finding a function that is sufficiently complex to capture the particular characteristics of the data. This is known as the bias–variance tradeoff. Keeping a function simple to avoid overfitting may introduce a bias in the resulting predictions, while allowing it to be more complex leads to overfitting and a higher variance in the predictions. It is impossible to minimize both simultaneously.
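For squared loss, this tradeoff can be stated exactly. Under the standard additive-noise assumption $y = g(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$ (an assumption of this decomposition, not of the definitions above), the expected squared error at a point $x$ splits into:

$$\mathbb{E}\left[(y - f_n(x))^2\right] = \underbrace{\left(g(x) - \mathbb{E}[f_n(x)]\right)^2}_{\text{bias}^2} + \underbrace{\mathbb{E}\left[\left(f_n(x) - \mathbb{E}[f_n(x)]\right)^2\right]}_{\text{variance}} + \underbrace{\sigma^2}_{\text{irreducible noise}}$$

where the expectations are over the training sample and the noise; only the first two terms depend on the learned function.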
Further reading
Bousquet, Olivier; Luxburg, Ulrike; Rätsch, Gunnar, eds. (2004). Advanced Lectures on Machine Learning. Lecture Notes in Computer Science. Vol. 3176. pp. 169–207. doi:10.1007/b100712. ISBN 978-3-540-23122-6. S2CID 431437. Retrieved 10 December 2022.
Bousquet, Olivier; Elisseeff, André (1 March 2002). "Stability and Generalization". The Journal of Machine Learning Research. 2: 499–526. doi:10.1162/153244302760200704. S2CID 1157797. Retrieved 10 December 2022.
Mohri, M.; Rostamizadeh, A.; Talwalkar, A. (2018). Foundations of Machine Learning, 2nd ed. Boston: MIT Press.
Moody, J.E. (1992). "The Effective Number of Parameters: An Analysis of Generalization and Regularization in Nonlinear Learning Systems", in Moody, J.E.; Hanson, S.J.; Lippmann, R.P. (eds.), Advances in Neural Information Processing Systems 4, pp. 847–854.
White, H. (1992b). Artificial Neural Networks: Approximation and Learning Theory. Blackwell.