- Source: Winsorized mean
A winsorized mean is a statistical measure of central tendency, much like the mean and median, and even more similar to the truncated mean. It is the mean calculated after winsorizing: replacing given parts of a probability distribution or sample at the high and low ends with the most extreme remaining values. This is typically done for an equal amount at both extremes, and often 10 to 25 percent of the ends are replaced. Equivalently, the winsorized mean can be expressed as a weighted average of the truncated mean and the quantiles at which it is limited, which corresponds to replacing the tails with those quantiles.
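The definition above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `winsorized_mean`, the `proportion` parameter, and the rule that the number of values replaced at each end is the integer part of `proportion * n` are all assumptions made here, and other rounding conventions exist.

```python
def winsorized_mean(data, proportion=0.1):
    """Mean of the sample after replacing the k lowest and k highest values.

    Assumption for this sketch: k is the integer part of proportion * n,
    and the same k is used at both ends.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must be non-empty")
    # Small epsilon guards against products such as 2.9999999999999996.
    k = int(proportion * n + 1e-9)
    low, high = xs[k], xs[n - k - 1]  # most extreme values that remain
    # Values below `low` become `low`; values above `high` become `high`.
    clipped = [min(max(x, low), high) for x in xs]
    return sum(clipped) / n


sample = [3, 7, 8, 9, 10, 11, 12, 13, 14, 95]
print(winsorized_mean(sample, 0.1))  # 3 is replaced by 7 and 95 by 14
print(sum(sample) / len(sample))     # ordinary mean, for comparison
```

In this made-up sample the single large value 95 pulls the ordinary mean well above most of the data, while the winsorized mean stays near the bulk of the observations.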
Advantages
The winsorized mean is a useful estimator because, by retaining the outliers without taking them too literally, it is less sensitive to observations at the extremes than the ordinary mean, yet it still generates a reasonable estimate of central tendency for almost all statistical models. In this regard it is referred to as a robust estimator.
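The reduced sensitivity to extreme observations can be illustrated with a small experiment. The sketch below is only an illustration under stated assumptions: the helper `winsorize`, the data in `base`, and the choice of replacing one value at each end are all invented for the example.

```python
from statistics import mean


def winsorize(values, k):
    """Sorted copy with the k lowest and k highest values replaced
    by the nearest values that remain (hypothetical helper)."""
    xs = sorted(values)
    n = len(xs)
    return [xs[k]] * k + xs[k:n - k] + [xs[n - k - 1]] * k


base = [12, 14, 15, 15, 16, 17, 18, 19, 21]  # made-up observations
results = []
for outlier in (25, 250, 2500):
    sample = base + [outlier]
    # Record the ordinary mean and the mean after winsorizing one value per end.
    results.append((outlier, mean(sample), mean(winsorize(sample, 1))))

for outlier, plain, wins in results:
    print(f"outlier={outlier}: mean={plain}, winsorized mean={wins}")
```

As the extreme value grows, the ordinary mean grows with it, whereas the winsorized mean does not change, because the extreme value is always replaced by the same neighbour.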
Drawbacks
The winsorized mean uses more information from the distribution or sample than the median. However, unless the underlying distribution is symmetric, the winsorized mean of a sample is unlikely to be an unbiased estimator of either the mean or the median.
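The effect of asymmetry can be seen without any random sampling. The sketch below is an illustration only: it assumes an exponential distribution with rate 1 as the skewed model and builds an idealized "sample" from 1000 evenly spaced quantiles of that distribution. Both choices, and the helper name `winsorized_mean`, are introduced here for the example.

```python
from math import log
from statistics import mean, median


def winsorized_mean(values, k):
    """Mean after replacing the k lowest and k highest values
    by the nearest values that remain (hypothetical helper)."""
    xs = sorted(values)
    n = len(xs)
    return mean([xs[k]] * k + xs[k:n - k] + [xs[n - k - 1]] * k)


n = 1000
# Idealized right-skewed sample: quantiles of Exp(1) at (i - 0.5) / n.
skewed = [-log(1 - (i - 0.5) / n) for i in range(1, n + 1)]

plain = mean(skewed)                      # close to the true mean, 1
mid = median(skewed)                      # close to the true median, ln 2
wins = winsorized_mean(skewed, k=n // 10) # 10% replaced at each end

print(plain, mid, wins)
```

For this skewed model the 10% winsorized mean falls between the median and the mean and matches neither, which is the behaviour the paragraph above describes.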
Example
For a sample of 10 numbers, written in order-statistic notation from x(1), the smallest, to x(10), the largest, the 10% winsorized mean is
\[
\frac{\overbrace{x_{(2)} + x_{(2)}} + x_{(3)} + x_{(4)} + x_{(5)} + x_{(6)} + x_{(7)} + x_{(8)} + \overbrace{x_{(9)} + x_{(9)}}}{10}.
\]
The key is in the repetition of x(2) and x(9): the extras substitute for the original values x(1) and x(10) which have been discarded and replaced.
This is equivalent to a weighted average of 0.1 times the 5th percentile (x(2)), 0.8 times the 10% trimmed mean, and 0.1 times the 95th percentile (x(9)).
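The weighted-average form can be checked term by term. In the short derivation below, the symbol \bar{x}_{T} for the 10% trimmed mean is notation introduced here for illustration. It is the mean of the eight values that remain after discarding x(1) and x(10).

```latex
\[
\bar{x}_{T} = \frac{x_{(2)} + x_{(3)} + \cdots + x_{(9)}}{8}
\]
\[
0.1\,x_{(2)} + 0.8\,\bar{x}_{T} + 0.1\,x_{(9)}
  = \frac{x_{(2)} + \bigl(x_{(2)} + x_{(3)} + \cdots + x_{(9)}\bigr) + x_{(9)}}{10}
  = \frac{2x_{(2)} + x_{(3)} + \cdots + x_{(8)} + 2x_{(9)}}{10}
\]
```

The right-hand side counts x(2) and x(9) twice each and every value from x(3) to x(8) once, which is the 10% winsorized mean of the sample.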