Tobit model
In statistics, a tobit model is any of a class of regression models in which the observed range of the dependent variable is censored in some way. The term was coined by Arthur Goldberger in reference to James Tobin, who developed the model in 1958 to mitigate the problem of zero-inflated data for observations of household expenditure on durable goods. Because Tobin's method can be easily extended to handle truncated and other non-randomly selected samples, some authors adopt a broader definition of the tobit model that includes these cases.
Tobin's idea was to modify the likelihood function so that it reflects the unequal sampling probability for each observation depending on whether the latent dependent variable fell above or below the determined threshold. For a sample that, as in Tobin's original case, was censored from below at zero, the sampling probability for each non-limit observation is simply the height of the appropriate density function. For any limit observation, it is the cumulative distribution, i.e. the integral below zero of the appropriate density function. The tobit likelihood function is thus a mixture of densities and cumulative distribution functions.
The likelihood function
Below are the likelihood and log-likelihood functions for a type I tobit. This is a tobit that is censored from below at $y_{L}$ when the latent variable $y_{j}^{*}\leq y_{L}$. In writing out the likelihood function, we first define an indicator function $I$:

$$I(y)=\begin{cases}0&\text{if }y\leq y_{L},\\1&\text{if }y>y_{L}.\end{cases}$$
Next, let $\Phi$ be the standard normal cumulative distribution function and $\varphi$ be the standard normal probability density function. For a data set with $N$ observations, the likelihood function for a type I tobit is
$$\mathcal{L}(\beta,\sigma)=\prod_{j=1}^{N}\left(\frac{1}{\sigma}\varphi\left(\frac{y_{j}-X_{j}\beta}{\sigma}\right)\right)^{I(y_{j})}\left(1-\Phi\left(\frac{X_{j}\beta-y_{L}}{\sigma}\right)\right)^{1-I(y_{j})}$$
and the log likelihood is given by
$$\begin{aligned}\log\mathcal{L}(\beta,\sigma)&=\sum_{j=1}^{N}I(y_{j})\log\left(\frac{1}{\sigma}\varphi\left(\frac{y_{j}-X_{j}\beta}{\sigma}\right)\right)+(1-I(y_{j}))\log\left(1-\Phi\left(\frac{X_{j}\beta-y_{L}}{\sigma}\right)\right)\\&=\sum_{y_{j}>y_{L}}\log\left(\frac{1}{\sigma}\varphi\left(\frac{y_{j}-X_{j}\beta}{\sigma}\right)\right)+\sum_{y_{j}=y_{L}}\log\left(\Phi\left(\frac{y_{L}-X_{j}\beta}{\sigma}\right)\right)\end{aligned}$$
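As an illustration (not part of the original article), the log-likelihood above can be coded directly and maximized numerically. The function name, simulated data, and parameter values below are all assumptions made for this sketch; censoring is from below at $y_L = 0$:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_neg_loglik(params, y, X, y_L=0.0):
    """Negative log-likelihood of a type I tobit censored from below at y_L."""
    beta, sigma = params[:-1], params[-1]
    if sigma <= 0:  # keep the scale parameter positive
        return np.inf
    xb = X @ beta
    uncens = y > y_L
    # density term for non-limit observations: log[(1/sigma) phi((y - Xb)/sigma)]
    ll = norm.logpdf((y[uncens] - xb[uncens]) / sigma).sum() - uncens.sum() * np.log(sigma)
    # probability mass term for limit observations: log Phi((y_L - Xb)/sigma)
    ll += norm.logcdf((y_L - xb[~uncens]) / sigma).sum()
    return -ll

# Simulated data (illustrative values only)
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma_true = np.array([1.0, 2.0]), 1.5
y = np.maximum(X @ beta_true + rng.normal(scale=sigma_true, size=n), 0.0)

res = minimize(tobit_neg_loglik, x0=np.array([0.0, 0.0, 1.0]), args=(y, X),
               method="Nelder-Mead",
               options={"maxiter": 4000, "xatol": 1e-8, "fatol": 1e-8})
beta_hat, sigma_hat = res.x[:-1], res.x[-1]
```

With this much data the maximizer should land close to the true $(\beta, \sigma)$ used in the simulation.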
= Reparametrization =
The log-likelihood as stated above is not globally concave, which complicates maximum likelihood estimation. Olsen suggested the simple reparametrization $\beta=\delta/\gamma$ and $\sigma^{2}=\gamma^{-2}$, resulting in a transformed log-likelihood,
$$\log\mathcal{L}(\delta,\gamma)=\sum_{y_{j}>y_{L}}\left\{\log\gamma+\log\left[\varphi\left(\gamma y_{j}-X_{j}\delta\right)\right]\right\}+\sum_{y_{j}=y_{L}}\log\left[\Phi\left(\gamma y_{L}-X_{j}\delta\right)\right]$$
which is globally concave in terms of the transformed parameters.
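To make this concrete, here is a hedged sketch (not from the article) of maximizing the transformed log-likelihood in $(\delta, \gamma)$ and recovering $\beta = \delta/\gamma$ and $\sigma = 1/\gamma$; names and simulated values are illustrative:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def olsen_neg_loglik(params, y, X, y_L=0.0):
    """Negative log-likelihood in Olsen's (delta, gamma) parametrization,
    where beta = delta/gamma and sigma = 1/gamma."""
    delta, gamma = params[:-1], params[-1]
    if gamma <= 0:
        return np.inf
    uncens = y > y_L
    # uncensored observations: log gamma + log phi(gamma*y - X delta)
    ll = (np.log(gamma) + norm.logpdf(gamma * y[uncens] - X[uncens] @ delta)).sum()
    # censored observations: log Phi(gamma*y_L - X delta)
    ll += norm.logcdf(gamma * y_L - X[~uncens] @ delta).sum()
    return -ll

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, sigma_true = np.array([0.5, 1.0]), 1.0
y = np.maximum(X @ beta_true + rng.normal(scale=sigma_true, size=n), 0.0)

res = minimize(olsen_neg_loglik, x0=np.array([0.0, 0.0, 1.0]), args=(y, X),
               method="Nelder-Mead", options={"maxiter": 4000})
delta_hat, gamma_hat = res.x[:-1], res.x[-1]
beta_hat, sigma_hat = delta_hat / gamma_hat, 1.0 / gamma_hat
```

Because the transformed objective is globally concave, the optimizer is not sensitive to the starting point in the way the raw $(\beta, \sigma)$ problem can be.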
For the truncated (tobit II) model, Orme showed that while the log-likelihood is not globally concave, it is concave at any stationary point under the above transformation.
= Consistency =
If the relationship parameter $\beta$ is estimated by regressing the observed $y_{i}$ on $x_{i}$, the resulting ordinary least squares estimator is inconsistent: it yields a downward-biased estimate of the slope coefficient and an upward-biased estimate of the intercept. Takeshi Amemiya (1973) proved that the maximum likelihood estimator suggested by Tobin for this model is consistent.
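The direction of the bias is easy to see in a small simulation (illustrative, not from the article): OLS on data censored from below at zero understates the true slope and overstates the true intercept.

```python
import numpy as np

# True latent model: y* = 1 + 2x + e, observed y = max(y*, 0).
# All values are illustrative.
rng = np.random.default_rng(42)
n = 5000
x = rng.normal(size=n)
beta0, beta1, sigma = 1.0, 2.0, 1.5
y = np.maximum(beta0 + beta1 * x + rng.normal(scale=sigma, size=n), 0.0)

# OLS of observed y on x
X = np.column_stack([np.ones(n), x])
ols = np.linalg.lstsq(X, y, rcond=None)[0]
# ols[1] falls noticeably below the true slope of 2.0,
# while ols[0] exceeds the true intercept of 1.0
```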
= Interpretation =
The $\beta$ coefficient should not be interpreted as the effect of $x_{i}$ on $y_{i}$, as one would with a linear regression model; this is a common error. Instead, it should be interpreted as the combination of:
(1) the change in $y_{i}$ of those above the limit, weighted by the probability of being above the limit; and
(2) the change in the probability of being above the limit, weighted by the expected value of $y_{i}$ if above.
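For censoring from below at zero, this decomposition implies the standard result that the marginal effect on the unconditional mean is the coefficient scaled down by the probability of being above the limit, $\partial E[y_{i}\mid x_{i}]/\partial x_{ik}=\beta_{k}\,\Phi(x_{i}\beta/\sigma)$. A minimal numeric illustration with made-up fitted values:

```python
import numpy as np
from scipy.stats import norm

# Illustrative fitted tobit coefficients and scale (not from the article)
beta = np.array([1.0, 2.0])   # intercept and slope
sigma = 1.5
x = np.array([1.0, 0.3])      # covariate vector (with intercept term)

xb = x @ beta
# Marginal effect of the slope covariate on E[y | x]:
# the raw coefficient beta[1] scaled by Phi(x'beta / sigma)
marginal_effect = beta[1] * norm.cdf(xb / sigma)
# The marginal effect is strictly smaller than beta[1] whenever
# censoring has positive probability.
```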
Variations of the tobit model
Variations of the tobit model can be produced by changing where and when censoring occurs. Amemiya (1985, p. 384) classifies these variations into five categories (tobit type I – tobit type V), where tobit type I stands for the first model described above. Schnedler (2005) provides a general formula to obtain consistent likelihood estimators for these and other variations of the tobit model.
= Type I =
The tobit model is a special case of a censored regression model, because the latent variable $y_{i}^{*}$ cannot always be observed while the independent variable $x_{i}$ is observable. A common variation of the tobit model is censoring at a value $y_{L}$ different from zero:
$$y_{i}=\begin{cases}y_{i}^{*}&\text{if }y_{i}^{*}>y_{L},\\y_{L}&\text{if }y_{i}^{*}\leq y_{L}.\end{cases}$$
Another example is censoring of values above $y_{U}$:
$$y_{i}=\begin{cases}y_{i}^{*}&\text{if }y_{i}^{*}<y_{U},\\y_{U}&\text{if }y_{i}^{*}\geq y_{U}.\end{cases}$$
Yet another model results when $y_{i}$ is censored from above and below at the same time:
$$y_{i}=\begin{cases}y_{i}^{*}&\text{if }y_{L}<y_{i}^{*}<y_{U},\\y_{L}&\text{if }y_{i}^{*}\leq y_{L},\\y_{U}&\text{if }y_{i}^{*}\geq y_{U}.\end{cases}$$
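The three censoring schemes above can all be reproduced from latent draws by simple clipping; a minimal illustration with made-up values:

```python
import numpy as np

# Latent draws and illustrative censoring thresholds
rng = np.random.default_rng(7)
y_star = rng.normal(loc=0.0, scale=2.0, size=10)
y_L, y_U = -1.0, 1.0

below = np.maximum(y_star, y_L)   # censored from below at y_L
above = np.minimum(y_star, y_U)   # censored from above at y_U
both = np.clip(y_star, y_L, y_U)  # censored on both sides
```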
The rest of the models will be presented as being bounded from below at 0, though this can be generalized as done for Type I.
= Type II =
Type II tobit models introduce a second latent variable.

$$y_{2i}=\begin{cases}y_{2i}^{*}&\text{if }y_{1i}^{*}>0,\\0&\text{if }y_{1i}^{*}\leq 0.\end{cases}$$
In Type I tobit, the latent variable absorbs both the process of participation and the outcome of interest. Type II tobit allows the process of participation (selection) and the outcome of interest to be independent, conditional on observable data.
The Heckman selection model falls into the Type II tobit, which is sometimes called Heckit after James Heckman.
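A minimal sketch (illustrative values, not from the article) of simulating a type II tobit, with correlated errors linking the selection and outcome equations:

```python
import numpy as np

# Selection equation: y1* = 0.5 + x + u1; outcome equation: y2* = 1 + 2x + u2.
# rho links the two error terms; all values are illustrative.
rng = np.random.default_rng(3)
n = 1000
x = rng.normal(size=n)
rho = 0.5
cov = np.array([[1.0, rho], [rho, 1.0]])
u = rng.multivariate_normal(np.zeros(2), cov, size=n)

y1_star = 0.5 + 1.0 * x + u[:, 0]          # latent selection index
y2_star = 1.0 + 2.0 * x + u[:, 1]          # latent outcome
y2 = np.where(y1_star > 0, y2_star, 0.0)   # outcome observed only when selected
```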
= Type III =
Type III introduces a second observed dependent variable.

$$y_{1i}=\begin{cases}y_{1i}^{*}&\text{if }y_{1i}^{*}>0,\\0&\text{if }y_{1i}^{*}\leq 0.\end{cases}$$

$$y_{2i}=\begin{cases}y_{2i}^{*}&\text{if }y_{1i}^{*}>0,\\0&\text{if }y_{1i}^{*}\leq 0.\end{cases}$$
The Heckman model falls into this type.
= Type IV =
Type IV introduces a third observed dependent variable and a third latent variable.

$$y_{1i}=\begin{cases}y_{1i}^{*}&\text{if }y_{1i}^{*}>0,\\0&\text{if }y_{1i}^{*}\leq 0.\end{cases}$$

$$y_{2i}=\begin{cases}y_{2i}^{*}&\text{if }y_{1i}^{*}>0,\\0&\text{if }y_{1i}^{*}\leq 0.\end{cases}$$

$$y_{3i}=\begin{cases}y_{3i}^{*}&\text{if }y_{1i}^{*}\leq 0,\\0&\text{if }y_{1i}^{*}>0.\end{cases}$$
= Type V =
Similar to Type II, in Type V only the sign of $y_{1i}^{*}$ is observed.

$$y_{2i}=\begin{cases}y_{2i}^{*}&\text{if }y_{1i}^{*}>0,\\0&\text{if }y_{1i}^{*}\leq 0.\end{cases}$$

$$y_{3i}=\begin{cases}y_{3i}^{*}&\text{if }y_{1i}^{*}\leq 0,\\0&\text{if }y_{1i}^{*}>0.\end{cases}$$
= Non-parametric version =
If the underlying latent variable $y_{i}^{*}$ is not normally distributed, one must use quantiles instead of moments to analyze the observable variable $y_{i}$. Powell's CLAD (censored least absolute deviations) estimator offers a possible way to achieve this.
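A hedged sketch of the CLAD idea: minimize the least-absolute-deviations criterion $\sum_{i}|y_{i}-\max(0,x_{i}\beta)|$, which requires only a zero conditional median of the errors rather than normality. The data-generating values and starting point below are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def clad_objective(beta, y, X):
    """CLAD criterion: sum_i | y_i - max(0, x_i' beta) |."""
    return np.abs(y - np.maximum(X @ beta, 0.0)).sum()

rng = np.random.default_rng(5)
n = 3000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
e = rng.standard_t(df=3, size=n)   # heavy-tailed, median-zero errors
y = np.maximum(X @ beta_true + e, 0.0)

beta0 = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
res = minimize(clad_objective, x0=beta0, args=(y, X),
               method="Nelder-Mead", options={"maxiter": 2000})
beta_hat = res.x
```

Nelder-Mead is used here because the CLAD objective is not differentiable everywhere; in practice, specialized LAD solvers are preferable.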
Applications
Tobit models have, for example, been applied to estimate factors that affect grant receipt, including financial transfers distributed to sub-national governments, which may apply for these grants. In these cases, grant recipients cannot receive negative amounts, and the data are thus left-censored. For instance, Dahlberg and Johansson (2002) analyse a sample of 115 municipalities (42 of which received a grant). Dubois and Fattore (2011) use a tobit model to investigate the role of various factors in European Union fund receipt by Polish sub-national governments applying for these funds. The data may, however, be left-censored at a point higher than zero, with a risk of mis-specification. Both studies apply probit and other models to check for robustness. Tobit models have also been applied in demand analysis to accommodate observations with zero expenditures on some goods. In a related application, a system of nonlinear tobit regression models has been used to jointly estimate a brand demand system with homoscedastic, heteroscedastic and generalized heteroscedastic variants.
See also
Truncated normal hurdle model
Limited dependent variable
Rectifier (neural networks)
Truncated regression model
Dynamic unobserved effects model § Censored dependent variable
Probit model – the name "tobit" is a pun on both Tobin, the model's creator, and its similarity to probit models.
Further reading
Amemiya, Takeshi (1985). "Tobit Models". Advanced Econometrics. Oxford: Basil Blackwell. pp. 360–411. ISBN 0-631-13345-3.
Breen, Richard (1996). "The Tobit Model for Censored Data". Regression Models: Censored, Sample Selected, or Truncated Data. Thousand Oaks: Sage. pp. 12–33. ISBN 0-8039-5710-6.
Gouriéroux, Christian (2000). "The Tobit Model". Econometrics of Qualitative Dependent Variables. New York: Cambridge University Press. pp. 170–207. ISBN 0-521-58985-1.
King, Gary (1989). "Models with Nonrandom Selection". Unifying Political Methodology: The Likelihood Theory of Statistical Inference. Cambridge University Press. pp. 208–230. ISBN 0-521-36697-6.
Maddala, G. S. (1983). "Censored and Truncated Regression Models". Limited-Dependent and Qualitative Variables in Econometrics. New York: Cambridge University Press. pp. 149–196. ISBN 0-521-24143-X.