高级计量Stata--第十二章多值选择模型
Xwyturbo / 2022-11-15
以下内容全部搬运自陈强老师的《高级计量经济学及Stata应用》
12.1多项logit与多项Probit。
以数据集nomocc2.dta为例,进行多项logit与多项probit估计。首先加载数据,进行描述性统计分析。
## 加载数据
use "D:\cnusisters\cyl\20221108\nomocc2.dta"
(1982 General Social Survey)
## 查看数据特征
. sum
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
occ | 337 3.397626 1.367913 1 5
white | 337 .9169139 .2764227 0 1
ed | 337 13.09496 2.946427 3 20
exper | 337 20.50148 13.95936 2 66
## 查看前六行观测值
. list in 1/6
+-----------------------------+
| occ white ed exper |
|-----------------------------|
1. | Menial 1 11 3 |
2. | Menial 1 12 14 |
3. | Menial 1 12 44 |
4. | Menial 1 12 18 |
5. | Menial 0 14 24 |
|-----------------------------|
6. | Menial 1 13 38 |
+-----------------------------+
## 列表考察教育年限ed,与职业occ的关系。
### occ,即occupation职业,包括Menial服务人员,BlueCol蓝领,Craft工匠,WhiteCol白领,Prof专业人士五种。
table occ,contents(N ed mean ed sd ed)
----------------------------------------------
Occupatio |
n | N(ed) mean(ed) sd(ed)
----------+-----------------------------------
Menial | 31 11.774194 2.186469
BlueCol | 69 11.217391 2.571733
Craft | 84 11.964286 2.119596
WhiteCol | 41 13.170732 2.096455
Prof | 112 15.4375 2.608998
----------------------------------------------
构建多项logit模型,并进行豪斯曼检验,Small-Hsiao检验,最后搞个预测
## 构建多项logit模型
mlogit occ white ed exper, nolog
Multinomial logistic regression Number of obs = 337
LR chi2(12) = 166.09
Prob > chi2 = 0.0000
Log likelihood = -426.80048 Pseudo R2 = 0.1629
------------------------------------------------------------------------------
occ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Menial |
white | -1.774306 .7550543 -2.35 0.019 -3.254186 -.2944273
ed | -.7788519 .1146293 -6.79 0.000 -1.003521 -.5541826
exper | -.0356509 .018037 -1.98 0.048 -.0710028 -.000299
_cons | 11.51833 1.849356 6.23 0.000 7.893659 15.143
-------------+----------------------------------------------------------------
BlueCol |
white | -.5378027 .7996033 -0.67 0.501 -2.104996 1.029391
ed | -.8782767 .1005446 -8.74 0.000 -1.07534 -.6812128
exper | -.0309296 .0144086 -2.15 0.032 -.05917 -.0026893
_cons | 12.25956 1.668144 7.35 0.000 8.990061 15.52907
-------------+----------------------------------------------------------------
Craft |
white | -1.301963 .647416 -2.01 0.044 -2.570875 -.0330509
ed | -.6850365 .0892996 -7.67 0.000 -.8600605 -.5100126
exper | -.0079671 .0127055 -0.63 0.531 -.0328693 .0169351
_cons | 10.42698 1.517943 6.87 0.000 7.451864 13.40209
-------------+----------------------------------------------------------------
WhiteCol |
white | -.2029212 .8693072 -0.23 0.815 -1.906732 1.50089
ed | -.4256943 .0922192 -4.62 0.000 -.6064407 -.2449479
exper | -.001055 .0143582 -0.07 0.941 -.0291967 .0270866
_cons | 5.279722 1.684006 3.14 0.002 1.979132 8.580313
-------------+----------------------------------------------------------------
Prof | (base outcome)
------------------------------------------------------------------------------
.
## 没有安装sg155,无mlogtest命令,报错。
. mlogtest ,hausman base
command mlogtest is unrecognized
r(199);
## net安装失败
. net install sg155
file http://www.stata.com/sg155.pkg not found
could not load sg155.pkg from http://www.stata.com/
r(601);
## ssc安装失败
. ssc install sg155
ssc install: "sg155" not found at SSC, type search sg155
(To find all packages at SSC that start with s, type ssc describe s)
r(601);
## 通过search安装
. search sg155
## 安装成功后,检验IIA假定
. mlogtest ,hausman base
Problem determining number of categories.
**** Hausman tests of IIA assumption
Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.
You used the old syntax of hausman. Click here to learn about the new syntax.
(storing estimation results as _HAUSMAN)
Omitted | chi2 df P>chi2 evidence
---------+------------------------------------
Menial | 7.324 12 0.835 for Ho
BlueCol | 0.320 12 1.000 for Ho
Craft | -14.436 12 1.000 for Ho
WhiteCol | -5.541 11 1.000 for Ho
Prof | 15.566 12 0.212 for Ho
----------------------------------------------
## 由于某种原因,无法进行Small-Hsiao检验。
. mlogtest ,smhsiao base
Problem determining number of categories.
**** Small-Hsiao tests of IIA assumption
Ho: Odds(Outcome-J vs Outcome-K) are independent of other alternatives.
equation 5 not found
r(303);
## 豪斯曼检验,Small-Hsiao检验的小样本性质均不好,只具有参考价值;但是至少不违背IIA假定,如要显示相对分线比率,可输入如下命令。
. mlogit occ white ed exper ,nolog rrr
Multinomial logistic regression Number of obs = 337
LR chi2(12) = 166.09
Prob > chi2 = 0.0000
Log likelihood = -426.80048 Pseudo R2 = 0.1629
------------------------------------------------------------------------------
occ | RRR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Menial |
white | .169601 .128058 -2.35 0.019 .0386123 .7449581
ed | .4589326 .0526071 -6.79 0.000 .3665863 .5745417
exper | .9649771 .0174053 -1.98 0.048 .9314593 .999701
_cons | 100542 185937.9 6.23 0.000 2680.234 3771571
-------------+----------------------------------------------------------------
BlueCol |
white | .5840301 .4669924 -0.67 0.501 .1218461 2.79936
ed | .4154983 .0417761 -8.74 0.000 .3411816 .5060029
exper | .9695438 .0139698 -2.15 0.032 .9425465 .9973143
_cons | 210989.6 351961.2 7.35 0.000 8022.948 5548662
-------------+----------------------------------------------------------------
Craft |
white | .2719974 .1760954 -2.01 0.044 .0764686 .9674893
ed | .5040718 .0450134 -7.67 0.000 .4231365 .600488
exper | .9920646 .0126046 -0.63 0.531 .967665 1.017079
_cons | 33758.18 51243 6.87 0.000 1723.071 661385.6
-------------+----------------------------------------------------------------
WhiteCol |
white | .8163426 .7096525 -0.23 0.815 .1485651 4.485678
ed | .653316 .0602483 -4.62 0.000 .5452883 .7827453
exper | .9989455 .0143431 -0.07 0.941 .9712254 1.027457
_cons | 196.3154 330.5962 3.14 0.002 7.236457 5325.773
-------------+----------------------------------------------------------------
Prof | (base outcome)
------------------------------------------------------------------------------
Note: _cons estimates baseline relative risk for each outcome.
## 根据模型预测个体选择各种职业的可能性,分别记录为occ1,occ2,occ3,occ4,occ5,并显示前五个观测值。
. predict occ1 occ2 occ3 occ4 occ5
(option pr assumed; predicted probabilities)
. list occ1-occ5 in 1/5
+------------------------------------------------------+
| occ1 occ2 occ3 occ4 occ5 |
|------------------------------------------------------|
1. | .1681295 .4128002 .2760952 .085288 .0576871 |
2. | .1257816 .2945018 .3076293 .1328948 .1391926 |
3. | .0644456 .1738508 .3616529 .1922331 .2078175 |
4. | .1161744 .2771936 .3174044 .1409616 .148266 |
5. | .1691383 .0988214 .410424 .1063373 .215279 |
+------------------------------------------------------+
## 也可以选择其他职业作为替代方案,比如服务业。
. mlogit occ white ed exper, base(1) nolog
Multinomial logistic regression Number of obs = 337
LR chi2(12) = 166.09
Prob > chi2 = 0.0000
Log likelihood = -426.80048 Pseudo R2 = 0.1629
------------------------------------------------------------------------------
occ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Menial | (base outcome)
-------------+----------------------------------------------------------------
BlueCol |
white | 1.236504 .7244352 1.71 0.088 -.1833631 2.656371
ed | -.0994247 .1022812 -0.97 0.331 -.2998922 .1010428
exper | .0047212 .0173984 0.27 0.786 -.0293789 .0388214
_cons | .7412336 1.51954 0.49 0.626 -2.23701 3.719477
-------------+----------------------------------------------------------------
Craft |
white | .4723436 .6043097 0.78 0.434 -.7120817 1.656769
ed | .0938154 .097555 0.96 0.336 -.0973888 .2850197
exper | .0276838 .0166737 1.66 0.097 -.004996 .0603636
_cons | -1.091353 1.450218 -0.75 0.452 -3.933728 1.751022
-------------+----------------------------------------------------------------
WhiteCol |
white | 1.571385 .9027216 1.74 0.082 -.1979166 3.340687
ed | .3531577 .1172786 3.01 0.003 .1232959 .5830194
exper | .0345959 .0188294 1.84 0.066 -.002309 .0715007
_cons | -6.238608 1.899094 -3.29 0.001 -9.960764 -2.516453
-------------+----------------------------------------------------------------
Prof |
white | 1.774306 .7550543 2.35 0.019 .2944273 3.254186
ed | .7788519 .1146293 6.79 0.000 .5541826 1.003521
exper | .0356509 .018037 1.98 0.048 .000299 .0710028
_cons | -11.51833 1.849356 -6.23 0.000 -15.143 -7.893659
------------------------------------------------------------------------------
构建多项probit模型,结合多项logit,二者系数虽有不同,但不具备可比性,具有可比性的是两个模型的预测概率,由此计算多项probit模型的各种预测概率,分别记录为occ1p,occ2p,occ3p,occ4p,occ5p
## 构建多项probit模型
. mprobit occ white ed exper ,nolog
Multinomial probit regression Number of obs = 337
Wald chi2(12) = 105.61
Log likelihood = -429.31856 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
occ | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Menial |
white | -1.144907 .5027501 -2.28 0.023 -2.130279 -.1595352
ed | -.5094985 .0698816 -7.29 0.000 -.6464639 -.3725331
exper | -.0234636 .0109546 -2.14 0.032 -.0449343 -.0019929
_cons | 7.46242 1.145854 6.51 0.000 5.216587 9.708253
-------------+----------------------------------------------------------------
BlueCol |
white | -.392222 .5182974 -0.76 0.449 -1.408066 .6236222
ed | -.5845723 .063011 -9.28 0.000 -.7080715 -.461073
exper | -.0225903 .00984 -2.30 0.022 -.0418764 -.0033042
_cons | 8.188586 1.069264 7.66 0.000 6.092867 10.28431
-------------+----------------------------------------------------------------
Craft |
white | -.8903573 .457069 -1.95 0.051 -1.786196 .0054814
ed | -.4718874 .0579237 -8.15 0.000 -.5854157 -.3583591
exper | -.0077824 .0090923 -0.86 0.392 -.025603 .0100382
_cons | 7.140264 .9896954 7.21 0.000 5.200496 9.080031
-------------+----------------------------------------------------------------
WhiteCol |
white | -.1434167 .5530156 -0.26 0.795 -1.227307 .9404739
ed | -.3038566 .0576254 -5.27 0.000 -.4168003 -.1909129
exper | -.0039043 .0095574 -0.41 0.683 -.0226365 .0148279
_cons | 3.76544 1.036649 3.63 0.000 1.733645 5.797234
-------------+----------------------------------------------------------------
Prof | (base outcome)
------------------------------------------------------------------------------
## 计算多项probit模型的各种预测概率,分别记录为occ1p,occ2p,occ3p,occ4p,occ5p。
. predict occ1p occ2p occ3p occ4p occ5p
(option pr assumed; predicted probabilities)
## 比较多项logit 与 多项probit 职业预测概率的相关性。
. corr occ1 occ1p
(obs=337)
| occ1 occ1p
-------------+------------------
occ1 | 1.0000
occ1p | 0.9979 1.0000
. corr occ2 occ2p
(obs=337)
| occ2 occ2p
-------------+------------------
occ2 | 1.0000
occ2p | 0.9985 1.0000
. corr occ3 occ3p
(obs=337)
| occ3 occ3p
-------------+------------------
occ3 | 1.0000
occ3p | 0.9935 1.0000
. corr occ4 occ4p
(obs=337)
| occ4 occ4p
-------------+------------------
occ4 | 1.0000
occ4p | 0.9929 1.0000
. corr occ5 occ5p
(obs=337)
| occ5 occ5p
-------------+------------------
occ5 | 1.0000
occ5p | 0.9989 1.0000
## 上述结果显示两个模型的预测概率高度一致,相关系数均在0.99以上,也就是说随便那个模型都可以,只是多项probit耗时不好解释,且多项logit可以从几率比角度解释系数估计值,故实践中采用多项logit。
12.2条件logit模型
以travel2.dta为例,进行条件logit模型估计
## 加载数据
. use "D:\cnusisters\cyl\20221108\travel2.dta"
(Greene & Hensher 1997 data on travel mode choice)
## 查看前六行观测值
## 被解释变量choice为虚拟变量,表示选择哪一种方案,如第一行选择了train方案,choice=0,第三行选择了car方案,choice=1。
## 数据只有152个旅行团,相对是个宽表,每个旅行团对应三行数据,但是为了模型分析,转成了长表152*3。
. list id mode train bus time invc choice hinc psize in 1/6 ,sepby(id)
+----------------------------------------------------------------+
| id mode train bus time invc choice hinc psize |
|----------------------------------------------------------------|
1. | 1 Train 1 0 406 31 0 35 1 |
2. | 1 Bus 0 1 452 25 0 35 1 |
3. | 1 Car 0 0 180 10 1 35 1 |
|----------------------------------------------------------------|
4. | 2 Train 1 0 398 31 0 30 2 |
5. | 2 Bus 0 1 452 25 0 30 2 |
6. | 2 Car 0 0 255 11 1 30 2 |
+----------------------------------------------------------------+
## 查看变量分布特征,
## id指旅行团,但是编码不连续最大值到了210,
## mode指旅行方式(Train、bus、car),虚拟变量train=1,该行对应火车方案;虚拟变量bus=1,对应公交车方案;虚拟变量train=0,bus=0,对应自驾车方案。
## time指旅行时间,单位分钟。
## invc 乘车成本
## hinc 家庭收入
## psize 旅行团人数
. sum
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
id | 456 103.8882 61.03044 1 210
mode | 456 2 .8173933 1 3
train | 456 .3333333 .4719223 0 1
bus | 456 .3333333 .4719223 0 1
car | 456 .3333333 .4719223 0 1
-------------+---------------------------------------------------------
time | 456 632.1096 270.2547 180 1440
invc | 456 33.95175 21.795 2 109
choice | 456 .3333333 .4719223 0 1
ttme | 456 25.24781 21.15744 0 99
invt | 456 606.8618 265.8235 180 1440
-------------+---------------------------------------------------------
gc | 456 113.4912 53.7725 30 269
hinc | 456 31.80921 19.25813 2 72
psize | 456 1.809211 1.069457 1 6
## 进行条件logit模型估计,加上选择项 or 计算风险比率
. clogit choice train bus time invc ,group(id) nolog or
Conditional (fixed-effects) logistic regression
Number of obs = 456
LR chi2(4) = 172.06
Prob > chi2 = 0.0000
Log likelihood = -80.961135 Pseudo R2 = 0.5152
------------------------------------------------------------------------------
choice | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
train | 14.45786 6.551738 5.89 0.000 5.94802 35.14272
bus | 4.359401 1.746879 3.67 0.000 1.987639 9.561286
time | .9810368 .0024044 -7.81 0.000 .9763356 .9857607
invc | .9529758 .0113896 -4.03 0.000 .930912 .9755624
------------------------------------------------------------------------------
## 计算条件logit的预测概率
. predict prob
(option pc1 assumed; probability of success given one success within group)
## 展示第一个旅行社团的预测结果,结果显示第一个旅行团选择car的概率为0.93,实际选择了car方案。
. list id mode prob choice time invc in 1/3
+----------------------------------------------+
| id mode prob choice time invc |
|----------------------------------------------|
1. | 1 Train .0642477 0 406 31 |
2. | 1 Bus .0107205 0 452 25 |
3. | 1 Car .9250318 1 180 10 |
+----------------------------------------------+
## 使用asclogit估计条件logit模型
. asclogit choice time invc ,case(id) alternatives(mode) base(3) nolog
Alternative-specific conditional logit Number of obs = 456
Case variable: id Number of cases = 152
Alternative variable: mode Alts per case: min = 3
avg = 3.0
max = 3
Wald chi2(2) = 70.53
Log likelihood = -80.961135 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
choice | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mode |
time | .9810368 .0024044 -7.81 0.000 .9763356 .9857607
invc | .9529758 .0113896 -4.03 0.000 .930912 .9755624
-------------+----------------------------------------------------------------
Train |
_cons | 14.45786 6.551738 5.89 0.000 5.94802 35.14272
-------------+----------------------------------------------------------------
Bus |
_cons | 4.359401 1.746879 3.67 0.000 1.987639 9.561286
-------------+----------------------------------------------------------------
Car | (base alternative)
------------------------------------------------------------------------------
Note: _cons estimates baseline odds for each outcome.
## 通过asclogit,将个体变量,以casevars(hinc psize)形式添加进入方程。
. asclogit choice time invc ,case(id) alternatives(mode) base(3) casevars(hinc psize) nolog
Alternative-specific conditional logit Number of obs = 456
Case variable: id Number of cases = 152
Alternative variable: mode Alts per case: min = 3
avg = 3.0
max = 3
Wald chi2(6) = 69.09
Log likelihood = -77.504846 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
choice | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mode |
time | -.0185035 .0025035 -7.39 0.000 -.0234103 -.0135966
invc | -.0402791 .0134851 -2.99 0.003 -.0667095 -.0138488
-------------+----------------------------------------------------------------
Train |
hinc | -.0342841 .0158471 -2.16 0.031 -.0653438 -.0032243
psize | -.0038421 .3098075 -0.01 0.990 -.6110537 .6033695
_cons | 3.499641 .7579665 4.62 0.000 2.014054 4.985228
-------------+----------------------------------------------------------------
Bus |
hinc | -.0080174 .0200322 -0.40 0.689 -.0472798 .031245
psize | -.5141037 .4007015 -1.28 0.199 -1.299464 .2712569
_cons | 2.486465 .8803649 2.82 0.005 .7609815 4.211949
-------------+----------------------------------------------------------------
Car | (base alternative)
------------------------------------------------------------------------------
## 通过asclogit,估计一个只含常数项的方程。
. asclogit choice ,case(id) alternatives(mode) base(3) nolog
Alternative-specific conditional logit Number of obs = 456
Case variable: id Number of cases = 152
Alternative variable: mode Alts per case: min = 3
avg = 3.0
max = 3
Wald chi2(0) = .
Log likelihood = -160.00172 Prob > chi2 = .
------------------------------------------------------------------------------
choice | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Train |
_cons | .0655973 .1811689 0.36 0.717 -.2894872 .4206818
-------------+----------------------------------------------------------------
Bus |
_cons | -.67634 .2242376 -3.02 0.003 -1.115838 -.2368425
-------------+----------------------------------------------------------------
Car | (base alternative)
------------------------------------------------------------------------------
## 通过两个方程的对数似然函数,计算准R方。
. dis(160.00172-77.504846)/160.00172
.51559992
12.3混合logit模型
以travel2.dta为例,进行混合logit模型,略。
12.4嵌套logit模型
进行嵌套logit模型的命令语法
nlogit y x1 x2||level_equation||level2_equation,base(#) case(varname) nolog notree
其中x1 x2,是随方案而变的随机变量。
||表示分隔符。
level1_equation用来指定系数只随着树干(level1),不随随树枝(level2)而变的解释变量。
level2_equation用来指定系数既随着树干(level1),又随树枝(level2)而变的解释变量。
case(varname)用来指定个体。
base(#)指定参照方案。
## 定义树形结构,| 表示的是或的意思。
. nlogitgen type =mode(public:Train | Bus , private:Car)
new variable type is generated with 2 groups
label list lb_type
lb_type:
1 public
2 private
## 显示树形结构
. nlogittree mode type
tree structure specified for the nested logit model
type N mode N
---------------------------
public 304 --- Train 152
+- Bus 152
private 152 --- Car 152
---------------------------
total 456
N = number of observations at each level
## 通过前六行观测值,观察type。
. list id mode train bus time invc choice hinc type in 1/6, sepby(id)
+------------------------------------------------------------------+
| id mode train bus time invc choice hinc type |
|------------------------------------------------------------------|
1. | 1 Train 1 0 406 31 0 35 public |
2. | 1 Bus 0 1 452 25 0 35 public |
3. | 1 Car 0 0 180 10 1 35 private |
|------------------------------------------------------------------|
4. | 2 Train 1 0 398 31 0 30 public |
5. | 2 Bus 0 1 452 25 0 30 public |
6. | 2 Car 0 0 255 11 1 30 private |
+------------------------------------------------------------------+
## 嵌套logit模型估计
## nlogit y x1 x2||level1equation||level2_equation,base(#) case(varname) nolog notree##
#######################################################################################
. nlogit choice time invc || type:psize,base(private)||mode:hinc,base(3) case(id) nolog notree
note: branch 2 of level 1 is degenerate and the associated dissimilarity parameter ([private_tau]_cons) is not
defined; see help nlogit for details
RUM-consistent nested logit regression Number of obs = 456
Case variable: id Number of cases = 152
Alternative variable: mode Alts per case: min = 3
avg = 3.0
max = 3
Wald chi2(5) = 40.43
Log likelihood = -71.596852 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
choice | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
mode |
time | -.0131088 .0025114 -5.22 0.000 -.0180311 -.0081865
invc | -.0239481 .0092615 -2.59 0.010 -.0421003 -.0057959
------------------------------------------------------------------------------
type equations
------------------------------------------------------------------------------
public |
psize | -.2313238 .247549 -0.93 0.350 -.716511 .2538633
-------------+----------------------------------------------------------------
private |
psize | 0 (base)
------------------------------------------------------------------------------
mode equations
------------------------------------------------------------------------------
Train |
hinc | -.035748 .0138416 -2.58 0.010 -.062877 -.008619
_cons | 3.301226 .6493563 5.08 0.000 2.028511 4.573941
-------------+----------------------------------------------------------------
Bus |
hinc | -.0192798 .0150931 -1.28 0.201 -.0488616 .010302
_cons | 2.431362 .6584554 3.69 0.000 1.140813 3.721911
-------------+----------------------------------------------------------------
Car |
hinc | 0 (base)
_cons | 0 (base)
------------------------------------------------------------------------------
dissimilarity parameters
------------------------------------------------------------------------------
type |
/public_tau | .3270861 .1057724 .1197761 .5343961
/private_tau | 1 329241.4 -645300.4 645302.4
------------------------------------------------------------------------------
LR test for IIA (tau=1): chi2(2) = 13.95 Prob > chi2 = 0.0009