Probit: removing groups that perfectly predict failures

Tian on 29 Jul 2021
Answered: Kumar Pallav on 4 Aug 2021
Hi all,
I have group-year panel data, as attached. Apologies for the very low quality of the data.
There are 3 groups, each with 20 observations. The outcome y is a dummy variable for success. The first column of x is a continuous variable, "effort". The second column is a dummy indicating group A, and the third column is a dummy for group B. There is no dummy for group C, to avoid collinearity.
I want to predict the probability of success using a probit model. The code I tried is:
b = glmfit(x,y,'binomial','Link','probit');
b =
0.1857 (constant)
-1.8149 (effort)
-16.1148 (group A)
-16.2994 (group B)
As you can see in the data, all outcomes for group A are failures, so the second column of x predicts y == 0 perfectly. MATLAB also raises a warning:
Warning: The estimated coefficients perfectly separate failures from successes. This means the theoretical best estimates are not finite.
For the fitted linear combination XB of the predictors, the sample proportions P of Y=N in the data satisfy:
XB<-0.834093: P=0
XB=-0.834093: P=1
XB>-0.834093: P=0
However, it still returns an estimated coefficient for group A dummy, which is b(3) = -16.1148.
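One way to see why glmfit returns a large negative number here rather than a meaningful finite estimate is to write out the probit log-likelihood. The notation below (\beta_0, \beta_1, \beta_A, \beta_B, e_i for effort, d_{B,i} for the group B dummy) is introduced only for illustration and does not come from the data or from glmfit itself:

```latex
\ell(\beta) = \sum_{i \notin A} \Bigl[ y_i \log \Phi(\eta_i) + (1 - y_i) \log\bigl(1 - \Phi(\eta_i)\bigr) \Bigr]
            + \sum_{i \in A} \log\bigl(1 - \Phi(\beta_0 + \beta_1 e_i + \beta_A)\bigr),
\qquad \eta_i = \beta_0 + \beta_1 e_i + \beta_B d_{B,i}
```

Every group A observation has y_i = 0, and \beta_A appears only in the last sum. That sum keeps increasing toward 0 as \beta_A \to -\infty, so there is no finite maximizer, which matches the warning text. A value such as -16.1148 is presumably just where the iterations stopped.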
Question:
Since x(:,2) perfectly predicts failure, b(3) should be 0. Is there an option in glmfit that removes the observations for group A inside the glmfit call and then returns 0 as the coefficient for this column, so that I get something like:
b =
0.1857 (constant)
-1.8149 (effort)
0 (group A)
xxx (group B)
Stata does this automatically using the command:
probit y effort i.group
It turns out the estimates for the constant and effort are the same in both programs, so the perfect-failure issue only affects the group dummy coefficients...
Thank you!!!

Accepted Answer

Kumar Pallav on 4 Aug 2021
As I understand it, you expect b(3) = 0 in the coefficient vector b because you mention that the second column of x (the group A dummy) corresponds to failures (that is, 0). However, after inspecting the data, I see that the second column of x is not all zeros.
% Check whether the vector contains any nonzero value
containsNonZero = any(x(:,2)) % returns 1 (true) if any element is nonzero
However, if you change the values of the second column of x to zero:
% Set every value in the second column of x to zero
x(:,2)=0;
b = glmfit(x,y,'binomial','Link','probit')
Then, the b(3) value becomes 0.
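Note that zeroing the column keeps the group A observations in the sample. If the goal is instead the behaviour described in the question (remove the observations of a group that perfectly predicts failure, refit, and report 0 for that dummy), one possible sketch is shown below. It is not a built-in glmfit option as far as I know. The data here are a hypothetical stand-in for the attachment, and the variable names dummyCols, dropCols, dropRows and bReduced are made up for illustration:

```matlab
% Hypothetical stand-in data (NOT the attached data): 3 groups x 20 obs.
n      = 20;
effort = repmat(linspace(0, 1, n)', 3, 1);              % continuous regressor
gA     = [ones(n,1);  zeros(2*n,1)];                     % group A dummy
gB     = [zeros(n,1); ones(n,1); zeros(n,1)];            % group B dummy
yB     = [1 1 1 0 1 1 0 1 0 0 1 0 0 0 1 0 0 0 0 0]';     % mixed outcomes
yC     = [1 1 1 1 1 1 0 1 1 0 1 0 0 1 0 0 0 0 0 0]';     % mixed outcomes
x      = [effort, gA, gB];
y      = [zeros(n,1); yB; yC];                           % group A: all failures

% Flag dummy columns whose group contains only failures.
dummyCols = [2 3];                        % assumed positions of the group dummies
dropCols  = false(1, size(x,2));
dropRows  = false(size(x,1), 1);
for j = dummyCols
    inGroup = x(:,j) == 1;
    if any(inGroup) && all(y(inGroup) == 0)
        dropCols(j) = true;               % remove the dummy itself
        dropRows    = dropRows | inGroup; % and that group's observations
    end
end

% Refit the probit on the remaining observations and columns.
bReduced = glmfit(x(~dropRows, ~dropCols), y(~dropRows), ...
                  'binomial', 'Link', 'probit');

% Expand to the full coefficient vector; dropped dummies get a 0 placeholder.
b = zeros(size(x,2) + 1, 1);
b([true, ~dropCols]) = bReduced;          % first entry is the constant
disp(b)
```

The 0 written into b(3) is only a placeholder for an omitted term, not an estimate: the fitted probability of success for group A is effectively 0, so predictions for that group should not be computed from b as if the dummy had no effect.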
