Probit: removing groups that perfectly predict failures
조회 수: 5 (최근 30일)
이전 댓글 표시
Hi all,
I have a group-year panel data as attahced. Apologies the data is very low quality.
There are 3 groups, each has 20 observations. Outcome y is a dummy variable for success. The first column in x is a continuous variable "effort". The second column is a dummy indicates group A. The third column is a dummy for group B. There is no dummy for group C to avoid collinearity.
I want to predict the probability of success using the probit model. The code I try is:
b = glmfit(x,y,'binomial','Link','probit');
b =
0.1857 (constant)
-1.8149 (effort)
-16.1148 (group A)
-16.2994 (group B)
As you can see in the data, all outcomes for group A are failures. So the second column in x predicts y == 0 perfectly. Matlab also raises a warning:
Warning: The estimated coefficients perfectly separate failures from successes. This means the theoretical best estimates are not finite.
For the fitted linear combination XB of the predictors, the sample proportions P of Y=N in the data satisfy:
XB<-0.834093: P=0
XB=-0.834093: P=1
XB>-0.834093: P=0
However, it still returns an estimated coefficient for group A dummy, which is b(3) = -16.1148.
Question:
Since x(:,2) perfectly predict failures, b(3) should be 0. Is there an option in glmfit to remove observations for group A within glmfit function, then return the coefficient as 0 for this column? So I can get something like:
b =
0.1857 (constant)
-1.8149 (effort)
0 (group A)
xxx (group B)
Stata does this automatically using the command:
probit y effort i.group
It turns out the estiamtes for the constant and effort are the same. So the perfect failure issue only affects the group dummies coefficients...
Thank you!!!
댓글 수: 0
채택된 답변
Kumar Pallav
2021년 8월 4일
From my understanding ,for the coefficient vector b, you expect the b(3)=0 as you mentioned that the second column of x (group A dummies) are failures(that is 0). But , after inspecting the data, I see that the second column of x are not all zeros.
%check if any non-zero value in the vector
containsNonZero = any(x(:,2)) %returns 1 if true
However, if you change the values of second column of x to zero
%change second column values of x to zero
x(:,2)=0;
b = glmfit(x,y,'binomial','Link','probit')
Then, the b(3) value becomes 0.
댓글 수: 0
추가 답변 (0개)
참고 항목
카테고리
Help Center 및 File Exchange에서 Regression에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!