Dealing with the glmfit Warning: The estimated coefficients perfectly separate failures from successes. This means the theoretical best estimates are not finite.

Question

asaf benjamin on 5 Apr 2022


https://kr.mathworks.com/matlabcentral/answers/1689375-dealing-with-the-glmfit-warning-the-estimated-coefficients-perfectly-separate-failures-from-success

Commented: Walter Roberson on 18 Jul 2023

My data has 22 variables and 76 observations, 44 of which are "positive", 32 "negative". I'm interested in computing 95% confidence intervals of the logistic regression model coefficients. However, running

fitglm(data, 'Distribution', 'binomial');

throws the following warning:

Warning: Iteration limit reached.
> In glmfit (line 340)
  In GeneralizedLinearModel/fitter (line 659)
  In classreg.regr/FitObject/doFit (line 94)
  In GeneralizedLinearModel.fit (line 973)
  In fitglm (line 146)
  In logRegOTRKO (line 2)

Warning: The estimated coefficients perfectly separate failures from successes. This means the theoretical best estimates are not finite. For the fitted linear combination XB of the predictors, the sample proportions P of Y=N in the data satisfy:
   XB<1.12299: P=0
   XB>1.12299: P=1
> In glmfit>diagnoseSeparation (line 560)
  In glmfit (line 346)
  In GeneralizedLinearModel/fitter (line 659)
  In classreg.regr/FitObject/doFit (line 94)
  In GeneralizedLinearModel.fit (line 973)
  In fitglm (line 146)
  In logRegOTRKO (line 2)

and the resulting coefficients have standard errors about an order of magnitude larger than the coefficients themselves, and p-values close to one, although I know that many of the independent variables differ significantly between the positive and negative classes.

I've seen similar questions (e.g. here, here and here), but none of them seem to provide a suitable solution. I've also seen suggestions using R and Python, but I'd be happy to keep this a "Matlab-only" project. Thanks!

Comments: 7 (5 older comments hidden)

Daniel K on 18 Jul 2023

I'm getting the same warning right now, but I don't really understand what the message "Warning: The estimated coefficients perfectly separate failures from successes." means. Is there a more understandable explanation anywhere?

Walter Roberson on 18 Jul 2023

The message about perfect separation means that there is no noise and the data can be exactly fit by a model with the given number of predictors. When you have a relatively high number of predictors compared to the sample size, it becomes more likely that a simple model can exactly predict the data.

Now suppose you had a goodness measure that involved dividing by the number of values not exactly predicted, but that the number not exactly fit by the model was 0, then you would in that case be calculating something divided by 0, which would not give a finite result.
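One way to see why the estimates are "not finite" is to look at the logistic log-likelihood directly. The following is a short sketch in standard notation; the symbols x_i, y_i, s_i, \beta and c are introduced here for illustration and do not appear in the thread, and x_i is assumed to include a constant term for the intercept.

```latex
% Logistic log-likelihood, with y_i \in \{0,1\} and s_i = 2y_i - 1 \in \{-1,+1\}:
\ell(\beta) = -\sum_{i=1}^{n} \log\!\left(1 + e^{-s_i\, x_i^{\top}\beta}\right) < 0
\quad \text{for every finite } \beta .

% Perfect separation: some \beta^{*} satisfies s_i\, x_i^{\top}\beta^{*} > 0 for all i.
% Scaling that direction by c > 0 gives a strictly increasing function of c:
\ell(c\,\beta^{*}) = -\sum_{i=1}^{n} \log\!\left(1 + e^{-c\, s_i\, x_i^{\top}\beta^{*}}\right)
\;\longrightarrow\; 0 \quad \text{as } c \to \infty .
```

So the supremum of the likelihood is approached only as the coefficients grow without bound. The fitting iterations keep inflating the coefficients until the iteration limit is hit, and the likelihood becomes nearly flat along that direction, which is consistent with the very large standard errors reported in the question.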

You probably either need a lot more data, or else need a simpler model (fewer predictors) so that the predictions are no longer exact.

... but from time to time the implied meaning is that your system is so predictable that you do not need those kinds of tools. Or it might mean that you didn't stress-test the system enough and it is well behaved in the parts you tested.
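To make the warning concrete, here is a minimal sketch on a made-up one-predictor data set (the variable names and data are illustrative assumptions, not from the thread). It fits a separable sample, repeats the check the warning describes (every fitted linear predictor XB for a failure lies below every one for a success), and then shows that the condition disappears once the classes overlap.

```matlab
% Toy data (hypothetical): y is 0 for x <= 5 and 1 for x >= 6,
% so a single threshold on x separates the classes perfectly.
x = (1:10)';
y = double(x > 5.5);

% fitglm is expected to warn about the iteration limit and perfect separation here.
mdlSep = fitglm(x, y, 'Distribution', 'binomial');

% Reproduce the check from the warning using the fitted linear predictor XB.
xb = mdlSep.Fitted.LinearPredictor;
isSeparated = max(xb(y == 0)) < min(xb(y == 1));

% Swap two labels so the classes overlap; the maximum-likelihood fit is then finite.
yNoisy = y;
yNoisy(5) = 1;
yNoisy(6) = 0;
mdlOk = fitglm(x, yNoisy, 'Distribution', 'binomial');
xbOk = mdlOk.Fitted.LinearPredictor;
isSeparatedOk = max(xbOk(yNoisy == 0)) < min(xbOk(yNoisy == 1));

disp(mdlSep.Coefficients)   % separable fit: compare estimates with their standard errors
disp(mdlOk.Coefficients)    % overlapping fit: finite maximum-likelihood estimates
```

With many predictors and few observations, as in the question (22 predictors, 76 observations), such a separating combination is much easier to find than in this one-dimensional toy case.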


Answer 1

asaf benjamin on 6 Apr 2022


https://kr.mathworks.com/matlabcentral/answers/1689375-dealing-with-the-glmfit-warning-the-estimated-coefficients-perfectly-separate-failures-from-success#answer_936110

Edited: asaf benjamin on 6 Apr 2022

One approach I've found useful is to get the coefficients using a different function that allows for regularization (e.g. fitclinear; I used ridge, but I think lasso would work just as well), and then compute bootstrapped confidence intervals for the coefficients using bootci. Of course, this is only feasible for small to medium datasets, as bootci has to refit the model once per bootstrap sample (it took about 2 minutes to fit 1e4 bootstrap samples from my data on my machine).
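A minimal sketch of this approach, under stated assumptions: the data below are synthetic stand-ins, the ridge strength lambda is an arbitrary illustrative value (in practice it would need tuning, for example by cross-validation), and percentile intervals are used for simplicity. The thread does not show the actual code, so this is one possible reading of "fitclinear plus bootci", not the original implementation.

```matlab
rng(0);                          % reproducible synthetic example
n = 76; p = 3;                   % hypothetical sizes; the question has 22 predictors
X = randn(n, p);
y = double(X(:,1) - X(:,2) > 0); % separable labels, so plain fitglm would warn

lambda = 0.1;                    % assumed ridge strength, not tuned

% Row vector [intercept, slopes] from a fitted ClassificationLinear model.
coefRow = @(mdl) [mdl.Bias, mdl.Beta'];

% Ridge-penalized logistic regression on one (re)sample.
fitCoefs = @(Xb, yb) coefRow(fitclinear(Xb, yb, ...
    'Learner', 'logistic', 'Regularization', 'ridge', ...
    'Lambda', lambda, 'ClassNames', [0 1]));

betaHat = fitCoefs(X, y);        % point estimates on the full data

% Percentile bootstrap: bootci resamples the rows of X and y together.
nBoot = 500;                     % kept small here; the answer used 1e4
ci = bootci(nBoot, {fitCoefs, X, y}, 'Type', 'per');

disp([betaHat; ci])              % row 1: estimates, rows 2-3: lower/upper 95% bounds
```

Note that these intervals describe the penalized (shrunken) coefficients for the chosen lambda, so they are not directly comparable to the Wald intervals fitglm would report for an unpenalized fit. A bootstrap sample that happens to contain only one class would also need special handling, which this sketch omits.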

Happy to see what people think of this approach...
