Why does 'fitmnr' produce wrong CoefficientNames?

Antoine on 1 Apr 2024
Edited: Antoine on 10 Apr 2024
I have a 2649x4 table T looking like this
    animalID    Region     f_n      shiftDir
    ________    _______    _____    ________
    {'1-3'}     {'CA1'}    {'f'}    {'B'}
    {'1-3'}     {'CA1'}    {'f'}    {'N'}
    {'1-3'}     {'CA1'}    {'f'}    {'N'}
    {'1-3'}     {'CA1'}    {'f'}    {'N'}
    {'1-3'}     {'CA1'}    {'f'}    {'N'}
    {'1-3'}     {'CA1'}    {'f'}    {'N'}
    {'1-3'}     {'CA1'}    {'f'}    {'F'}
    {'1-3'}     {'CA1'}    {'f'}    {'B'}
    {'1-3'}     {'CA1'}    {'f'}    {'N'}
    {'1-3'}     {'CA1'}    {'f'}    {'B'}
    ...
All variables are categorical. animalID has 11 unique categories, Region has 2 unique categories (CA1 or CA3), f_n has 2 unique categories (f or n) and shiftDir has 3 categories ('B', 'F', 'N'). I wish to perform a multinomial logistic regression where 'shiftDir' is the response variable. When using the function fitmnr introduced last year, the coefficient names in the output are wrong, as is the number of predictors.
For instance, if I do
mlr2 = fitmnr(T,'shiftDir ~ Region + f_n + animalID', CategoricalPredictors='all')
I end up with the following output:
mlr2 =
Multinomial regression with nominal responses
Value SE tStat pValue
_________ __________ ___________ __________
(Intercept_B) 1.2863 0.11896 10.812 3.0111e-27
animalID_cdc_B 0.5015 0.48301 1.0383 0.29913
animalID_cfc_B -0.15888 0.35949 -0.44197 0.65851
animalID_wt1_B -1.3369 0.36385 -3.6742 0.00023859
animalID_4-2_B -0.85465 5.5979e+06 -1.5267e-07 1
animalID_4-1_B -2.0389 5.5979e+06 -3.6423e-07 1
animalID_5-3_B -0.28241 5.5979e+06 -5.045e-08 1
animalID_5-1_B 0.58524 5.5979e+06 1.0455e-07 1
animalID_5-4_B 0.042137 5.5979e+06 7.5274e-09 1
animalID_7-2_B -0.73503 5.5979e+06 -1.3131e-07 1
animalID_9-1_B 0.93652 5.5979e+06 1.673e-07 1
Region_CA3_B -0.40496 5.5979e+06 -7.2342e-08 1
f_n_n_B 1.292 0.17441 7.4078 1.284e-13
(Intercept_F) 1.8246 0.11331 16.102 2.4575e-58
animalID_cdc_F 0.49841 0.47571 1.0477 0.29477
animalID_cfc_F -0.075916 0.35126 -0.21613 0.82889
animalID_wt1_F -0.25259 0.30712 -0.82246 0.41081
animalID_4-2_F 13.344 5.9316e+06 2.2496e-06 1
animalID_4-1_F 11.524 5.9316e+06 1.9427e-06 1
animalID_5-3_F 13.386 5.9316e+06 2.2567e-06 1
animalID_5-1_F 13.784 5.9316e+06 2.3238e-06 1
animalID_5-4_F 14.044 5.9316e+06 2.3676e-06 1
animalID_7-2_F 13.099 5.9316e+06 2.2083e-06 1
animalID_9-1_F 14.029 5.9316e+06 2.3651e-06 1
Region_CA3_F -13.119 5.9316e+06 -2.2117e-06 1
f_n_n_F 0.65204 0.16938 3.8495 0.00011834
2649 observations, 5272 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 223.1291, p-value = 3.2688e-34
I am not very familiar with multinomial logistic regression, but I believe there should be fewer predictors, of the form "animalID_B, Region_B, f_n_B, ... animalID_F, Region_F, f_n_F". I am not sure why the names of the categories within each predictor variable (e.g. the names of the animals) are appended, thus creating too many predictors.
Note that I also get some warning when running the regression:
Warning: Maximum likelihood estimation did not converge. Iteration limit
exceeded. You may need to merge categories to increase observed counts.
> In mnrfit>nominalFit (line 570)
In mnrfit (line 246)
In MultinomialRegression/fitter (line 317)
In classreg.regr/FitObject/doFit (line 94)
In MultinomialRegression.fit (line 672)
In fitmnr (line 121)
This message might be unrelated, because I don't get that warning when I omit the animalID predictor variable, yet I still get wrong CoefficientNames, and I suspect the output is wrong altogether.
I wonder if this is due to the data type inside my table. Any feedback would be appreciated.
Thank you

Answers (1)

Avadhoot on 10 Apr 2024
From your question I see that you are using the "fitmnr" function with the "CategoricalPredictors='all'" argument. The names you are seeing follow from the predictors being treated as categorical, which is what this argument requests. MATLAB expands each categorical predictor into indicator (dummy) variables, one per level other than a reference level, and estimates a separate coefficient for each of them. This is why you see a coefficient for each category of "animalID" (and of the other variables) for each level of your response variable "shiftDir", except the reference response category, whose coefficients are implicitly set to 0. This is expected behavior for categorical variables in regression models, including multinomial logistic regression.
The coefficient names are also consistent with the MATLAB naming format, which is predictorName_levelName_responseLevel. Each coefficient indicates how that level of the predictor influences the log-odds of being in a particular category of the response variable, relative to a reference category.
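As a rough check on this naming scheme, the number of rows in the coefficient table can be reproduced by hand. The sketch below assumes reference (dummy) coding, where a predictor with k levels contributes k - 1 indicator terms. It also assumes the error degrees of freedom equal the number of observations times the number of non-reference response levels, minus the number of coefficients; that formula is inferred from the displayed output, not taken from the documentation. The level counts come from the question.

```matlab
% Number of levels per categorical predictor, as stated in the question.
nLevels = struct('animalID', 11, 'Region', 2, 'f_n', 2);
nResponseLevels = 3;    % shiftDir: 'B', 'F', 'N'
nObs = 2649;

% Each categorical predictor contributes (levels - 1) indicator terms.
nDummies = sum(structfun(@(k) k - 1, nLevels));    % 10 + 1 + 1
nPerResponse = 1 + nDummies;                       % plus one intercept
% One set of coefficients per non-reference response level.
nCoefs = nPerResponse * (nResponseLevels - 1);
% Assumed formula for the error degrees of freedom (see lead-in).
errorDF = nObs * (nResponseLevels - 1) - nCoefs;

fprintf('%d coefficients per response level, %d in total, %d error DF\n', ...
    nPerResponse, nCoefs, errorDF);
% prints: 13 coefficients per response level, 26 in total, 5272 error DF
```

These numbers match the 13 rows listed for each of the "_B" and "_F" blocks and the 5272 error degrees of freedom reported in the output, which suggests the table is complete and not malformed.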
The convergence warning most likely appears because the model has too many parameters for the data to determine them well. The very large standard errors (on the order of 1e6) for several "animalID" levels and for "Region_CA3" point the same way. The warning disappears when you exclude "animalID" because that reduces the number of parameters considerably, so the model is simpler and easier to fit.
A solution to your problem would be to try to simplify the model. Consider whether all predictors are necessary or if some can be omitted. Also consider reducing the number of levels in the categorical variables or combining categories if possible.
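As one way to combine categories, the levels of a categorical variable can be merged with mergecats. This is a minimal sketch on made-up values, not the data from the question, and the merged level name 'BF' is a hypothetical choice.

```matlab
% Made-up response values for illustration only.
shiftDir = categorical({'B'; 'N'; 'N'; 'F'; 'B'});

% Merge the levels 'B' and 'F' into a single level named 'BF'.
shiftMerged = mergecats(shiftDir, {'B', 'F'}, 'BF');

disp(categories(shiftMerged))   % remaining level names
disp(countcats(shiftMerged))    % observations per remaining level
```

The same call could be applied to a predictor with many sparse levels. Whether merging makes sense depends on the experimental design, not on the software.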
For more information on the "fitmnr" function, refer to the documentation below:
I hope this helps.
1 Comment
Antoine on 10 Apr 2024
Edited: Antoine on 10 Apr 2024
Thank you for your response @Avadhoot!
I understand MATLAB's naming system now. The issue, I think, was that the documentation does not have an example with categorical predictors, only continuous numeric ones, for which the naming system is predictorName_responseLevel.
Is there a way for the model not to consider the categories of the predictor variable as "levels", i.e. in a way more akin to what it does with continuous numeric variables? I suppose I could convert my categories to numbers, but the numbering system would be arbitrary and the chosen numbers will likely influence the result of the model, won't it?
My ultimate goal is not a predictive one but to evaluate whether a given predictor variable significantly influences the odds ratio of the response variable. Perhaps this is not the right test? I had initially combined the B and F categories of "shiftDir", and considered the proportion B+F/N as the response variable, and performed a 2-way ANOVA (mixed effect linear model) with "Region" and "f_n" as predictor variables and animalID as a random effect nested in "f_n". But this has issues of its own, and I thought a multinomial logistic regression would be more appropriate. But perhaps I was wrong...
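On the question of converting categories to numbers, here is a minimal sketch, on made-up values, of the difference between arbitrary integer codes and the indicator (dummy) columns that a categorical predictor is expanded into. Using the first level encountered as the reference is an illustrative choice here and not necessarily how fitmnr orders levels.

```matlab
% Made-up animal labels for illustration only.
animalID = {'1-3'; 'cdc'; 'cfc'; '1-3'};
levels = unique(animalID, 'stable');    % levels in order of first appearance

% Option 1: arbitrary integer codes (imposes an order and a spacing).
[~, integerCode] = ismember(animalID, levels);

% Option 2: one indicator column per non-reference level.
X = zeros(numel(animalID), numel(levels) - 1);
for k = 2:numel(levels)
    X(:, k-1) = strcmp(animalID, levels{k});
end

disp(integerCode')   % 1 2 3 1
disp(X)              % rows: [0 0], [1 0], [0 1], [0 0]
```

With integer codes, a single slope would force 'cfc' to differ from '1-3' by exactly twice as much as 'cdc' does. Indicator columns give each level its own coefficient, so the fit does not depend on any numbering.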

Release

R2024a
