The problem occurred when interactions were included in the model. Setting 'DummyVasCoding' to 'effects' to create dummy variables using effects coding solved the problem.
Inconsistent results with fitlme
조회 수: 11 (최근 30일)
이전 댓글 표시
I am trying to analyse some data with linear mixed-effects models using the function 'fitlme', but I noticed that I obtain different results depending on the alphabetical order of a categorical grouping variable. Could you let me know if I am doing something wrong?
I first create a table with the dataset:
grouping_variable1 = categorical(grouping_variable1);
grouping_variable2 = categorical(grouping_variable2);
ID = categorical(ID);
tbl1 = table(ID,grouping_variable1,grouping_variable2,dependent_variable);
I then create the linear mixed-effects model with the following line:
lme = fitlme(tbl1,'dependent_variable ~ grouping_variable1*grouping_variable2 + (1|ID)');
These are the two results I get deppending on the alphabetical order of grouping_variable2/4:
Option 1
grouping_variable2 = string(zeros(23*2,1));
grouping_variable2(1:23) = 'Training';
grouping_variable2(23+1:23*2) = 'Test';
grouping_variable2 = categorical(grouping_variable2);
tbl1 = table(animal_ID,grouping_variable1,grouping_variable2,dependent_variable);
lme = fitlme(tbl1,'dependent_variable ~ grouping_variable1*grouping_variable2 + (1|ID)')
lme =
Linear mixed-effects model fit by ML
Model information:
Number of observations 46
Fixed effects coefficients 8
Random effects coefficients 23
Covariance parameters 2
Formula:
dependent_variable ~ 1 + grouping_variable1*grouping_variable2 + (1 | ID)
Model fit statistics:
AIC BIC LogLikelihood Deviance
-28.98 -10.693 24.49 -48.98
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower
{'(Intercept)' } 0.46417 0.058778 7.8969 38 1.557e-09 0.34518
{'grouping_variable1_2' } 0.16375 0.092937 1.7619 38 0.086122 -0.024391
{'grouping_variable1_3' } 0.023056 0.083125 0.27736 38 0.78301 -0.14522
{'grouping_variable1_4' } 0.20131 0.080102 2.5132 38 0.016325 0.039152
{'grouping_variable2_Training' } 0.020833 0.073085 0.28506 38 0.77715 -0.12712
{'grouping_variable1_2:grouping_variable2_Training'} -0.19375 0.11556 -1.6767 38 0.10182 -0.42768
{'grouping_variable1_3:grouping_variable2_Training' } -0.060833 0.10336 -0.58857 38 0.55963 -0.27007
{'grouping_variable1_4:grouping_variable2_Training'} -0.20012 0.099597 -2.0093 38 0.051649 -0.40174
Upper
0.58316
0.35189
0.19133
0.36347
0.16879
0.040182
0.1484
0.0015053
Random effects covariance parameters (95% CIs):
Group: ID (23 Levels)
Name1 Name2 Type Estimate Lower Upper
{'(Intercept)'} {'(Intercept)'} {'std'} 0.068596 0.027252 0.17266
Group: Error
Name Estimate Lower Upper
{'Res Std'} 0.12659 0.094816 0.169
results = anova(lme)
results =
ANOVA marginal tests: DFMethod = 'Residual'
Term FStat DF1 DF2 pValue
{'(Intercept)' } 62.361 1 38 1.557e-09
{'grouping_variable1' } 2.9461 3 38 0.045032
{'grouping_variable2' } 0.081258 1 38 0.77715
{'grouping_variable1:grouping_variable2'} 1.789 3 38 0.16568
Option 2
grouping_variable4 = string(zeros(23*2,1));
grouping_variable4(1:23) = 'ATraining';
grouping_variable4(23+1:23*2) = 'BTest';
grouping_variable4 = categorical(grouping_variable4);
tbl1 = table(animal_ID,grouping_variable1,grouping_variable4,dependent_variable);
lme = fitlme(tbl1,'dependent_variable ~ grouping_variable1*grouping_variable4 + (1|ID)')
lme =
Linear mixed-effects model fit by ML
Model information:
Number of observations 46
Fixed effects coefficients 8
Random effects coefficients 23
Covariance parameters 2
Formula:
dependent_variable ~ 1 + grouping_variable1*grouping_variable4 + (1 | ID)
Model fit statistics:
AIC BIC LogLikelihood Deviance
-28.98 -10.693 24.49 -48.98
Fixed effects coefficients (95% CIs):
Name Estimate SE tStat DF pValue Lower Upper
{'(Intercept)' } 0.485 0.058778 8.2513 38 5.3474e-10 0.36601 0.60399
{'grouping_variable1_2' } -0.03 0.092937 -0.3228 38 0.74862 -0.21814 0.15814
{'grouping_variable1_3' } -0.037778 0.083125 -0.45447 38 0.65208 -0.20606 0.1305
{'grouping_variable1_4' } 0.0011905 0.080102 0.014862 38 0.98822 -0.16097 0.16335
{'grouping_variable4_BTest' } -0.020833 0.073085 -0.28506 38 0.77715 -0.16879 0.12712
{'grouping_variable1_2:grouping_variable4_BTest'} 0.19375 0.11556 1.6767 38 0.10182 -0.040182 0.42768
{'grouping_variable1_3:grouping_variable4_BTest' } 0.060833 0.10336 0.58857 38 0.55963 -0.1484 0.27007
{'grouping_variable1_4:grouping_variable4_BTest'} 0.20012 0.099597 2.0093 38 0.051649 -0.0015053 0.40174
Random effects covariance parameters (95% CIs):
Group: ID (23 Levels)
Name1 Name2 Type Estimate Lower Upper
{'(Intercept)'} {'(Intercept)'} {'std'} 0.068596 0.027252 0.17266
Group: Error
Name Estimate Lower Upper
{'Res Std'} 0.12659 0.094816 0.169
results = anova(lme)
results =
ANOVA marginal tests: DFMethod = 'Residual'
Term FStat DF1 DF2 pValue
{'(Intercept)' } 68.084 1 38 5.3474e-10
{'grouping_variable1' } 0.11571 3 38 0.95036
{'grouping_variable4' } 0.081258 1 38 0.77715
{'grouping_variable1:grouping_variable4'} 1.789 3 38 0.16568
The p value of grouping_variable1 is different in the ANOVA results, depending of the other variable. I am not sure of what I may be doing wrong, or why the results would be different because of this, so any help or guidance would be greatly appreciated.
답변 (1개)
Darshak
2025년 4월 28일
편집: Darshak
2025년 4월 29일
Hello Maria,
This behavior with “fitlme” is a known thing when working with categorical variables in MATLAB. It happens because of how MATLAB internally handles categorical predictors. When you fit a linear mixed-effects model like:
lme = fitlme(tbl1, 'dependent_variable ~ grouping_variable1*grouping_variable2 + (1|ID)');
and “grouping_variable2 ” is a categorical variable, MATLAB automatically picks one of the categories as a reference level. By default, it picks the one that comes first alphabetically.
The important part is:
- The reference level affects how the model interprets both main effects and interaction terms.
- If the order of categories is changed (like renaming “Training” and 'Test' to “ATraining” and “BTest”), you're effectively changing the reference category.
- That means the model's coefficients, p-values, and ANOVA results can shift, even though the overall fit (AIC, BIC, LogLikelihood) stays identical.
It’s just the way one-hot encoding and contrast settings work.
To avoid getting different results depending on naming or alphabetical order, it's a good idea to explicitly set the reference category for categorical variables.
In MATLAB, this can be done with “reordercats”:
grouping_variable2 = reordercats(grouping_variable2, {'Test', 'Training'});
Or
“setcats”:
grouping_variable2 = setcats(grouping_variable2, {'Test', 'Training'});
The following documentations can be referred for these functions:
The following documentation can be used to refer to ‘fitlme’:
Hope this helps with the issue!
댓글 수: 0
참고 항목
카테고리
Help Center 및 File Exchange에서 Analysis of Variance and Covariance에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!