How to fix a singular design matrix created by LinearModel during categorical interaction expansion?

Hello all,
I am using the LinearModel class in R2012a to fit a linear model. I specify the model with a formula in Wilkinson notation and pass a dataset array with both numeric and categorical columns. The categorical columns are indicated to LinearModel.fit() through the CategoricalVars option.
The problem is that one of the terms in my formula is an interaction between two categorical variables (e.g. 'A:B'). When the two variables are expanded into dummy variables internally and multiplied out, some columns of the design matrix come out as all zeros, which of course makes the design matrix singular.
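The expansion described above can be reproduced by hand. The following is a minimal sketch that assumes hypothetical integer-coded factors A (2 levels) and B (3 levels) with one combination never observed; it does not claim to mirror what LinearModel does internally, only the arithmetic of multiplying indicator columns.

```matlab
% Hypothetical data: the combination A == 2, B == 3 never occurs.
A = [1 1 1 2 2 1 2 1]';
B = [1 2 3 1 2 3 1 2]';

% Full indicator (dummy) columns, one per level.
dA = double(bsxfun(@eq, A, 1:2));   % n-by-2
dB = double(bsxfun(@eq, B, 1:3));   % n-by-3

% Interaction columns: element-wise product of every pair of indicators.
AB = zeros(numel(A), 6);
k = 0;
for i = 1:2
    for j = 1:3
        k = k + 1;
        AB(:, k) = dA(:, i) .* dB(:, j);
    end
end

% Columns that are identically zero correspond to unobserved combinations.
emptyCells = find(all(AB == 0, 1));

% Reference-coded design: intercept, A2, B2, B3, A2:B2, A2:B3.
X = [ones(numel(A), 1), dA(:, 2), dB(:, 2:3), ...
     dA(:, 2) .* dB(:, 2), dA(:, 2) .* dB(:, 3)];
fprintf('columns: %d, rank: %d\n', size(X, 2), rank(X));
```

The last interaction column is all zeros because no row has A == 2 and B == 3, so the design has one more column than its rank.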
Is there a simple way to tell MATLAB how to handle problematic categorical interactions, or at least remove them without breaking the linear model object? I'm surprised this is apparently unhandled (no warnings even), as the situation could easily come up.
Many thanks.

Answers (1)

Tom Lane on 19 Mar 2013
When I try an example like this I see:
Warning: Regression design matrix is rank deficient to within machine precision.
The coefficients table shows the affected coefficients fixed at zero. The step method may remove the singular terms, and the anova method can help reveal which terms don't have their full degrees of freedom.
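A small experiment along these lines might look like the sketch below. The data are hypothetical (the combination A == 2, B == 3 is deliberately missing), and whether the warning appears may depend on the release and the data, so treat the comments as expectations rather than guarantees.

```matlab
% Hypothetical data: the combination A == 2, B == 3 is never observed.
A = [1 1 1 2 2 1 2 1 1 2]';
B = [1 2 3 1 2 3 1 2 1 2]';
rng(0);                                   % reproducible noise
y = A + 0.5*B + 0.1*randn(10, 1);

ds = dataset(A, B, y);                    % variable names come from the workspace

% 'A*B' expands to A + B + A:B; both predictors are declared categorical.
lm = LinearModel.fit(ds, 'y ~ A*B', 'CategoricalVars', {'A', 'B'});

disp(lm.Coefficients)                     % look for a coefficient fixed at zero
disp(anova(lm))                           % A:B is expected to show reduced DF

% Compare the number of design columns with the number actually estimated.
fprintf('coefficients: %d, estimated: %d\n', ...
        lm.NumCoefficients, lm.NumEstimatedCoefficients);
```

A gap between NumCoefficients and NumEstimatedCoefficients is a quick programmatic check for a rank-deficient fit when the warning has been missed or suppressed.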
Can you elaborate on what you see?
Comments (4)
Tom Lane on 19 Mar 2013
You can run
lm = step(lm)
to use stepwise regression to add or remove terms based on their significance. There are 'Lower' and 'Upper' options to control the set of terms considered for adding and removing. It is possible for an interaction term to be significant and singular at the same time. This can happen when there are missing factor combinations, yet the ones present represent a significant improvement over the model without the interaction term.
You can remove a term directly:
lm = removeTerms(lm,'a:b')
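Put together, the two approaches could be tried as in the following sketch. The data and variable names are hypothetical, and the named models 'linear' and 'interactions' are used for the bounds only as one possible choice; the outcome of step depends on the data.

```matlab
% Hypothetical data with the combination A == 2, B == 3 missing.
A = [1 1 1 2 2 1 2 1 1 2]';
B = [1 2 3 1 2 3 1 2 1 2]';
rng(0);
y = A + 0.5*B + 0.1*randn(10, 1);
ds = dataset(A, B, y);

lm = LinearModel.fit(ds, 'y ~ A*B', 'CategoricalVars', {'A', 'B'});

% One stepwise step: main effects can never be dropped ('Lower'),
% and nothing beyond pairwise interactions can be added ('Upper').
lmStep = step(lm, 'Lower', 'linear', 'Upper', 'interactions', 'NSteps', 1);

% Alternatively, drop the singular interaction term directly.
lmMain = removeTerms(lm, 'A:B');
disp(lmMain.Formula)
fprintf('coefficients: %d, estimated: %d\n', ...
        lmMain.NumCoefficients, lmMain.NumEstimatedCoefficients);
```

Removing the term directly gives a full-rank main-effects model regardless of significance, whereas step may keep the interaction if it is significant despite being singular.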
Giuseppe DiStefano on 20 Mar 2013
Unfortunately, the 'Lower' and 'Upper' options don't allow one to specify the particular terms that may be added at each step: for example, some set of interactions but not all of them, or some linear terms, e.g. {A, B, D} but not C.
For missing factor combinations, it would be great to be able to control whether singular terms are kept or removed. For example, x2fx accepts catlevels, so you can indicate that the mere absence of a level from a dataset doesn't necessarily imply that the level doesn't exist.
Incidentally, x2fx() seems to have the same categorical interaction expansion problem.
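The x2fx behaviour can be checked with a sketch like the one below, which assumes hypothetical integer-coded factors and simply looks for all-zero columns in the result. The exact column layout x2fx produces is not relied on here.

```matlab
% Hypothetical integer-coded factors: column 1 has 2 levels, column 2 has
% 3 levels, and the combination (2, 3) is absent from the data.
X = [1 1; 1 2; 1 3; 2 1; 2 2; 1 3; 2 1; 1 2];

% Treat both columns as categorical and state the level counts explicitly.
D = x2fx(X, 'interaction', [1 2], [2 3]);

% Interaction columns for unobserved combinations are identically zero.
zeroCols = find(all(D == 0, 1));
fprintf('columns: %d, rank: %d, all-zero columns: %s\n', ...
        size(D, 2), rank(D), mat2str(zeroCols));

% One workaround is to drop those columns before fitting by hand.
Dclean = D;
Dclean(:, zeroCols) = [];
```

Dropping the zero columns is only a manual workaround for a hand-built design matrix; it does not address the request for built-in control over how singular terms are handled.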
