Regression with several dummy variables

조회 수: 2 (최근 30일)
Maria
Maria 2014년 8월 17일
편집: dpb 2014년 8월 18일
I have a cell type variable with 20000 rows and 700 columns. I present here an example of the first 9 columns:
C1 C2 C3 C4 C5 C6 C7 C8 C9
A={ 0 0 0 13 16 11 17 26 12 %row 1 is irrelevant
12 0 0 1 0 0 0 0 0
13 0 0 0 1 0 0 0 0
16 0 0 0 0 1 0 0 0
18 0 0 0 0 0 1 0 0
26 0 0 1 0 0 0 0 0
41 0 0 0 0 0 0 1 0}
I am trying to perform a regression.
C1 is simply and ID code; C2 is my binary dependent variable y. C3 is a dummy variable x (the elements, 0 or 1, are numbers), whose coefficient β (and if possible standard deviation) I want to interpret. From C4 onwards I have dummy variables (here the elements, 0 or 1, are logicals) that I also want to include in my regression to control for certain effects.
I most likely should use fitlm or regress functions but I am not being successful. Can someone help me? Thank you very much.
  댓글 수: 2
dpb
dpb 2014년 8월 17일
Sounds hopeless, nearly, with the number of variables, but guess you'll not know until actually try.
First, are the dummy variables coded to be independent? That is, are the large number having come from fewer variables but of different levels or are they actually all separate effects?
Maria
Maria 2014년 8월 17일
The large numbers did come from fewer variables but of different levels.

댓글을 달려면 로그인하십시오.

답변 (1개)

dpb
dpb 2014년 8월 17일
편집: dpb 2014년 8월 17일
Given the response to the previous question, should be just
y=A{1}(2:end,2); % y response variable
x=A{1}{2:end,3:end}; x=[ones(size(x,1),1 x]; % predictor variables plus constant term
[b,bint,~,~,stats] = regress(y,x);
As said, all will depend upon what the actual design matrix X'*X looks like when it's computed (actually not computed by Matlab, but the characteristics of same are what determines the covariances, estimabilities, etc., etc., etc., which are, of course all dependent upon the codings chosen being independent.)
  댓글 수: 6
Maria
Maria 2014년 8월 18일
It gives this error: Warning: X is rank deficient to within machine precision. > In regress at 84. Do you know why? Thanks
dpb
dpb 2014년 8월 18일
편집: dpb 2014년 8월 18일
Yep...as suspected would be the case given the number of dummy variables, at least one column is the same as another. It'll be very difficult to find an encoding that won't lead to the problem I'd guess.
You can always try
rank(x)
to get an estimate of how many problems you have...
I repeat the final synopsis from my initial answer --
...all will depend upon what the actual design matrix X'*X looks like when it's computed (actually not computed by Matlab, but the characteristics of same are what determines the covariances, estimabilities, etc., etc., etc., which are, of course all dependent upon the codings chosen being independent.)
It's that last phrase about being independent that's the rub.

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Descriptive Statistics에 대해 자세히 알아보기

제품

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by