Regression with several dummy variables
이전 댓글 표시
I have a cell type variable with 20000 rows and 700 columns. I present here an example of the first 9 columns:
C1 C2 C3 C4 C5 C6 C7 C8 C9
A={ 0 0 0 13 16 11 17 26 12 %row 1 is irrelevant
12 0 0 1 0 0 0 0 0
13 0 0 0 1 0 0 0 0
16 0 0 0 0 1 0 0 0
18 0 0 0 0 0 1 0 0
26 0 0 1 0 0 0 0 0
41 0 0 0 0 0 0 1 0}
I am trying to perform a regression.
C1 is simply and ID code; C2 is my binary dependent variable y. C3 is a dummy variable x (the elements, 0 or 1, are numbers), whose coefficient β (and if possible standard deviation) I want to interpret. From C4 onwards I have dummy variables (here the elements, 0 or 1, are logicals) that I also want to include in my regression to control for certain effects.
I most likely should use fitlm or regress functions but I am not being successful. Can someone help me? Thank you very much.
댓글 수: 2
dpb
2014년 8월 17일
Sounds hopeless, nearly, with the number of variables, but guess you'll not know until actually try.
First, are the dummy variables coded to be independent? That is, are the large number having come from fewer variables but of different levels or are they actually all separate effects?
Maria
2014년 8월 17일
답변 (1개)
Given the response to the previous question, should be just
y=A{1}(2:end,2); % y response variable
x=A{1}{2:end,3:end}; x=[ones(size(x,1),1 x]; % predictor variables plus constant term
[b,bint,~,~,stats] = regress(y,x);
As said, all will depend upon what the actual design matrix X'*X looks like when it's computed (actually not computed by Matlab, but the characteristics of same are what determines the covariances, estimabilities, etc., etc., etc., which are, of course all dependent upon the codings chosen being independent.)
댓글 수: 6
Maria
2014년 8월 17일
dpb
2014년 8월 17일
There's an extra opening paren -- started w/ cell2mat but if is a single cell don't need it. Wipe the opening '(' in front of the A 's...
Maria
2014년 8월 17일
My old eyes can't tell the difference, sorry. First set is the curlies {} and the second the regular parens. That presumes size(A) is 1x1 cell like
>> A={rand(3,5)}
A =
[3x5 double]
>> y=A{1}(2:end,2)
y =
0.5497
0.9172
>>
Otherwise, need to know what
whos A
returns.
Maria
2014년 8월 18일
Yep...as suspected would be the case given the number of dummy variables, at least one column is the same as another. It'll be very difficult to find an encoding that won't lead to the problem I'd guess.
You can always try
rank(x)
to get an estimate of how many problems you have...
I repeat the final synopsis from my initial answer --
...all will depend upon what the actual design matrix X'*X looks like when it's computed (actually not computed by Matlab, but the characteristics of same are what determines the covariances, estimabilities, etc., etc., etc., which are, of course all dependent upon the codings chosen being independent.)
It's that last phrase about being independent that's the rub.
카테고리
도움말 센터 및 File Exchange에서 Linear Predictive Coding에 대해 자세히 알아보기
제품
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!