Multicollinearity/Regression/PCA and choice of optimal model (2nd try)

Hi there
I have a set of results and 4 "candidate" explanatory variables. These variables are correlated with one another (only two of them are uncorrelated with each other). What I want to figure out is which one(s) of them best explain(s) the results.
I understand stepwise regression is screwed up by multicollinearity (I tried running it, and it all went fine until I added the interactions to the mix).
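One way to quantify how badly the candidates overlap, before picking a method, is the variance inflation factor (VIF). This is an editorial sketch, not part of the original question: it uses simulated placeholder data with four hypothetical columns x1 to x4 (x2 built to track x1 closely, x3 independent of both, x4 a mix of x1 and x3) and computes each VIF from the inverse correlation matrix using base MATLAB only.

```matlab
% Sketch: variance inflation factors (VIF) for candidate predictors.
% Assumption: X is n-by-p, one column per candidate variable.
% The data are simulated placeholders, not the poster's data.
rng(0);                               % reproducible example data
n  = 50;
x1 = randn(n,1);
x2 = x1 + 0.3*randn(n,1);             % strongly correlated with x1
x3 = randn(n,1);                      % independent of x1 and x2
x4 = 0.5*x1 + 0.5*x3 + 0.3*randn(n,1);
X  = [x1 x2 x3 x4];

% VIF_j = 1/(1 - R_j^2), where R_j^2 comes from regressing column j on the
% other columns. It equals the j-th diagonal entry of inv(corrcoef(X)).
R   = corrcoef(X);
VIF = diag(inv(R));
disp(VIF')
% Rule of thumb (a convention, not a hard threshold): VIF above roughly
% 5 to 10 means that coefficient's variance is badly inflated.
```

A large VIF means the coefficient is poorly determined once the other variables are in the model. Interaction terms are often strongly correlated with their own main effects, which is one reason selection tends to become unstable once they are added.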
I tried an ANOVA: two of them are significant, but I get NaNs when I ask about interactions.
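One common cause of NaN results for interaction terms (an assumption here, since the original data are not shown) is a design without replication: with one observation per combination of factor levels, adding the interaction uses up every residual degree of freedom, leaving no error variance for the F-test. The base-MATLAB sketch below uses a hypothetical 3-by-2 layout with factors named A and B and compares the residual degrees of freedom with and without the interaction columns.

```matlab
% Sketch: why an interaction term can come back as NaN.
% Hypothetical layout (not the poster's data): factor A with 3 levels,
% factor B with 2 levels, exactly one observation per A-B cell.
A = [1 1 2 2 3 3]';
B = [1 2 1 2 1 2]';
n = numel(A);

% Reference (dummy) coding: intercept, A2, A3, B2, then the A*B products.
dA = [A==2, A==3];
dB = (B==2);
Xmain = [ones(n,1), dA, dB];
Xfull = [Xmain, dA .* dB];            % B2 multiplied into both A columns

% Residual degrees of freedom = observations minus estimable parameters.
dfMain = n - rank(Xmain);
dfFull = n - rank(Xfull);
fprintf('residual df, main effects only: %d\n', dfMain);
fprintf('residual df, with interaction:  %d\n', dfFull);
% With zero residual df there is no error variance to divide by, so the
% F statistic and p-value for the interaction cannot be computed.
```

If the real data do have replicates, the NaNs point to some other rank deficiency, and checking rank() of the design matrix is still a quick first diagnostic.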
I tried to run a PCA on all the explanatory variables, but 1) I don't understand how the PCA isn't concerned with the results I am trying to explain, and 2) I don't understand the results I am getting with pcacov: what do the coefficients in the matrix mean? How am I supposed to rank the variables?
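On both points: PCA is computed from the predictors alone (their covariance matrix), so it never sees the response. pcacov returns, in order, the coefficient matrix (column k holds the weights, or loadings, of each original variable in the k-th principal component), the component variances, and the percentage of total variance each component explains. The sketch below reproduces those three outputs with eig on simulated placeholder data (standardized, so the covariance is the correlation matrix), then, as one possible way to bring a response back in, regresses a hypothetical y on the first two component scores (principal component regression).

```matlab
% Sketch: what principal-component coefficients mean, using eig only.
% Simulated placeholder data; X has one column per candidate variable.
rng(1);
n  = 100;
x1 = randn(n,1);
x2 = x1 + 0.3*randn(n,1);
x3 = randn(n,1);
x4 = 0.5*x1 + 0.5*x3 + 0.3*randn(n,1);
X  = [x1 x2 x3 x4];
y  = 2*x1 + x3 + randn(n,1);          % hypothetical response

Z = (X - mean(X)) ./ std(X);          % standardize so units don't dominate
C = cov(Z);                           % the matrix you would pass to pcacov
[V, D] = eig(C);
[latent, order] = sort(diag(D), 'descend');
coeff = V(:, order);                  % column k = loadings of component k
explained = 100 * latent / sum(latent);

% coeff(j,k) is the weight of variable j in component k. Note that y was
% not used anywhere above: PCA only describes how the predictors co-vary.
scores = Z * coeff;                   % data in component coordinates

% Principal component regression: regress y on the leading components.
k = 2;
b = [ones(n,1), scores(:,1:k)] \ y;
disp(coeff); disp(explained');
```

Because each component mixes all four variables, PCA does not by itself rank the original variables; a large absolute loading only says which variables dominate a given component.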
Does it make sense? Thank you very much.
PS: I also learned about the Akaike information criterion, but I am unsure how it would apply here. I hope something simpler could help, because this feels like trying to crush a fly with a bulldozer.
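With only four candidates, AIC can be applied directly: fit every non-empty subset of predictors (15 models) by least squares and compare AIC = n*log(RSS/n) + 2k, where k is the number of estimated coefficients including the intercept; lower is better. This Gaussian form drops an additive constant that is identical for all models fitted to the same data, so only differences between models matter. The sketch uses simulated placeholder data in base MATLAB; the response y is hypothetical, generated from x1 and x3.

```matlab
% Sketch: rank every subset of the 4 candidates by AIC (Gaussian OLS).
% Simulated placeholder data; replace X (n-by-4) and y with real data.
rng(2);
n  = 60;
x1 = randn(n,1);
x2 = x1 + 0.3*randn(n,1);
x3 = randn(n,1);
x4 = 0.5*x1 + 0.5*x3 + 0.3*randn(n,1);
X  = [x1 x2 x3 x4];
y  = 2*x1 + x3 + randn(n,1);

p = size(X,2);
nModels = 2^p - 1;                    % all non-empty subsets
aic = zeros(nModels,1);
labels = cell(nModels,1);
for m = 1:nModels
    use = logical(bitget(m, 1:p));    % bit j of m set -> column j included
    D = [ones(n,1), X(:,use)];        % intercept + chosen predictors
    b = D \ y;
    rss = sum((y - D*b).^2);
    k = size(D,2);                    % number of estimated coefficients
    % AIC up to an additive constant shared by every model
    aic(m) = n*log(rss/n) + 2*k;
    labels{m} = mat2str(find(use));
end
[aicSorted, idx] = sort(aic);
for r = 1:5                           % show the five best subsets
    fprintf('%-12s AIC = %8.2f\n', labels{idx(r)}, aicSorted(r));
end
```

Models within about 2 AIC units of the best are usually treated as comparably supported, which is worth keeping in mind when correlated candidates can substitute for each other.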

Accepted Answer

Richard Willey, 13 Apr 2012
I'm attaching some code that might be helpful.
I also have a two-part blog post on this same subject that goes into a bit more depth...
%% Introduction to using LASSO
% This demo explains how to start using the lasso functionality introduced
% in R2011b. It is motivated by an example in Tibshirani’s original paper
% on the lasso.
% Tibshirani, R. (1996). Regression shrinkage and selection via the lasso.
% J. Royal Statist. Soc. B, Vol. 58, No. 1, pages 267-288.
% The data set that we’re working with in this demo is small and has
% correlated variables: it includes 8 different variables and only 20
% observations. 5 of the 8 variables have coefficients of zero, so they
% have no impact on the model. The other three variables have positive
% coefficients and do impact the model.
%% Clean up workspace and set random seed
clear all
clc
% Set the random number stream
rng(1981);
%% Creating data set with specific characteristics
% Create eight X variables
% The mean of each variable will be equal to zero
mu = [0 0 0 0 0 0 0 0];
% The variables are correlated with one another.
% The covariance matrix is specified as covariance(i,j) = 0.5^|i-j|:
i = 1:8;
matrix = abs(bsxfun(@minus,i',i));
covariance = repmat(.5,8,8).^matrix;
% Use these parameters to generate a set of multivariate normal random numbers
X = mvnrnd(mu, covariance, 20);
% Create a hyperplane that describes Y = f(X)
Beta = [3; 1.5; 0; 0; 2; 0; 0; 0];
ds = dataset(Beta);
% Add in a noise vector
Y = X * Beta + 3 * randn(20,1);
%% Use linear regression to fit the model
b = regress(Y,X);
ds.Linear = b;
%% Use lasso to fit the model
[B Stats] = lasso(X,Y, 'CV', 5);
disp(B)
disp(Stats)
%% Create a plot showing MSE versus lambda
lassoPlot(B, Stats, 'PlotType', 'CV')
%% Identify a reasonable set of lasso coefficients
% View the regression coefficients associated with Index1SE (the largest
% lambda whose cross-validated MSE is within one standard error of the minimum)
ds.Lasso = B(:,Stats.Index1SE);
disp(ds)
%% Create a plot showing coefficient values versus L1 norm
lassoPlot(B, Stats)
%% Run a simulation
% Preallocate some variables
MSE = zeros(100,1);
mse = zeros(100,1);
Coeff_Num = zeros(100,1);
Betas = zeros(8,100);
cv_Reg_MSE = zeros(1,100);
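for i = 1 : 100
    % Draw a fresh data set from the same model
    X = mvnrnd(mu, covariance, 20);
    Y = X * Beta + randn(20,1);
    % Lasso with 5-fold cross-validation
    [B Stats] = lasso(X,Y, 'CV', 5);
    % Pick a lambda index halfway between the minimum-MSE and 1-SE solutions
    Shrink = Stats.Index1SE - ceil((Stats.Index1SE - Stats.IndexMinMSE)/2);
    % Record which coefficients are selected (nonzero) and how many
    Betas(:,i) = B(:,Shrink) ~= 0;
    Coeff_Num(i) = sum(B(:,Shrink) ~= 0);
    MSE(i) = Stats.MSE(:, Shrink);
    % Cross-validated MSE of ordinary least squares, for comparison
    regf = @(XTRAIN, ytrain, XTEST)(XTEST*regress(ytrain,XTRAIN));
    cv_Reg_MSE(i) = crossval('mse',X,Y,'predfun',regf, 'kfold', 5);
end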
Number_Lasso_Coefficients = mean(Coeff_Num);
disp(Number_Lasso_Coefficients)
MSE_Ratio = median(cv_Reg_MSE)/median(MSE); % > 1 means OLS has higher CV MSE than lasso
disp(MSE_Ratio)
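For readers who want to see what lasso computes without the toolbox machinery, here is a minimal coordinate-descent sketch of the penalized least-squares objective (1/(2n))*||y - b0 - X*b||^2 + lambda*||b||_1, which is the objective the lasso documentation states for Alpha = 1. It illustrates the technique and is not MATLAB's implementation: it handles the intercept by centering, does not standardize X (lasso does by default), and uses one fixed, arbitrarily chosen lambda on simulated data.

```matlab
% Sketch: lasso by cyclic coordinate descent (illustration only).
% Minimizes (1/(2n))*||yc - Xc*b||^2 + lambda*||b||_1 on centered data.
% Unlike lasso's default, X is NOT standardized. Simulated placeholder data.
rng(3);
n = 40; p = 5;
X = randn(n,p);
y = X*[3; 0; 0; 1.5; 0] + randn(n,1);

Xc = X - mean(X);                     % center predictors
yc = y - mean(y);                     % center response
lambda = 0.5;                         % arbitrary penalty for the demo
b = zeros(p,1);
colSS = sum(Xc.^2)' / n;              % (1/n)*||x_j||^2 for each column
soft = @(z,t) sign(z) .* max(abs(z) - t, 0);   % soft-thresholding
for iter = 1:500
    bOld = b;
    for j = 1:p
        r = yc - Xc*b + Xc(:,j)*b(j); % partial residual without x_j
        b(j) = soft(Xc(:,j)'*r / n, lambda) / colSS(j);
    end
    if max(abs(b - bOld)) < 1e-10, break; end
end
b0 = mean(y) - mean(X)*b;             % recover the intercept
disp(b')
```

Each coordinate update soft-thresholds a one-variable least-squares fit, which is why some coefficients land exactly at zero; that exact-zero behavior is what makes lasso usable for variable selection.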
laurie, 19 Apr 2012
Hi. Thank you, but this is way too complicated for me. I tried some easier ways to figure out what was happening in the data... Maybe I'll come back to your method later :)
