How can I calculate the Variance Inflation Factor (VIF) for a linear regression model?

I have a large set of linear models developed from a space-filling DOE I ran. Many of the parameters in the DOE are correlated to different degrees, and I'm interested in calculating the VIF of each parameter and refitting the model if the VIF for some parameters is high. I saw this code in another answer: https://www.mathworks.com/matlabcentral/answers/1964984-tolerance-value-and-variance-inflation-factor-in-stepwiselm
% Define the predictors and response variable
X = [x1, x2, x3, x4, x5];
Y = y;
% Define the options for stepwise regression
options = statset('Display','iter', 'TolFun', 0.01, 'TolTypeFun', 'rel', 'PEnter', 0.05, 'PRemove', 0.1);
% Run the stepwise regression with predefined multicollinearity threshold settings
mdl = stepwiselm(X, Y, 'Criterion', 'bic', 'Upper', 'linear', 'Lower', 'constant', 'PEnter', 0.05, 'PRemove', 0.1, 'Verbose', 1, 'Options', options);
% Check the multicollinearity
[vif, tolerance] = vif(mdl);
% Check for variable inclusion
included_vars = mdl.predictorNames(mdl.Coefficients.Estimate ~= 0);
However, I ran into a few problems when trying to execute this snippet:
1) statset doesn't take 'PEnter' or 'PRemove' as arguments. Removing those two arguments allowed that line to run.
2) stepwiselm doesn't seem to accept 'Options' as a valid argument.
3) I can't find a function called vif that takes a linear model object as an argument.
I have the curve fitting, data acquisition, deep learning, and statistics toolboxes installed. If there is another toolbox with applicable functions I can install that as well.

Accepted Answer

Umar on 11 Jan 2026
Edited: Umar on 11 Jan 2026

Hi @Ojaswi,

I looked into the MATLAB Central thread you shared about calculating VIF for your DOE models. The issues you ran into with that code snippet are legitimate, and dpb's response confirms what's going on. The original code was fundamentally broken: it passed statset parameters that don't belong to it, passed an 'Options' argument that stepwiselm doesn't accept, and called a vif function that doesn't exist in any MathWorks toolbox. Here's what you should do instead:

1) Calculate the VIF of your correlated parameters before building models, either by downloading Daniel Vasilaky's vif function from the MATLAB File Exchange or by computing it directly from your predictor matrix X with two lines: R = corrcoef(X); VIF = diag(inv(R))'; Each VIF is the corresponding diagonal element of the inverse correlation matrix; the trailing ' only turns that column into a row vector.
2) Examine which predictors have VIF values above 10, since these indicate serious multicollinearity, and consider removing them or combining correlated variables.
3) If you still want to use stepwise regression after addressing the multicollinearity, call stepwiselm with every option passed directly as a name-value pair, without statset: mdl = stepwiselm(X, Y, 'Criterion', 'bic', 'Upper', 'linear', 'Lower', 'constant', 'PEnter', 0.05, 'PRemove', 0.1, 'Verbose', 1);
4) After fitting the model, recalculate the VIF on the remaining predictors to verify that the multicollinearity issue is resolved.

Since you're working with space-filling DOE data where many parameters are correlated to different degrees, you might run the VIF check iteratively: remove the predictor with the highest VIF, recalculate, and repeat until all values are acceptable. A threshold of 10 is the common rule of thumb, but some researchers use 5 for more conservative screening. This approach will help you identify which of your DOE parameters are causing collinearity problems and build more reliable linear models.
You can find the File Exchange vif function here: https://www.mathworks.com/matlabcentral/fileexchange/60551-vif-x
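The manual VIF calculation and the iterative removal described above can be sketched in MATLAB roughly as follows. This is a minimal sketch on made-up demo data (x3 is deliberately built to nearly duplicate x1); the variable names and the threshold of 10 are illustrative assumptions, so replace X with your own predictor matrix. The VIF of predictor j is the j-th diagonal element of inv(corrcoef(X)), which equals 1/(1 - R_j^2) from regressing x_j on all the other predictors.

```matlab
% Minimal sketch with made-up data: VIF from the inverse correlation matrix,
% then iterative removal of the highest-VIF predictor.
rng(0);                                  % reproducible demo data
n = 200;
x1 = randn(n,1);
x2 = randn(n,1);
x3 = x1 + 0.1*randn(n,1);                % built to be nearly collinear with x1
x4 = randn(n,1);
X = [x1, x2, x3, x4];                    % n-by-p predictor matrix
names = {'x1', 'x2', 'x3', 'x4'};
vifThreshold = 10;                       % use 5 for a more conservative screen

% VIF of every predictor: j-th diagonal element of inv(corrcoef(X)),
% equivalent to 1/(1 - R_j^2) from regressing x_j on the others.
vifAll = diag(inv(corrcoef(X)))';

% Drop the worst predictor and recompute until all VIFs pass.
keep = 1:size(X, 2);                     % column indices still in the set
while true
    vifKeep = diag(inv(corrcoef(X(:, keep))))';
    [worst, idx] = max(vifKeep);
    if worst <= vifThreshold || numel(keep) == 1
        break                            % all remaining VIFs are acceptable
    end
    fprintf('Removing %s (VIF = %.1f)\n', names{keep(idx)}, worst);
    keep(idx) = [];                      % remove the highest-VIF predictor
end
fprintf('Kept: %s\n', strjoin(names(keep), ', '));
disp(vifKeep)
```

Removing one predictor at a time matters here: x1 and x3 both start with a large VIF, but once either one is gone the other's VIF drops back to about 1, so removing every predictor above the threshold in a single pass would discard more than necessary.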

1 Comment

I appreciate everyone's help with this. Because I'm fitting so many models (more than 100) with quadratic and interaction terms, I believe calculating the multicollinearity for the quadratic and interaction terms isn't valuable, and doing it individually for each model would take an exceptional amount of computation time. Instead, I believe I can calculate the VIF for each combination of linear parameters and then use that to select the case with the lowest multicollinearity but the largest selection of terms. The following is the solution I landed on. It could probably be written a little more efficiently, and I don't really need 1000 valid combinations, much less the number I actually get (the loop never breaks out early). But this code helped me decide on a set of variables to use that matched my intuition.
%InputTable = mdl.Variables;
% InputTable is the table used to fit the linear model. Variable names are the
% model parameters and the last column is the response variable.
nVars = width(InputTable) - 1;      % number of predictor variables (avoids shadowing numel)
n = 1:nVars;
numCom = 0;                         % total number of combinations, 2^nVars - 1
combi = cell(nVars, 1);             % combi{k}: one k-variable combination per row
for k = 1:nVars
    numCom = numCom + nchoosek(nVars, k);
    combi{k} = nchoosek(n, k);
end
% calculate the VIF for each combination of variables
vifs = cell(size(combi));
for i = 1:nVars
    temp = combi{i};
    tempVifs = zeros(size(temp));
    for j = 1:size(temp, 1)
        R0 = corrcoef(InputTable{:, temp(j,:)});
        tempVifs(j,:) = diag(inv(R0));   % VIF = diagonal of the inverse correlation matrix
    end
    vifs{i} = tempVifs;
end
%% check the tolerance
VifTol = 4;                         % VIF for every parameter must be <= this value
passes = zeros(1000, 2);            % rows of [combination size, row index in combi{size}]
k = 1;
for i = nVars:-1:1                  % go in reverse: largest combinations first
    tempVifs = vifs{i};
    for j = 1:size(tempVifs, 1)
        if all(abs(tempVifs(j,:)) <= VifTol)
            passes(k,:) = [i j];
            k = k + 1;
        end
    end
end
passes = passes(1:k-1, :);          % drop unused preallocated rows
if isempty(passes)
    error('No combination has every VIF <= %g.', VifTol);
end
%% display valid options
passnum = 1;                        % first pass is one of the largest valid combinations
i = passes(passnum, 1);
j = passes(passnum, 2);
selectedVifs = vifs{i}(j,:);        % VIFs of the selected combination
tab = InputTable(:, combi{i}(j,:)); % predictors of the selected combination
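Since the comment notes that the search could be more efficient and never breaks out early, here is one possible alternative as a hedged sketch: check subset sizes from largest to smallest and stop at the first size where some subset has every VIF <= VifTol. There are 2^p - 1 nonempty subsets of p predictors, so exhaustive enumeration grows quickly, and stopping early skips the VIF calculations for all smaller sizes. The demo table and its variable names (a, b, c, d, y) are invented for illustration, with c built to be nearly collinear with a. If several subsets of the winning size pass, this keeps the first one nchoosek returns, which is not necessarily the one with the lowest maximum VIF.

```matlab
% Minimal sketch: search predictor subsets from largest to smallest and stop
% at the first size that has a subset with every VIF <= VifTol.
% Assumes numeric predictors in all but the last column (the response).
rng(1);                                 % reproducible demo data
n = 100;
a = randn(n,1); b = randn(n,1); d = randn(n,1);
c = a + 0.2*randn(n,1);                 % nearly collinear with a
y = a + b + d + randn(n,1);
InputTable = table(a, b, c, d, y);

VifTol = 4;
X = InputTable{:, 1:end-1};             % predictor matrix
p = size(X, 2);
best = [];                              % column indices of the chosen subset
bestVif = [];
for k = p:-1:1                          % largest subsets first
    subsets = nchoosek(1:p, k);         % each row is one k-predictor subset
    for j = 1:size(subsets, 1)
        cols = subsets(j, :);
        v = diag(inv(corrcoef(X(:, cols))))';
        if all(v <= VifTol)
            best = cols;
            bestVif = v;
            break                       % first passing subset of this size
        end
    end
    if ~isempty(best)
        break                           % no need to try smaller sizes
    end
end
selected = InputTable(:, [best, width(InputTable)]);   % keep the response too
disp(selected.Properties.VariableNames)
```

With this demo data the full set fails because a and c are nearly collinear, and the first size-3 subset that passes is {a, b, d}.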


More Answers (1)

dpb on 10 Jan 2026
Edited: dpb on 10 Jan 2026
That is a very peculiar response from a MathWorks staff person, indeed. The stepwiselm documentation doesn't support the use of a statset object, nor does the statset documentation from R2022b to current indicate that the stepwise linear model parameters are in its repertoire(*).
However, 'PEnter' and 'PRemove' are valid parameters, so recast your code to not use statset at all, and pass all parameters directly into stepwiselm.
which -all stepwiselm
/MATLAB/toolbox/stats/classreg/stepwiselm.m
There still appears to be only the one MathWorks-supplied version; this is the same result I get with R2022b and earlier, and the doc is also the same.
It might be worth a poke at official support, referencing the prior post and asking about that syntax, so the necessary corrections get made there and somebody else doesn't get blindsided later on, too.
(*) It also seems as though an enhancement request for these to be extended would not be out of order; it would help regularize syntax across the family of fitting functions.
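As a sketch of passing everything directly to stepwiselm and then checking VIF on what the stepwise fit kept, something like the following should work with the Statistics and Machine Learning Toolbox. The data are made up, and the sketch assumes X is a numeric matrix, so the default variable names are x1 through x5 with the response y last. Because 'Upper' is 'linear', each retained term's coefficient name equals a predictor name, which is what the ismember lookup relies on. Note that stepwiselm drops terms from the model rather than setting their estimates to zero, and property names are case-sensitive (PredictorNames, not predictorNames), so the included_vars line in the question would likely fail as written. One more hedge: as I read the stepwiselm documentation, PEnter and PRemove act as p-value thresholds only for the default 'sse' criterion; with 'Criterion','bic' they are compared with the change in BIC instead, so this sketch uses 'sse' to keep 0.05 and 0.1 meaningful as p-values.

```matlab
% Minimal sketch (demo data are made up): stepwise fit with all options passed
% directly to stepwiselm, then VIF recomputed on the retained predictors.
rng(2);                                   % reproducible demo data
n = 150;
X = randn(n, 5);
X(:,3) = X(:,1) + 0.1*randn(n,1);         % x3 nearly duplicates x1
Y = 2*X(:,1) - X(:,2) + 0.5*X(:,4) + randn(n,1);

% No statset: PEnter/PRemove are name-value pairs of stepwiselm itself.
% 'sse' (the default criterion) makes PEnter/PRemove p-value thresholds.
mdl = stepwiselm(X, Y, 'constant', ...
    'Upper', 'linear', 'Lower', 'constant', ...
    'Criterion', 'sse', 'PEnter', 0.05, 'PRemove', 0.1, ...
    'Verbose', 1);

% With matrix input the variables are named x1..x5 and the response y is last.
predNames = mdl.VariableNames(1:end-1);
% Linear-only model: a predictor is retained iff it has a coefficient.
inModel = ismember(predNames, mdl.CoefficientNames);

vifKept = [];
if any(inModel)
    % VIF_j = j-th diagonal element of the inverse correlation matrix.
    vifKept = diag(inv(corrcoef(X(:, inModel))))';
    kept = predNames(inModel);
    disp(table(kept(:), vifKept(:), 'VariableNames', {'Predictor', 'VIF'}))
end
```

If x1 and x3 both survive the stepwise fit, their VIFs in the printed table will be large, which is the cue to drop one of them and refit.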

Release: R2024b

Asked: 9 Jan 2026

Commented: 13 Jan 2026
