Hi All,
My apologies. Yet another question on how to vectorize a calculation. What I am trying to do, is recreate a script I wrote 15 years (!) ago in IDL, in Matlab, full description of the why and how here https://doi.org/10.1080/01431160500497119.
In short: I am bootstrapping a stepwise regression model, selecting the best predictor on statistics within the bootstrap. The current script results in the sort of output I am looking for. However, running it takes over an hour for just one response factor & one dataset. Taking that I have 10 response factors, and several datasets in this one study alone, I need this to become a lot quicker so it can become part of my standard data processing.
I know vectorizing is the way to go, but being a newbe, I cannot get my head around how to handle this. Someone willing to lend me a hand? I think that the actual regression is the one to be optimized, vectorizing for the "band" loop (Is that how you would refer to it?):
model = fitglm(predictors(train, [selectedbands,band]), response(train));
stats(thispredictor,band) = model.Rsquared.Ordinary;
I am not sure whether it is possible though: There are about 2000 wavebands in there. In other words: for each bootstrap iteration, 2k regressions are run. And I am running 10k iterations for each predictor.
The process it goes through (Full function in the attached file):
for thispredictor = 1:maxpredictors
for iteration = 1:iterations
stats = zeros(maxpredictors,numbands);
outdata = zeros(numbands);
train = datasample(datarange, numsamples);
for band = 1:numbands
predictorset = predictors(train, [selectedbands,band]);
responseset = response(train);
model = fitglm(predictors(train, [selectedbands,band]), response(train));
stats(thispredictor,band) = model.Rsquared.Ordinary;
end
[maxrsq,maxband] = max(stats(thispredictor,:));
bandcount(thispredictor,maxband) = bandcount(thispredictor,maxband)+1;
end
[frequency, peakband] = max(bandcount(thispredictor,:));
selectedbands = [selectedbands,peakband];
end