fitgmdist for loop; nested for loop, combine array of different sizes
이전 댓글 표시
Hi all,
Please help me. Please. This research student will thank you immensely.
Here's the scenario: I have two populations (main and sub). The sub population is 10% of the main population. The mean (mu) and sigma(std) of these two populations can change, but have certain predefined boundaries. Right now, I am only changing the sigma of the main population and keeping the sigma of the sub population, and the mean of both populations constant.
What I want: I want to use fitgmdist/GMModels/Gaussian distribution to be able to identify that there are indeed two components in that set of data. In other words, I am trying to see for what sigma_main values does the model ouput two components (numComponents = 2). Right now, I have a model that iterates through my sigmamain values (0.1:0.05:1) and goes through varying GMModels to find the one with the better AIC (least error) and outputs the numComponents for every sigma_main value, and then sigma_main vs numComponents is plotted. At least, that's what I think.
Here's my code:
%% define known constants
% rng(0,'twister')
n_main = 100000;
n_sub = 10000;
%sigma_main = 0.1;
sigma_sub = 0.1;
mu_main = 1;
mu_sub =3;
%% change sigma_main values and run model
sigmamain_val = 0.1:0.05:1;
outputData = zeros(length(sigmamain_val), 2); % create an output array for (length sigma_sub),numComponents
for i = 1:length(sigmamain_val)
sigmamain_pos = randi(length(sigmamain_val));
sigma_main = sigmamain_val(sigmamain_pos);
y = sigma_main.*randn(n_main,1) + mu_main; %10^6 SKBr3 cells
y2 = sigma_sub.*randn(n_sub,1) + mu_sub; %100,000 MDA MB 231
C = cat(1, y, y2);
AIC = zeros(1,4); % create an ouput array for AIC
GMModels = cell(1,4);
options = statset('MaxIter',00); % add optitons here to better/refine the model
for k = 1:4
GMModels{k} = fitgmdist(C,k);
AIC(k)= GMModels{k}.AIC;
end
[minAIC,numComponents] = min(AIC);
outputData(i,1) = sigma_main;
outputData(i,2) = numComponents;
%BestModel = GMModels{numComponents}
BestModel = GMModels{outputData(i,2)};
end
%% plot sigma main vs numComponents
figure(3)
plot(outputData(:,1),outputData(:,2))
xlabel('sigma main')
ylabel('numComponents')
Btw: the idea for this model with AIC is based off: https://www.mathworks.com/help/stats/fitgmdist.html
Here's the problem:
For starters: The code for y & y2, which are for my main and sub population respectively, only take into account the very last sigma value.
A bigger problem is that sometimes, my code picks a sigma_main value twice (i.e 0.8) and the number of components is sometimes different even though they are the same sigma_main value.
Also, I would like to ouput a "BestModel" or 1x1 gmdistrinution for every sigma_main value instead of just the last one and I don't know how to incorporate that in my for loop.
What I think: I need to go through sigma_main values in order (i.e only write "sigma_main = i" inside the for loop) so that values don't get repeated, but then my y and y2 cannot b concatenated (C = cat(1, y,y2) because my sigma_main value will be 19x1 instead of just one number.
To outut a BestModel for every sigma_main, I need to create an empty/zeroes array before but I don't know ehat type to do that for.
Please, please help me. I have stayed up two nights doing this and I really don't know how to move forward. It might be really simple, but I might be overthinking it.
채택된 답변
추가 답변 (0개)
카테고리
도움말 센터 및 File Exchange에서 Mathematics에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!