Splitting a matrix according to its labels

NotA_Programmer on 10 May 2022
Commented: Jon on 11 May 2022
I have a 1900 x 4 matrix of doubles whose fourth column contains the labels 3, 2, and 1. I want to split this data into two sets, A and B, in a 20:80 ratio, so that A contains 20% of the rows of each label and B contains the remaining 80% of each label, i.e. 80% of label 3, 80% of label 2, and 80% of label 1. How can this be achieved?
  6 Comments
dpb on 10 May 2022
Add splitapply, or rowfun if using a table, to the above...
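
As a rough illustration of the findgroups/splitapply route mentioned in this comment, a minimal sketch might look like the following. The example data, the fixed seed, and the helper name pick20 are assumptions made for illustration only; they do not come from the thread.

```matlab
% Sketch only: stratified 20/80 split using findgroups + splitapply.
rng(0);                                    % fixed seed, only for repeatability
data = [rand(1900,3), randi(3,[1900,1])];  % assumed example data, labels in column 4

g    = findgroups(data(:,4));              % group number for every row
rows = (1:size(data,1)).';                 % row indices, to be split by group

% pick20 (hypothetical helper): random 20% of a group's row indices,
% wrapped in a cell because splitapply needs one scalar output per group
pick20 = @(r) {r(randperm(numel(r), round(0.2*numel(r))))};
idxA   = splitapply(pick20, rows, g);      % one cell per label

inA = false(size(data,1),1);
inA(vertcat(idxA{:})) = true;              % mark the rows chosen for A

A = data(inA,:);                           % about 20% of each label
B = data(~inA,:);                          % the remaining 80% of each label
```

Using a logical mask for A makes B simply its complement, so no row can land in both sets or be left out.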


Accepted Answer

Jon on 10 May 2022
Edited: Jon on 10 May 2022
This is one way to do it:
% make example data whose last column holds a "label" of 1, 2, or 3
data = [rand(1900,3),randi(3,[1900,1])];
% loop through the labels, building the training and validation data sets
Aparts = cell(3,1);
Bparts = cell(3,1);
for k = 1:3
    % get the indices of the rows with the kth label
    idx = find(data(:,4)==k);
    numWithLabel = numel(idx);
    idxrand = idx(randperm(numWithLabel)); % randomize the selection
    % randomly put (within rounding) 80% in training, 20% in validation
    % (to make A the 20% part, as in the question, use 0.2 here instead)
    numTrain = round(0.8*numWithLabel);
    Aparts{k} = data(idxrand(1:numTrain),:);
    Bparts{k} = data(idxrand(numTrain+1:end),:); % the rest go to validation
end
% put all of the parts in one matrix of doubles
A = cell2mat(Aparts);
B = cell2mat(Bparts);
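
For comparison, a stratified holdout split can also be sketched with cvpartition. This is a different technique from the loop above, it assumes the Statistics and Machine Learning Toolbox is available, and the example data and seed are again illustrative assumptions.

```matlab
% Sketch only: stratified 20/80 split with cvpartition
% (assumes the Statistics and Machine Learning Toolbox is installed).
rng(1);                                    % fixed seed, only for repeatability
data = [rand(1900,3), randi(3,[1900,1])];  % assumed example data, labels in column 4

% Passing the label vector as the grouping variable makes the holdout
% partition stratified, so each label is split in roughly the same ratio.
c = cvpartition(data(:,4), 'HoldOut', 0.2);

A = data(test(c),:);       % holdout part, about 20% of each label
B = data(training(c),:);   % remaining part, about 80% of each label

% Report the share of each label that ended up in A
for k = 1:3
    shareA = sum(A(:,4)==k) / sum(data(:,4)==k);
    fprintf('label %d: %.3f of its rows are in A\n', k, shareA);
end
```

Here test(c) and training(c) return logical row masks, so A and B are disjoint and together cover every row of data.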
  13 Comments
Jon on 11 May 2022
@dpb Thanks. I realize I need to get more familiar with categorical variables. From your example, and I think another one I saw recently, I see that they provide some powerful capabilities.


More Answers (1)

dpb on 10 May 2022
Edited: dpb on 10 May 2022
[ix,idx]=findgroups(X(:,4));  % get grouping variable on fourth column of X
for i=idx.'                   % for each group ID (numeric labels 1,2,3 match the group numbers here)
  I=find(ix==i);              % the indices into X for the group
  N=numel(I);                 % how many in this group
  I=I(randperm(N));           % rearrange randomly the elements of index vector
  nA=floor(0.8*N);            % how many to pick for A (maybe round() instead???)
  iA{i}=I(1:nA);              % the randomized selection for A
  iB{i}=I(nA+1:end);          % rest for B
end
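
The answer stops at the per-group index lists, so here is a hedged sketch of how they could be turned into the two matrices and checked. The example matrix X, the seed, and the variable names nGroups, counts, and g are assumptions for illustration. The loop runs over group numbers rather than label values, so it would also work for labels that are not 1, 2, 3.

```matlab
% Sketch only: build A and B from per-group index lists, then check the counts.
rng(2);                                  % fixed seed, only for repeatability
X = [rand(1900,3), randi(3,[1900,1])];   % assumed example data, labels in column 4

[ix,idx] = findgroups(X(:,4));           % ix: group number per row, idx: label values
nGroups  = numel(idx);
iA = cell(nGroups,1);
iB = cell(nGroups,1);
for g = 1:nGroups                        % loop over group numbers, not label values
    I  = find(ix==g);                    % row indices of this group
    I  = I(randperm(numel(I)));          % shuffle them
    nA = floor(0.8*numel(I));            % same 80% share for A as in the answer
    iA{g} = I(1:nA);
    iB{g} = I(nA+1:end);
end

A = X(vertcat(iA{:}),:);                 % stack the selected rows of every group
B = X(vertcat(iB{:}),:);

% Rows per label in A and in B, one table row per label
counts = [accumarray(ix(vertcat(iA{:})),1), accumarray(ix(vertcat(iB{:})),1)];
disp(table(idx, counts(:,1), counts(:,2), 'VariableNames', {'label','nA','nB'}));
```

Swapping 0.8 for 0.2 in nA would give A the 20% share asked for in the question.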
  5 Comments
NotA_Programmer on 10 May 2022
-cleanest would be to copy and paste the actual code instead of retyping; then you also get indenting and comments and all... :)
Yeah, I should have done it that way.
Thanks @dpb for your help!
