Splitting a matrix according to its labels

NotA_Programmer on 10 May 2022
Commented: Jon on 11 May 2022
I have a matrix (1900 x 4 double) whose fourth column contains the labels 3, 2 and 1. I want to split this data in a 20:80 ratio into A and B, where A contains 20% of the rows for each label (20% of label 3, 20% of label 2 and 20% of label 1) and B contains the remaining 80% of each label. Please help: how can this be achieved?
  6 Comments
dpb on 10 May 2022
Add splitapply, or rowfun if using a table, to the above...
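For readers who want to see what the splitapply route might look like, here is a minimal sketch, not necessarily what dpb had in mind. It assumes the labels sit in column 4 of a numeric matrix, as in the question; the names `data`, `fracA`, `shuffled`, `Aparts` and `Bparts` are illustrative only, and the example data is random.

```matlab
% Example data: three feature columns plus a label (1, 2 or 3) in column 4
data  = [rand(1900,3), randi(3,[1900,1])];
fracA = 0.2;                               % share of each label that goes to A

G = findgroups(data(:,4));                 % group number for every row

% Shuffle the rows of each group; the result is wrapped in a cell so that
% splitapply can stack the per-group outputs
shuffled = splitapply(@(rows) {rows(randperm(size(rows,1)),:)}, data, G);

% Number of rows each group contributes to A (within rounding)
nA = cellfun(@(r) round(fracA*size(r,1)), shuffled);

% Cut each shuffled group into its A part and its B part
Aparts = cellfun(@(r,n) r(1:n,:),     shuffled, num2cell(nA), 'UniformOutput', false);
Bparts = cellfun(@(r,n) r(n+1:end,:), shuffled, num2cell(nA), 'UniformOutput', false);

A = vertcat(Aparts{:});
B = vertcat(Bparts{:});
```

With a table instead of a matrix, rowfun with a grouping variable would play the role that splitapply plays here.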


Accepted Answer

Jon on 10 May 2022
Edited: Jon on 10 May 2022
This is one way to do it
% make an example data matrix whose last column holds a "label" of 1,
% 2, or 3
data = [rand(1900,3),randi(3,[1900,1])];
% loop through the labels, building the training and validation data sets
Aparts = cell(3,1);
Bparts = cell(3,1);
for k = 1:3
    % get the indices of the rows with the kth label
    idx = find(data(:,4)==k);
    numWithLabel = numel(idx);
    idxrand = idx(randperm(numWithLabel)); % randomize the selection
    % randomly put (within rounding) 80% in training (A), 20% in validation (B);
    % use 0.2 here instead if A should hold the 20% share, as in the question
    numTrain = round(0.8*numWithLabel);
    Aparts{k} = data(idxrand(1:numTrain),:);
    Bparts{k} = data(idxrand(numTrain+1:end),:); % the rest go to validation
end
% put all of the parts in one matrix of doubles
A = cell2mat(Aparts);
B = cell2mat(Bparts);
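One way to confirm that a split like this is stratified is to compare the per-label row counts in A and B against the full data set. The following is only a sketch under stated assumptions: labels are the positive integers 1 to 3 in column 4, and `fracA`, `inA`, `countsAll`, `countsA` and `countsB` are illustrative names. It sets `fracA = 0.2` so that A receives the 20% share requested in the question, whereas the loop above puts 80% into A.

```matlab
% Example data: labels 1..3 in column 4
data  = [rand(1900,3), randi(3,[1900,1])];
fracA = 0.2;                                  % share of each label that goes to A

labels = unique(data(:,4));
inA    = false(size(data,1),1);               % logical mask: true for rows assigned to A
for k = 1:numel(labels)
    idx = find(data(:,4) == labels(k));       % rows carrying this label
    idx = idx(randperm(numel(idx)));          % shuffle them
    inA(idx(1:round(fracA*numel(idx)))) = true;
end
A = data(inA,:);
B = data(~inA,:);

% Per-label counts: total, in A, in B, and the fraction that landed in A
countsAll = accumarray(data(:,4), 1);
countsA   = accumarray(A(:,4), 1, size(countsAll));
countsB   = accumarray(B(:,4), 1, size(countsAll));
disp([countsAll, countsA, countsB, countsA./countsAll]);
```

The last column of the displayed matrix should be close to `fracA` for every label, differing only by rounding. Using a logical mask also keeps the rows of A and B in their original order, which the cell-array version does not.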
  13 Comments
Jon on 11 May 2022
@dpb Thanks, I realize I need to get more familiar with categorical variables. From your example, and I think another one I saw recently, I see that they provide some powerful capabilities.


More Answers (1)

dpb on 10 May 2022
Edited: dpb on 10 May 2022
[ix,idx]=findgroups(X(:,4)); % get grouping variable on fourth column of X
for i=idx.' % for each group ID (must be numeric and match the group numbers 1..n, as here)
    I=find(ix==i); % the indices into X for the group
    N=numel(I); % how many in this group
    I=I(randperm(N)); % rearrange randomly the elements of index vector
    nA=floor(0.8*N); % how many to pick for A (maybe round() instead???)
    iA{i}=I(1:nA); % the randomized selection for A
    iB{i}=I(nA+1:end); % rest for B
end
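The loop above stops at the row indices, so a final step is needed to build the two matrices. A possible sketch follows, with two assumptions labeled plainly: it loops over group numbers rather than label values, so the labels need not be 1..n, and it uses the third output of `unique` as a stand-in for `findgroups` (for a numeric column with no NaN values the two give the same grouping). The variable `nG` is an illustrative name.

```matlab
% Example data: labels 1..3 in column 4
X = [rand(1900,3), randi(3,[1900,1])];

[idx, ~, ix] = unique(X(:,4));   % idx: distinct labels, ix: group number of each row
nG = numel(idx);
iA = cell(nG,1);                 % preallocate the per-group index lists
iB = cell(nG,1);
for g = 1:nG
    I  = find(ix == g);          % rows of X in group g
    I  = I(randperm(numel(I)));  % shuffle them
    nA = floor(0.8*numel(I));    % same 80% share for A as in the loop above
    iA{g} = I(1:nA);
    iB{g} = I(nA+1:end);
end

% Stack the per-group index vectors and pull the rows out of X
A = X(vertcat(iA{:}), :);
B = X(vertcat(iB{:}), :);
```

Keeping the indices rather than copies of the rows makes it easy to apply the same split to other arrays that are aligned row by row with X.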
  5 Comments
NotA_Programmer on 10 May 2022
-cleanest would be to copy and paste the actual code instead of retyping; then you also get indenting and comments and all... :)
Yeah, I should have done it that way.
Thanks @dpb for your help!


Release

R2021a
