resampling an unbalanced dataset
Hi, I have a dataset with 2 classes (churn='False.' and churn='True.'). It is unbalanced because only 700 of the 5000 samples are churn='False.' Is there a way to balance that distribution? Thank you in advance.
Accepted Answer
Image Analyst
3 January 2015
Throw out all but 700 items where churn = true??? Then you'd have 700 false ones and 700 true ones. If not, then tell us in more detail what "balance" means to you.
Image Analyst
3 January 2015
Uh, sure, if that's what you want. If it's in a table, you can automate it somewhat, like
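% This assumes t.churn is logical. If it holds the text labels 'True.'/'False.'
% (as in the question), convert it first, e.g. t.churn = strcmp(t.churn, 'True.');
% Find out which rows are true.
trueRows = find(t.churn);
% Take only the first 700 (or all of them, if there are fewer than 700):
trueRows = trueRows(1:min(numel(trueRows), 700));
% Find out which rows are false - we want to keep all those.
falseRows = find(t.churn == false);
% Combine the false and true rows into one column of indexes
% (find returns column vectors, so stack them vertically).
rowsToExtract = sort([falseRows; trueRows]);
% Now extract only the first 700 true rows, but all the false ones.
% Tables need a row and a column subscript.
t = t(rowsToExtract, :);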
or something like that. You might have to debug it some.
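A variant of the same undersampling idea, sketched under a few assumptions: the labels are a plain logical vector (the name isChurn and the 700 false / 4300 true split are hypothetical example data matching the question), and the kept true rows are drawn at random with randperm instead of being taken from the top, so any ordering in the original file does not bias which rows survive.

```matlab
% Hypothetical example data: a logical label vector standing in for t.churn,
% with 700 false and 4300 true entries as described in the question.
isChurn = [false(700, 1); true(4300, 1)];

rng(0);  % fix the random seed so the selection is reproducible

% Row indices of each class (find returns column vectors here).
minorityRows = find(~isChurn);   % churn = false, 700 rows
majorityRows = find(isChurn);    % churn = true, 4300 rows

% Draw as many majority rows as there are minority rows,
% at random and without replacement.
nKeep = numel(minorityRows);
pick = randperm(numel(majorityRows), nKeep);
keptMajority = majorityRows(pick);

% Combine and sort so the surviving rows keep their original order.
rowsToKeep = sort([minorityRows; keptMajority]);
balancedLabels = isChurn(rowsToKeep);

fprintf('false: %d, true: %d\n', sum(~balancedLabels), sum(balancedLabels));
```

With a table, the same indexes would be applied as tBalanced = t(rowsToKeep, :). Keep in mind that undersampling throws data away: here 3600 of the 4300 true rows are discarded.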