How do I randomly select the data points in a vector based on a percentage for each group?

Divide the data values into groups according to this percentage distribution: Group 1 = 25%, Group 2 = 30%, Group 3 = 20%, Group 4 = 25%.

Answers (3)

DGM on 15 Dec 2024
There has to be a smarter way than this, but I guess this is one idea.
% some fake data
x = randn(1000,1);
% cumulative group fractions (the individual fractions must sum to 1)
prct = cumsum([0.25 0.30 0.20 0.25])
prct = 1×4
0.2500 0.5500 0.7500 1.0000
% where they lie in data units
pval = prctile(x,prct*100)
pval = 1×4
-0.7367 0.1090 0.6688 3.1929
% bin the data
nbins = numel(prct);
xbinned = cell(nbins,1);
for k = 1:nbins
    switch k
        case 1
            mask = x <= pval(k);
        case nbins
            mask = x > pval(k-1);
        otherwise
            mask = x > pval(k-1) & x <= pval(k);
    end
    xbinned{k} = x(mask);
end
xbinned
xbinned = 4x1 cell array
{250x1 double} {300x1 double} {200x1 double} {250x1 double}
... that's assuming I understand the question correctly.
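The same binning can be written without the loop. The sketch below is an alternative, not DGM's code: it labels every element with `discretize` and then collects each group with `arrayfun`. The names `edges` and `grp` are illustrative. The option `'IncludedEdge','right'` makes each bin include its right edge, which matches the `x <= pval(k)` masks in the loop, and the outer edges `-Inf`/`Inf` catch everything below the first cut point and above the last one. For 1000 samples of continuous data it should reproduce the 250/300/200/250 split shown above.

```matlab
% Fake data and cumulative group fractions, as in the loop version
x = randn(1000,1);
prct = cumsum([0.25 0.30 0.20 0.25]);
pval = prctile(x, prct*100);

% Bin edges: -Inf and Inf enclose the interior cut points
edges = [-Inf, pval(1:end-1), Inf];

% Group label (1..4) for every element; bins include their right edge
grp = discretize(x, edges, 'IncludedEdge', 'right');

% Collect the elements of each group into a 4x1 cell array
nbins = numel(prct);
xbinned = arrayfun(@(k) x(grp == k), (1:nbins)', 'UniformOutput', false);
```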
1 Comment
DGM on 16 Dec 2024
... I just realized the question said "randomly", so I probably completely misinterpreted the question.
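If "randomly" means that the members of each group are chosen at random while the 25/30/20/25% split is still honored exactly, one possible sketch (an assumption about what the question intends, not something stated in the thread) is to shuffle the indices with `randperm` and cut the shuffled list into consecutive blocks. The names `cuts`, `counts`, `idx`, `gidx` and `xg` are illustrative.

```matlab
% Example data (any vector works)
x = randn(153,1);
N = numel(x);

% Group percentages from the question
gv = [25 30 20 25];

% Cumulative cut points, rounded once so the counts always sum to N
cuts = round(cumsum(gv) * N / 100);
counts = diff([0 cuts]);             % [38 46 31 38] for N = 153

% Random permutation of the element indices
idx = randperm(N);

% Split the shuffled indices into consecutive blocks of the given sizes
gidx = mat2cell(idx(:), counts, 1);

% Data values in each randomly chosen group
xg = cellfun(@(g) x(g), gidx, 'UniformOutput', false);
```

Each element lands in exactly one group, and which elements end up together changes on every run.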



Star Strider on 15 Dec 2024
I’m not certain whether you want to apportion them as they exist in the original vector, or by ascending value (essentially their percentile ranks).
Here are two ways to apportion them:
x = randn(153,1);
L = numel(x);
gv = [25 30 20 25];
g1 = round(gv*L/100);
xg1 = mat2cell(x, g1, size(x,2)) % Option #1: Apportion Without Sorting
xg1 = 4x1 cell array
{38x1 double} {46x1 double} {31x1 double} {38x1 double}
[xs, sidx] = sort(x); % Sort Ascending
xg2idx = mat2cell(sidx, g1, size(x,2)) % Collect Sort Indices
xg2idx = 4x1 cell array
{38x1 double} {46x1 double} {31x1 double} {38x1 double}
xg2 = cellfun(@(g)x(g), xg2idx, 'Unif',0) % Option #2: ‘x’ Apportioned By ‘sort’ Indices
xg2 = 4x1 cell array
{38x1 double} {46x1 double} {31x1 double} {38x1 double}
The first option simply apportions them as they exist in the original vector. The second apportions them essentially by their percentile ranks in the vector, by first apportioning the indices returned by the sort function.
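Because the first output of `sort` already equals `x(sidx)`, option #2 can also be obtained by splitting the sorted values directly, without the `cellfun` step. A minimal self-contained sketch (`xg2b` is an illustrative name) that checks the two give the same groups:

```matlab
x = randn(153,1);
L = numel(x);
gv = [25 30 20 25];
g1 = round(gv*L/100);          % [38 46 31 38] for L = 153

[xs, sidx] = sort(x);          % xs is the same as x(sidx)

% Option #2 via the sort indices
xg2 = cellfun(@(g) x(g), mat2cell(sidx, g1, 1), 'UniformOutput', false);

% The same groups taken straight from the sorted values
xg2b = mat2cell(xs, g1, 1);
```

Keeping the indices (`sidx`) is still useful when you need to know where each grouped value came from in the original vector.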
I tried this with different lengths for ‘x’ and it appears to be robust. Obviously there is a lower limit to the number of elements in ‘x’ below which it would probably fail; however, I didn’t do that experiment.
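One concrete case to watch for: `g1 = round(gv*L/100)` rounds each group separately, so `sum(g1)` is not guaranteed to equal `numel(x)`, and `mat2cell` then raises an error. For example, with 10 elements `gv*10/100` is `[2.5 3 2 2.5]`, which rounds to `[3 3 2 3]` (11 elements in total). A sketch of one workaround, rounding the cumulative boundaries instead so the last boundary is always exactly `L` (`g_naive` and `cutpts` are illustrative names):

```matlab
x = randn(10,1);
L = numel(x);
gv = [25 30 20 25];

% Per-group rounding: may not add up to L
g_naive = round(gv*L/100);            % [3 3 2 3], sums to 11

% Round the cumulative boundaries, then take differences
cutpts = round(cumsum(gv)*L/100);     % [3 6 8 10], last boundary is exactly L
g1 = diff([0 cutpts]);                % [3 3 2 2], sum(g1) == L

xg1 = mat2cell(x, g1, size(x,2));     % Option #1 no longer errors for this L
```

The group sizes can still differ from the exact percentages by up to one element, which is unavoidable when the percentages do not divide `L` evenly.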

Walter Roberson on 16 Dec 2024
Perhaps use randsample
n = 4;
k = 1000;   % number of samples to generate (example value)
w = [0.25, 0.30, 0.20, 0.25];
y = randsample(n,k,true,w)
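To apply this to a data vector, one can draw one group label per element and then collect the elements by label. This sketch assumes `x` is the data vector (example data here); `lbl`, `xg` and `counts` are illustrative names, and `randsample` requires the Statistics and Machine Learning Toolbox. Because the labels are drawn independently with weights `w`, the group sizes are 25/30/20/25% only on average and vary from run to run.

```matlab
% Example data vector (assumed; replace with your own)
x = randn(1000,1);

n = 4;                          % number of groups
w = [0.25, 0.30, 0.20, 0.25];   % group probabilities
k = numel(x);                   % one label per data point

% Random group label for each element, drawn with replacement using weights w
lbl = randsample(n, k, true, w);

% Elements of x assigned to each group
xg = arrayfun(@(g) x(lbl == g), (1:n)', 'UniformOutput', false);

% Actual group sizes: close to k*w, but random
counts = cellfun(@numel, xg)';
```

If the group sizes must match the percentages exactly, shuffling with `randperm` and cutting at fixed counts is the more suitable technique.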

Category: Shifting and Sorting Matrices
