Dividing grouped data into equally sized subgroups
조회 수: 24 (최근 30일)
이전 댓글 표시
Hello community,
for my current project i need to divide the values of a table for each group (parent-group) into 10 eqaully sized subgroups. The values within a parent-group are arranged hierarchically. By creating the subgroups, the hierarchical order has to remain, so that the first subgroup of a parent-group contains the lowest 10% of the values, the second subgroup of a parent group contains the next larger 10%-stack (>10% - 20%) of the values and so on. In the following, i will give an example:
Data = array2table(rand(1000,1));
Data.Properties.VariableNames{1} = 'Value';
ParentGroups = [1:1:10]';
Data.ParentGroups = repelem(ParentGroups,100);
Data = sortrows(Data,{'ParentGroups','Value'});
My first approach was to create the deciles of each group, which separate each parent-group into 10 equally sized parts. The problem here is, that the function i used simply outputs the values of the deciles, but there is no reference to their location in the original data. For creating the deciles, I used the following code:
decileFnc = @(x) prctile(x,[0:10:100]);
deciles = varfun(decileFnc,Data,'GroupingVariables','ParentGroups','InputVariables',1);
deciles.Fun_Value = string(deciles.Fun_Value);
deciles = [deciles(:,1:2),regexp(deciles.Fun_Value, '\s+', 'split')];
doubleFnc = @(x) double(x);
deciles = [deciles(:,1:2),varfun(doubleFnc,deciles,'InputVariables',3:13)];
The last block of code is used to unstack the output of varfun, i just shared it for the sake of completeness.
I'm really hoping for some help here and appreciate every answer.
댓글 수: 2
Peter Perkins
2021년 8월 5일
Maximillian, it's not clear what you are trying to do. Are you trying to tag each of the 1000 rows in the original data with the within-group decile that its value falls in, but preserve the original order of the data?
By the way, see the splitvars function. I thnk it replaces your last chunk of code.
채택된 답변
추가 답변 (1개)
Peter Perkins
2021년 8월 6일
>> x = rand(16,1);
>> g = repmat((1:2)',8,1);
>> t = table(g,x)
t =
16×2 table
g x
_ _______
1 0.81472
2 0.90579
1 0.12699
2 0.91338
1 0.63236
2 0.09754
1 0.2785
2 0.54688
1 0.95751
2 0.96489
1 0.15761
2 0.97059
1 0.95717
2 0.48538
1 0.80028
2 0.14189
Here's a function that takes a vector and returns the quartile bin in which each value falls.
function qbin = fun(x)
q = quantile(x,[0 .25 .5 .75 1]);
qbin = discretize(x,q);
end
Apply that function to each group of rows in the table.
>> tq = rowfun(@fun, t, "GroupingVariables","g")
tq =
16×3 table
g GroupCount Var3
_ __________ ____
1 8 3
1 8 1
1 8 2
1 8 2
1 8 4
1 8 1
1 8 4
1 8 3
2 8 3
2 8 3
2 8 1
2 8 2
2 8 4
2 8 4
2 8 2
2 8 1
rowfun has returned its output table in group order. To attach the quartile bin numbers to the correct rows of the original data, use the second output of sort.
>> [~,ord] = sort(t.g);
>> t.qbin(ord) = tq.Var3
t =
16×3 table
g x qbin
_ _______ ____
1 0.81472 3
2 0.90579 3
1 0.12699 1
2 0.91338 3
1 0.63236 2
2 0.09754 1
1 0.2785 2
2 0.54688 2
1 0.95751 4
2 0.96489 4
1 0.15761 1
2 0.97059 4
1 0.95717 4
2 0.48538 2
1 0.80028 3
2 0.14189 1
To make the result easier to confirm, sort by x within group; notice that the quartile bin numbers are as expected.
sortrows(t,["g" "qbin"])
ans =
16×3 table
g x qbin
_ _______ ____
1 0.12699 1
1 0.15761 1
1 0.63236 2
1 0.2785 2
1 0.81472 3
1 0.80028 3
1 0.95751 4
1 0.95717 4
2 0.09754 1
2 0.14189 1
2 0.54688 2
2 0.48538 2
2 0.90579 3
2 0.91338 3
2 0.96489 4
2 0.97059 4
참고 항목
카테고리
Help Center 및 File Exchange에서 Dates and Time에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!