필터 지우기
필터 지우기

How to count data based on the categories

조회 수: 5 (최근 30일)
Moe
Moe 2015년 10월 22일
편집: arich82 2015년 10월 27일
There is a Master matrix as follow that includes a unique ID (first column), different ID (second column) and last three columns (C/D/E) are included a number with a different range. (e.g. Column #3 included 1:2; column #4 included 1:2; column #5 included 1:10). I want a new matrix (like matrix OT in the following) that according to the unique ID find in the first column, count column 3 to 5 based on the different category. (e.g. ID = 1, column #3, category #1 = counted as 5, category #2 = counted as 0).
Master = [1 100680 1 2 1
1 36731 1 1 9
1 36731 1 2 9
1 14275 1 1 1
1 14275 1 2 8
2 14275 1 1 7
2 117633 1 1 6
2 117633 1 2 6
2 68599 1 1 8
2 31678 1 1 1
2 31678 1 1 8
3 31678 1 1 2
3 31678 1 2 7
3 44106 1 2 10]
Example of output for first ID #1
OT = [1 5 0 2 3 2 0 0 0 0 0 0 1 1 0];
  댓글 수: 1
Moe
Moe 2015년 10월 22일
편집: Moe 2015년 10월 22일
I guess I couldn't explain well in my initial question. My meaning of category was the variation of numbers that are available in that column. For example, in column three, there are only 2 different numbers (1 or 2), that's why when it counted, it found 5 (type 1) and 0 (type 2).
Another example, column 5, there are 10 different numbers (1 or 2 or 3 or ... or 10), that's why when it counted, it found 2 (type 1), 1 (type 8), 2 (type 9) and rest 0.

댓글을 달려면 로그인하십시오.

채택된 답변

arich82
arich82 2015년 10월 22일
편집: arich82 2015년 10월 22일
[Edit to include output]]
Without the toolboxes, I think you can do this with accumarray
data = [...
1 100680 1 2 1; ...
1 36731 1 1 9; ...
1 36731 1 2 9; ...
1 14275 1 1 1; ...
1 14275 1 2 8; ...
2 14275 2 1 7; ...
2 117633 1 1 6; ...
2 117633 1 2 6; ...
2 68599 2 1 8; ...
2 31678 1 1 1; ...
2 31678 1 1 8; ...
3 31678 2 1 2; ...
3 31678 2 2 7; ...
3 44106 2 2 10; ...
];
n = size(data, 1);
weights = ones(n, 1);
OT = unique(data(:, 1));
for k = 3:5
OT = [OT, accumarray(data(:, [1, k]), weights)];
end
output:
OT =
1 5 0 2 3 2 0 0 0 0 0 0 1 2 0
2 4 2 5 1 1 0 0 0 0 2 1 2 0 0
3 0 3 1 2 0 1 0 0 0 0 1 0 0 1
Note that I changed your data to more closely match your Excel data (Column 3 only had ones in your Master, but Column C had 1's and 2's; there needs to be at least one occurrence of the max index for this approach to match your desired result).
Also, Column 5 (E) clearly has two 9's for id == 1. Should OT(end - 1) == 2 instead of 1, or am I misinterpreting something?
  댓글 수: 3
Moe
Moe 2015년 10월 26일
Hi arich82
Can you please tell me why your code is not working for the following data:
data = [
300 84617 1 1 4 1 1 2 1
300 84617 1 2 4 1 1 2 1
300 96283 1 2 1 1 1 4 4
300 96283 1 2 1 1 1 4 4
300 96283 1 2 5 1 1 2 1
300 96283 1 1 5 1 1 2 1
301 117059 2 2 1 2 1 4 7
301 117059 2 1 1 2 1 4 4
];
It gives the following error:
Error using horzcat
CAT arguments dimensions are not
consistent.
Error in ModeChoice (line 20)
OT = [OT, accumarray(data(:, [1, k]),
weights)];
arich82
arich82 2015년 10월 27일
편집: arich82 2015년 10월 27일
In the above code, accumarray is using the id column as the row index; this was fine when the id started at 1, and used consecutive integers.
Now, OT is initialized to [300; 301], that is, the value 300 in row 1 and the vaule 301 in row 2, but accumarray is trying to put results in the row 300 and row 301.
The solution is to use the third output from unique:
n = size(data, 1);
weights = ones(n, 1);
[OT, ~, ind_id] = unique(data(:, 1));
for k = 3:5
OT = [OT, accumarray([ind_id, data(:, k)], weights)];
end
or, preallocating (slighly improved syntax)
cols = 3:5; % columns of interest, i.e. 'C', 'D', & 'E'
ncols = numel(cols);
field_widths = [1, max(data(:, cols))]; % prepend 1 for id
[u_id, ~, ind_id] = unique(data(:, 1));
OT = zeros(numel(u_id), sum(field_widths)); % preallocate
n = size(data, 1);
weights = ones(n, 1);
OT(:, 1) = u_id;
for k = 1:ncols
i0 = sum(field_widths(1:k)); % end index of previous field
inds = i0 + [1:field_widths(k+1)];
OT(:, inds) = accumarray([ind_id, data(:, cols(k))], weights);
end
output:
OT =
300 6 0 2 4 2 0 0 2 2
301 0 2 1 1 2 0 0 0 0

댓글을 달려면 로그인하십시오.

추가 답변 (2개)

Image Analyst
Image Analyst 2015년 10월 22일
If you have the Statistics and Machine Learning Toolbox, you can use grpstats():
Master = [...
1 100680 1 2 1
1 36731 1 1 9
1 36731 1 2 9
1 14275 1 1 1
1 14275 1 2 8
2 14275 1 1 7
2 117633 1 1 6
2 117633 1 2 6
2 68599 1 1 8
2 31678 1 1 1
2 31678 1 1 8
3 31678 1 1 2
3 31678 1 2 7
3 44106 1 2 10]
statsArray = grpstats(Master, Master(:,1), 'sum')
and in the command window you'll see
statsArray =
5 202692 5 8 28
12 381496 6 7 36
9 107462 3 5 19
Columns 3-5 in statsArray are the sums in columns 3-5 broken down by category number in column 1 of Master.
  댓글 수: 2
Moe
Moe 2015년 10월 22일
Thanks Image Analyst. Matrix format is correct but the answer is not. I'm looking for the count instead of sum. For example, for ID 2, how many is 1 and how many is 2. If you count you will find 5 number 1 and 1 number 2 while in your code sum of this is calculated.
Image Analyst
Image Analyst 2015년 10월 23일
Sorry, I didn't understand your definition/distinction between count and sum. If you want count, it seems to get the number of unique numbers. So you can just do
statsArray = grpstats(Master, Master(:,1), @fun)
with "fun" being defined as:
function num = fun(array)
num= length(unique(array));
It seems a lot simpler than the answer you chose, but whatever... I know I modified my answer after you had already picked a solution. If you want a :one-liner" you can still use it.

댓글을 달려면 로그인하십시오.


Peter Perkins
Peter Perkins 2015년 10월 23일
Another possibility, using rowfun and a table. This code:
Master = ...
[1 100680 1 2 1
1 36731 1 1 9
1 36731 1 2 9
1 14275 1 1 1
1 14275 1 2 8
2 14275 1 1 7
2 117633 1 1 6
2 117633 1 2 6
2 68599 1 1 8
2 31678 1 1 1
2 31678 1 1 8
3 31678 1 1 2
3 31678 1 2 7
3 44106 1 2 10];
M = array2table(Master,'VariableNames',{'A' 'B' 'C' 'D' 'E'});
outNames = {'C1' 'C2' 'D1' 'D2' 'E1' 'E2' 'E3' 'E4' 'E5' 'E6' 'E7' 'E8' 'E9' 'E10'};
counts = rowfun(@fun, M,'GroupingVariables','A','InputVariables',{'C' 'D' 'E'},'OutputVariableNames',outNames)
... with this function:
function [varargout] = fun(C,D,E)
counts = [histc(C',1:2) histc(D',1:2) histc(E',1:10)];
varargout = num2cell(counts);
... produces this output:
counts =
A GroupCount C1 C2 D1 D2 E1 E2 E3 E4 E5 E6 E7 E8 E9 E10
_ __________ __ __ __ __ __ __ __ __ __ __ __ __ __ ___
1 5 5 0 2 3 2 0 0 0 0 0 0 1 2 0
2 6 6 0 5 1 1 0 0 0 0 2 1 2 0 0
3 3 3 0 1 2 0 1 0 0 0 0 1 0 0 1
There are probably better ways to provide this result, for example with a table more like this:
ans =
A GroupCount Ccounts Dcounts Ecounts
_ __________ _______ _______ _____________
1 5 5 0 2 3 [1x10 double]
2 6 6 0 5 1 [1x10 double]
3 3 3 0 1 2 [1x10 double]
but the former is more or less what you seem to be asking for.

카테고리

Help CenterFile Exchange에서 Resizing and Reshaping Matrices에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by