Can the efficiency of this code be improved, either computationally or just in terms of lines of code?
Dumb question for a smart person who has a moment to kill.
Let's say I have data that will come in from n groups, and I know a priori those groups will be numbered 1 through n in some variable, A. I will have a second variable, B, that contains the data. Then, I want to get (for example) the mean of the data in each group. It is easy to pull off with a loop, but is there better code I could be using for this procedure? For a small example dataset, I might have
A = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];
tic
%%% Can this be done better or in one line of code? %%%
C = NaN(max(A), 1);
for ii = 1:numel(C)
C(ii) = mean(B(A == ii));
end
%%% Can this be done better or in one line of code? %%%
toc
disp(C)
bar(C)
Is there a better way to do this?
Accepted Answer
Jan
5 Dec 2022
Edited: Jan
5 Dec 2022
A0 = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B0 = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];
A = repmat(A0, 1e6, 1); % Let Matlab work with more than tiny data
B = repmat(B0, 1e6, 1);
tic
C = NaN(max(A), 1);
for ii = 1:numel(C)
m = A == ii;          % logical mask selecting group ii
C(ii) = mean(B(m));   % mean of the group, as in the original loop
end
toc
Shorter but slower:
tic
D = accumarray(A, B, [], @mean);
toc
isequal(C, D)
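A possible middle ground, offered as a sketch and not measured in this thread: the default reduction of accumarray is a sum, so the group means can be formed from two calls, one for the group sums and one for the group counts. This avoids evaluating a function handle once per group, which is often the slow part of the @mean variant. Timings will depend on the MATLAB release and the data. The names groupSum, groupCount and F are made up for illustration.

```matlab
% Small example data from the question (group labels 1..n in A, values in B)
A = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];

groupSum   = accumarray(A, B);   % default reduction is a sum per group
groupCount = accumarray(A, 1);   % number of elements in each group
F = groupSum ./ groupCount;      % per-group mean; an empty group gives 0/0 = NaN

% Reference: the loop from the question
C = NaN(max(A), 1);
for ii = 1:numel(C)
    C(ii) = mean(B(A == ii));
end

% Largest absolute deviation between the two results
disp(max(abs(F - C)))
```

Because the sums may be accumulated in a different order than mean uses, the result can differ from the loop in the last few bits, so a tolerance is safer than isequal for the comparison.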
Another approach:
tic
S = zeros(max(A), 1);
N = zeros(size(S));
for k = 1:numel(A)
m = A(k);
S(m) = S(m) + B(k);
N(m) = N(m) + 1;
end
E = S ./ N;
toc
isequal(C, E) % Not equal!!!
% But the differences are caused by rounding only:
(C - E) ./ C
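The difference is caused by rounding: floating-point sums depend on the order in which the terms are added, so two correct methods can disagree in the last few bits. Comparing the results with the group means computed from A0 and B0 shows that all methods have comparable accuracy.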
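Since isequal demands bit-identical results, a comparison with a relative tolerance is usually the more meaningful check here. The following is a minimal sketch on the small data set. The tolerance relTol is an assumed value, not one taken from this thread, and should be chosen to suit the data.

```matlab
A = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];

% Method 1: mean per group via logical indexing
C = NaN(max(A), 1);
for ii = 1:numel(C)
    C(ii) = mean(B(A == ii));
end

% Method 2: running sums and counts, divided at the end
S = zeros(max(A), 1);
N = zeros(size(S));
for k = 1:numel(A)
    S(A(k)) = S(A(k)) + B(k);
    N(A(k)) = N(A(k)) + 1;
end
E = S ./ N;

relTol = 1e-12;   % assumed tolerance, adjust to your data
sameWithinTol = all(abs(C - E) <= relTol * abs(C));
disp(sameWithinTol)
```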
Locally under R2018b I get these timings:
Elapsed time is 0.205890 seconds. % Original
Elapsed time is 0.512173 seconds. % ACCUMARRAY
Elapsed time is 0.061097 seconds. % Loop over inputs
Comments: 2
Torsten
5 Dec 2022
I took your repmat modification and added Steven Lord's answer, below, and the original loop looks like the clear winner.
Or "arrayfun" (see above).
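The arrayfun variant is not spelled out in this thread. One way it might look, as a sketch only: it is the question's loop written as a single expression, so it saves lines and is not expected to run faster.

```matlab
A = [2; 3; 1; 2; 2; 3; 1; 2; 2; 3];
B = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];

% One mean per group label 1..max(A); the column input yields a column output
C = arrayfun(@(ii) mean(B(A == ii)), (1:max(A)).');
disp(C)
```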
More Answers (1)
Steven Lord
5 Dec 2022
A = [2; 4; 1; 2; 2; 4; 1; 2; 2; 4];
B = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];
[C, groupnumbers] = groupsummary(B, A, @mean)
The groupnumbers output can help if some elements in 1:n don't appear in A (as is the case using the modified A I used in this example where all the 3's are replaced by 4's.)
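If a result with one entry per label in 1:n is still wanted, as in the question's loop, groupnumbers can be used as an index into a preallocated vector. This is a sketch under two assumptions: the labels are positive integers, and the MATLAB release accepts array inputs to groupsummary. The name Cfull is made up for illustration.

```matlab
A = [2; 4; 1; 2; 2; 4; 1; 2; 2; 4];   % label 3 never occurs
B = [4.10047; 7.44549; 3.62159; 6.56964; 2.87221; 4.51231; 4.01697; 5.60534; 5.5440; 7.07802];

[C, groupnumbers] = groupsummary(B, A, @mean);

% Scatter the per-group means into a vector indexed by label;
% labels that never occur keep the NaN placeholder
Cfull = NaN(max(A), 1);
Cfull(groupnumbers) = C;
disp(Cfull)
```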