I'm trying to compute both the mean and SD of a set of values by group, and I can either do it with for-loop:
M = zeros(1,ngroup);
SD = zeros(1,ngroup);
for i = 1:ngroup
M(i) = mean(data(ind==i));
SD(i) = std(data(ind==i));
end
Or, alternatively use `accumarray` twice.
M = accumarray(ind,data,[],@mean);
SD = accumarray(ind,data,[],@std);
But is there a way to just use accumarray once and compute both quantities? Since accumarray is faster than for-loop, but calling it twice will be slow. Is it possible to do something like:
[M, SD] = accumarray(ind,data,[],{@mean,@std})

 채택된 답변

Matt J
Matt J 2022년 10월 19일
편집: Matt J 2022년 10월 19일

1 개 추천

Since accumarray is faster than for-loop, but calling it twice will be slow.
I would argue that 3 calls to accumarray would be the most optimal.
data = rand(25e5,1);
ind=randi(100,size(data));
tic;
M0 = accumarray(ind,data,[],@mean);
SD0 = accumarray(ind,data,[],@std);
toc
Elapsed time is 0.499977 seconds.
tic;
Out = accumarray(ind, data, [], @(x){{mean(x) std(x)}});
toc;
Elapsed time is 0.235066 seconds.
tic;
N=accumarray(ind,1);
S = accumarray(ind,data);
S2=accumarray(ind,data.^2);
M=S./N;
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
toc
Elapsed time is 0.047581 seconds.

댓글 수: 4

BRIAN XU
BRIAN XU 2022년 10월 19일
Wow that's interesting, many thanks!
Bruno Luong
Bruno Luong 2022년 10월 19일
Accumarray is optimized when doing sum.
Using function handle has great penalty.
Matt J
Matt J 2022년 10월 19일
편집: Matt J 2022년 10월 19일
I should mention though that there might be some sacrifice in numerical accuracy using the expansion,
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
The subtraction of S2 and 2*S.*M can give large floating point residuals.
data = 10000+rand(25e5,1);
ind=randi(100,size(data));
tic;
M0 = accumarray(ind,data,[],@mean);
SD0 = accumarray(ind,data,[],@std);
toc
Elapsed time is 0.574901 seconds.
tic;
N=accumarray(ind,1);
S = accumarray(ind,data);
S2=accumarray(ind,data.^2);
M=S./N;
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
toc
Elapsed time is 0.048677 seconds.
errorMean=norm(M-M0)/norm(M0)
errorMean = 4.4567e-15
errorSTD=norm(SD-SD0)/norm(SD0)
errorSTD = 5.3453e-06
Small simplification
N=accumarray(ind,1);
S = accumarray(ind,data);
M = S./N;
S2=accumarray(ind,data.^2);
SD = sqrt((S2 - S.*M)./(N-1));

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

Star Strider
Star Strider 2022년 10월 19일

2 개 추천

The accumarray approach is certainly possible —
v = randn(250,1)
v = 250×1
0.0501 -0.2469 1.0253 1.1484 -0.9196 0.4947 -0.7221 1.2164 -0.3162 0.0461
Out = accumarray(ones(size(v)), v, [], @(x){{mean(x) std(x)}})
Out = 1×1 cell array
{1×2 cell}
meanv = Out{:}{1}
meanv = -0.0333
stdv = Out{:}{2}
stdv = 0.9419
Make appropriate changes to work with your data.
.

댓글 수: 2

BRIAN XU
BRIAN XU 2022년 10월 19일
that's a nice idea! thank you!
Star Strider
Star Strider 2022년 10월 19일
My pleasure!

댓글을 달려면 로그인하십시오.

카테고리

도움말 센터File Exchange에서 Performance and Memory에 대해 자세히 알아보기

질문:

2022년 10월 19일

댓글:

2022년 10월 19일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by