I want to efficiently count the number of occurences of numbers between 1-numel(num) in a Matrix. I came up with two options for that:
sz = [3000 2000];
mx = prod(sz);
num = randi([1 mx],sz);
tic;
% First option
counts = zeros(numel(num),1);
for i = 1:numel(num)
counts(num(i)) = counts(num(i)) + 1;
end
toc
tic;
% Second option
uni = unique(num);
uni = reshape(uni,[],1);
hc = histcounts(num,[uni;uni(end)]);
toc
Execution times are:
Elapsed time is 0.098540 seconds.
Elapsed time is 0.342214 seconds.
So option 1 is clearly faster. However the for loop bugs me. Is there any possibility to vectorize this?

댓글 수: 3

Jan Siegmund
Jan Siegmund 2020년 10월 17일
편집: Jan Siegmund 2020년 10월 17일
Note: The reshape is because unique always returns a column vector, but if it receives a row vector as input, is produces a row vector and the concat fails... gotta love that inconsistency
Adam Danz
Adam Danz 2020년 10월 18일
"I want to efficiently count the number of occurences of numbers between 1-numel(num) in a Matrix"
I'm a bit lost. Your matrix contains integers between 1 and 1000 and has 6000000 values (3000x2000). So, why are you looking for 6000000 different values when you only have a max of 1000 values?
Sorry, num was a stupid example. A more suitable would be
sz = [3000 2000];
mx = prod(sz);
num = randi([1 mx],sz);
%...

댓글을 달려면 로그인하십시오.

 채택된 답변

Matt J
Matt J 2020년 10월 18일
편집: Matt J 2020년 10월 18일

1 개 추천

In this situation, accumarray will be faster than histcounts, but still not as fast as the for-loop,
tic;
hc=accumarray(num(:),1,[mx,1]).';
toc
Elapsed time is 0.172890 seconds. %for-loop
Elapsed time is 0.236013 seconds. %accumarray
unless the values are pre-sorted,
num=sort(num(:));
tic;
hc=accumarray(num(:),1,[mx,1]).';
toc
Elapsed time is 0.168976 seconds. %for-loop
Elapsed time is 0.075965 seconds. %accumarray
I think this is simply one of those situations where Matlab's for-loop optimization has caught up to vectorized code.

댓글 수: 1

Jan Siegmund
Jan Siegmund 2020년 10월 18일
Alright, accumarray is also a great choice. Thank you for your time and effort. This is the answer I am going to accept.

댓글을 달려면 로그인하십시오.

추가 답변 (2개)

Matt J
Matt J 2020년 10월 18일
편집: Matt J 2020년 10월 18일

0 개 추천

I want to efficiently count the number of occurences of numbers between 1-numel(num) in a Matrix
If that's really what you want, then
hc = histcounts(num(:), 1:numel(num)+1 );
but as Adam points out, it would make more sense to have
hc = histcounts(num(:), 1:1001 );

댓글 수: 3

Jan Siegmund
Jan Siegmund 2020년 10월 18일
편집: Jan Siegmund 2020년 10월 18일
Yea, I wanted 1-numel(num). Sorry for the bad example.
Your proposal is great!
So I updated my example.
sz = [3000 2000];
mx = prod(sz);
num = randi([1 mx],sz);
tic;
counts = zeros(numel(num),1);
for i = 1:numel(num)
counts(num(i)) = counts(num(i)) + 1;
end
toc
tic;
hc = histcounts(num(:),1:mx+1);
toc
But histcounts is still slower than the subscripted increment:
Elapsed time is 0.296456 seconds.
Elapsed time is 0.984777 seconds.
I think this is due to the tons of branching done in histcounts.m . Is there any more efficient way?
Ok no, the branching is not the problem. Calling
matlab.internal.math.histcounts
directly only results in a minor improvement.
Matt J
Matt J 2020년 10월 18일
편집: Matt J 2020년 10월 18일
I think the for-loop is the fastest for integer data ranges that large. Unfortunately (and strangely), histcounts cannot innately recognize that the data consists only of integers and use a simpler binning method for that case. There is an input option 'BinMethod'='integers' that is offered, however, it will not permit more than 65536 bins.

댓글을 달려면 로그인하십시오.

카테고리

도움말 센터File Exchange에서 Introduction to Installation and Licensing에 대해 자세히 알아보기

제품

질문:

2020년 10월 17일

편집:

2020년 10월 18일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by