Adding values in vector according to labels in another vector

I have two arrays of the same size, E x 1; let's call them Values and Labels. Values holds real-valued data points, while Labels holds an integer-valued group label for each data point, ranging from 1 to N. In my application E could be on the order of 1e6, while N is on the order of 1e5, so on average ~10 data points share the same group label.
I would like to generate the accumulated sum of values sharing the same group label. That is, I want to generate an N x 1 vector GroupSum where GroupSum(i) = sum(Values(Labels == i)). This will be done for many instances of (Values, Labels) combinations in an outer loop, so it is important to do this quickly. Therefore, I would like to find a faster alternative to
for i = 1:N
    GroupSum(i) = sum(Values(Labels == i));
end
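To make the definition concrete, here is a tiny worked example. The five data points and N = 3 are made up for illustration and are not from the original post; the loop is the same one as above, with the result preallocated.

```matlab
% Made-up data for illustration only (E = 5 data points, N = 3 groups)
Values = [0.5; 1.5; 2.0; 4.0; 3.0];
Labels = [2; 1; 2; 3; 1];
N = 3;

GroupSum = zeros(N, 1);              % preallocate the N x 1 result
for i = 1:N
    % Labels == i builds a logical mask over all E elements
    GroupSum(i) = sum(Values(Labels == i));
end

% group 1: 1.5 + 3.0, group 2: 0.5 + 2.0, group 3: 4.0
disp(GroupSum)
```

Each pass of the loop scans all E labels, so the total work grows as N times E, which is what makes this form slow at the sizes quoted.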
Any suggestions would be appreciated.
Thanks in advance.
Murat

Accepted Answer

Raunak Gupta on 13 Nov 2020
Hi,
From the current implementation I assume that the labels run from 1 to N with no values missing in between. The best approach I could think of is to traverse the Values array just once and keep a running sum for each label. This way you avoid the overhead of logical indexing over all of Labels for every group. The code below may help.
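E = 1e6;                        % number of data points
N = 1e5;                        % number of group labels
Labels = randi([1 N],E,1);      % random integer labels in 1..N
Values = randi([1 100],E,1);    % sample data (integers here for simplicity)
GroupSum = zeros(N,1);          % N x 1, matching the shape asked for
for i = 1:E
    % add each value to the running sum of its group
    GroupSum(Labels(i)) = GroupSum(Labels(i)) + Values(i);
end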
This avoids evaluating (Labels == i) for every label, which scans all E elements N times, so it can save a lot of time.
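As a possible alternative that is not part of the answer above, MATLAB's built-in accumarray performs the same single-pass accumulation without an explicit loop. The sketch below assumes Labels contains only positive integers no larger than N; the random test data is made up, and whether it beats the loop on a given machine would need to be timed.

```matlab
E = 1e6;                          % number of data points
N = 1e5;                          % number of group labels
Labels = randi([1 N], E, 1);      % random integer labels in 1..N
Values = rand(E, 1);              % made-up real-valued data

% accumarray sums Values by the subscripts in Labels; passing [N 1]
% fixes the output size, so labels that never occur get a sum of 0
GroupSum = accumarray(Labels, Values, [N 1]);
```

Passing the size explicitly matters when the largest label happens not to appear in Labels; without it the result would be shorter than N.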
  5 Comments
Murat Azizoglu on 14 Nov 2020
I see your point. No matter how the bins are formed via Labels, in the end there are E addition operations to be performed. Thus the complexity cannot be below O(E), which is what your for loop achieves.
This also explains why the matrix-based approach is about 2-4x slower: there are 2.5e6 (=d N) elements in my P matrix, many of them zeros. I had thought avoiding for loops would be a big help in MATLAB, but apparently not in this case.
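The P matrix from the earlier comments is not shown on this page, so the following is only an assumption about what a matrix-based formulation might look like, not a reconstruction of it. It builds a sparse N x E indicator matrix S (a hypothetical name) that stores just one nonzero per data point, then obtains the group sums with a single matrix-vector product.

```matlab
E = 1e6;                          % number of data points
N = 1e5;                          % number of group labels
Labels = randi([1 N], E, 1);      % random integer labels in 1..N
Values = rand(E, 1);              % made-up real-valued data

% S(k, j) = 1 when data point j carries label k; only E nonzeros are stored
S = sparse(Labels, (1:E)', 1, N, E);

% Row k of S picks out the values in group k, so S * Values sums each group
GroupSum = full(S * Values);
```

Because the product only touches stored nonzeros, it performs on the order of E multiply-adds, consistent with the O(E) argument above; building S has its own cost, so it would need timing against the plain loop.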
Thank you very much for all the help.

Release

R2020a
