How to vectorize this for maximum efficiency?

In my calculation I had to go over the non-zero elements of a 3-dimensional array (label). Instead of writing three nested loops over each index of the array (which would be inefficient), I ended up doing something like this:
highest_label = max(max(max(label)));
clusters_size = zeros(1, highest_label);
label_lin = nonzeros(label(:));
for q = 1:length(label_lin)
    clusters_size(label_lin(q)) = clusters_size(label_lin(q)) + 1;
end
Is there a way to get rid of this loop as well, by vectorizing, and make this even more efficient? I tried stuff like:
clusters_size(label_lin)=clusters_size(label_lin)+1;
But that doesn't give a correct result.
To further clarify what I need, I will give an example:
Say:
label_lin=[1;1;2;3;4;5;5;5];
clusters_size=zeros(1,5)
Now I want to count the different values of label_lin in clusters_size (this is what the for loop does), so that:
clusters_size=[2,1,1,1,3]
If I do something like:
clusters_size(label_lin)=clusters_size(label_lin)+1;
I will instead get:
clusters_size=[1,1,1,1,1]
Which is obviously incorrect.

1 Answer

dpb on 31 Aug 2019

0 votes

>> histc(label_lin,unique(label_lin))
ans =
2
1
1
1
3
>>

14 Comments

Guillaume on 31 Aug 2019
In modern versions of MATLAB, it's recommended that you use histcounts instead of histc:
label_lin=nonzeros(label); %no need to reshape label before calling nonzeros
clusters_size = histcounts(label_lin, [unique(label_lin); Inf]);
However, the deprecated histc is simpler in this case.
Another option is to use accumarray:
[~, ~, id] = unique(nonzeros(label));
clusters_size = accumarray(id, 1);
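As a quick sanity check (a sketch using the example vector from the question; the variable names are just for illustration), both the loop and the accumarray call give the same counts:

```matlab
label_lin = [1;1;2;3;4;5;5;5];

% loop baseline from the question
clusters_size = zeros(1, 5);
for q = 1:numel(label_lin)
    clusters_size(label_lin(q)) = clusters_size(label_lin(q)) + 1;
end

% accumarray on the unique-value ids reproduces the loop's result
[~, ~, id] = unique(label_lin);
isequal(accumarray(id, 1).', clusters_size)   % true: both give [2 1 1 1 3]
```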
tensorisation on 31 Aug 2019 (edited)
I'm not just looking for a way to do this without a loop, I'm looking for the most efficient way to do this. Are these methods faster than the looped one I presented? Which is the most efficient (fastest run time) method?
tensorisation on 31 Aug 2019 (edited)
When I try:
label_lin=[1;1;2;3;4;5;5;5];
clusters_size=histcounts(label_lin,unique(label_lin))
it returns:
clusters_size=[2,1,1,4]
Which is incorrect. Since histc is not recommended for use (which is why I used histcounts), I'm assuming it's not going to give an efficient/reliable result.
Also, instead of using:
unique(label_lin)
Isn't it better to just use:
1:highest_label
Where:
highest_label=max(max(max(label)));
?
For this case they seem to give the same result, and I'm assuming 1:highest_label is more efficient.
In regards to the method of using accumarray, I'm again not sure as to why is there a need to invoke unique(...) at all?
I tried:
clusters_size=accumarray(label_lin,1);
Which seems to work. It ran as fast as the for loop method I presented. I'm looking for a more efficient (faster) method.
dpb on 1 Sep 2019 (edited)
"...clusters_size=histcounts(label_lin,unique(label_lin)) ... is incorrect. Since histc is not recommended for usage (which is why I used histcounts)..."
Note that in his example Guillaume used [unique(label_lin); Inf] as the binning argument with histcounts, which is why I used the venerable (yet reliable and easily understandable) histc. With histcounts you have to remember to add that artificial last edge to prevent it from lumping the last values into the same bin -- entirely unintuitive, and generally also the wrong result. Just because it's new doesn't mean it's improved (or even as good as) the previous version it's supposed to replace. TMW can deprecate it all they want, but they can't realistically remove it because there are too many existing cases where it is being used, so I'd not worry about it and use whichever is simpler.
As for whether to bin using unique or 1:max(v), that depends entirely on what you want for an answer -- the histc()/unique() combination guarantees a bin for each value that occurs in the vector, with no zero counts; the alternatives will have as many bins as the particular definition returns, but will contain zero counts unless every value in 1:max(v) actually occurs in the vector. Neither is right or wrong; they're just answers to different questions. Your initial posting indicated you wanted the count of repetitions of the elements in the vector, not a count over the whole range 1:max(v) including values that might not appear.
If you're interested in performance, while it probably isn't the answer you really want, writing
highest_label=max(label(:));
will outperform the nested max() for large arrays. unique does introduce some overhead, that is true, but it is probably the binning you actually want here.
accumarray is a great tool, but it is essentially a loop in one line and will quite frequently be outdone in the performance arena by an explicit loop that the JIT compiler can optimize.
Only doing real-case testing for problems of your specific datasets over a large enough sample and size can accurately judge whether there's any real performance advantage of one way versus another. In general, write the most straightforward, easy to understand and debug code you possibly can and only if it is then shown to be too slow worry about optimizing--and then only after profiling has shown that the area of code you're trying to optimize actually is a bottleneck.
Guillaume on 1 Sep 2019
Yes, I did write the correct syntax for histcounts. It does say in the doc that the last bin includes both the left and right edges (unlike histc).
You never specified what the label matrix contains, so we used unique to replicate what your loop did. If that matrix is a typical labelling matrix with integers from 1 to the number of labels and 0 as background, then yes, you don't need the unique. In that case, you can also simplify the histcounts syntax to:
clusters_size = histcounts(label, 'binmethod', 'integers');
clusters_size = clusters_size(2:end); %first bin would be: 0 = background
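For instance, on a small made-up label array (assuming, as above, integer labels 1..n with 0 as background):

```matlab
% hypothetical 2x2x2 label array: background zeros plus labels 1, 2 and 3
label = cat(3, [1 0; 1 2], [3 0; 3 3]);

counts = histcounts(label, 'BinMethod', 'integers');
clusters_size = counts(2:end)   % drops the background bin, leaving [2 1 3]
```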
As dpb said, max(label(:)) will be faster than the triple max, but since you're on R2018b, you can also use:
highest_label = max(label, [], 'all');
which is probably even (marginally) faster.
As for getting the most speed out of the code, you'll have to test it for yourself, as this will depend on the MATLAB version. I would have thought that accumarray would be the fastest, but possibly histc or histcounts would be. However, MathWorks has improved the speed of loops in recent versions, so the loop may be just as fast. Since you don't need unique, the accumarray call could be:
clusters_size = accumarray(nonzeros(label), 1);
Possibly this may be faster:
clusters_size = accumarray(label(:) + 1, 1); %add 1 so that the background 0 becomes a valid index
clusters_size = clusters_size(2:end); %remove background count
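On the same kind of made-up array, the shift-by-one variant would look like this (a sketch; label is assumed to hold integer labels with 0 as background):

```matlab
label = cat(3, [1 0; 1 2], [3 0; 3 3]);   % tiny example: labels 1..3, zeros are background

counts = accumarray(label(:) + 1, 1);  % index 1 accumulates the background zeros
clusters_size = counts(2:end).'        % row vector [2 1 3]
```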
I doubt you can do faster than accumarray, even using mex since it's already implemented as machine code.
@dpb
label is an array whose values go from 1 to highest_label=max(max(max(label))) (with no missing values), which is why there is probably no need to invoke unique(...). When I used histc as you did, it ran as fast as the for loop.
In regards to max(...), I tried now doing:
a=rand(L,L,L);
tic;
a_max_1=max(max(max(a)));
toc;
tic;
a_max_2=max(a(:));
toc;
tic;
a_max_3=max(a,[],'all');
toc;
For L=512 and L=1024, the times were about the same.
dpb on 1 Sep 2019
For such tiny arrays, most of the run time is in the function overhead and differences are highly unlikely to be "statistically significant" in proving anything about overall performance.
In essence, the 'all' argument to max will be doing the (:) operation on the input array internally -- it is just a change in subscripting logic and does not do any reallocation. While I don't have the inclination to spend time testing, my gut feeling is that it may well beat the (:) form, since the function then doesn't have to process that argument -- although the newer syntax still carries parsing overhead, so it may be a wash...
As G notes, we used unique because that was what your question implied you wanted, and it is a general solution independent of the specific set of data in the example. If your data is such that you have the full population, then granted, there's no reason to use it if you don't need to.
There's a price to be paid for the movement by TMW to these new (in some ways more capable) functions that are based on OO invocations rather than traditional MATLAB procedural code: higher overhead in memory, and often in performance as well. Convenience and/or conformance to a coding style may come at a price -- "there is no free lunch!"
tensorisation on 1 Sep 2019 (edited)
@Guillaume
When I tried testing on a large array, the for loop I presented still seemed to beat accumarray (the histogram bin counts were not efficient).
In regards to max(...), see my previous comment.
dpb on 1 Sep 2019
As noted earlier, loops often beat accumarray: even though it's mex'ed, it's still a tremendously flexible tool with a lot of options and abilities beyond simple accumulation, and there's a price to be paid for hiding that complexity in a single source line.
And, yes, similar goes for the various histogram functions -- in general they do more than just the bin counts, so if speed is the only criterion and you don't need the extra stuff they can do (plus the function overhead), it's not surprising that at some point the straight loop begins to win.
It's why profiling and timing are so difficult to do well -- you have to ensure you're timing something that has meaning for the actual use, not just toy trials.
Guillaume on 1 Sep 2019
In my version of MATLAB, on an array with 100 million elements:
>> l = randi([0, 1000], 1000, 1000, 100);
>> timeit(@() myhist(l))
ans =
0.5904791923
>> timeit(@() histc(l(:), 0:max(l, [], 'all')))
ans =
1.7744696923
>> timeit(@() histcounts(l, 'BinMethod', 'integers'))
ans =
0.5858765923
>> timeit(@() accumarray(l(:)+1, 1))
ans =
1.2688647923
where
function count = myhist(m)
    m = m + 1;
    numvals = max(m, [], 'all');
    count = zeros(1, numvals);
    for idx = 1:numel(m)
        count(m(idx)) = count(m(idx)) + 1;
    end
end
the loop is the same speed as histcounts. I'm surprised that accumarray is so much slower -- I thought it was better optimised -- and the venerable histc is even slower. So, yes, it looks like the loop or histcounts is the fastest option. I'd use histcounts as it means fewer lines of code.
As for max, even with that numbers of elements, the difference is negligible so use whichever syntax you prefer.
tensorisation on 1 Sep 2019 (edited)
@dpb
An array of size [512 512 512] doesn't seem small to me. What size would you consider "large"? Especially given the fact that certain calculations are being done with it and that eventually everything is repeated many times.
For large L (L=512) in my calculation, it seems that almost all of the run time goes to other things that came before. To be more specific:
system=1*(p>=rand(L,L,L));
label=CC2periodic(bwconncomp(system,6),[0,1,1],'LabelMatrix');
I'm using the bwconncomp(...) function of Matlab to make a Hoshen–Kopelman algorithm, and then I'm also using the function CC2periodic to apply periodic boundary conditions (because sadly, bwconncomp function of Matlab doesn't seem to have that option). The function CC2periodic is taken from here.
I now tested run times for L=256 and L=512, and it seems that the run time of CC2periodic takes up almost all of the total. bwconncomp takes more time than all the others (except CC2periodic), but then CC2periodic takes a lot more than bwconncomp (about 16 times more for L=256, and about 57 times more for L=512).
I'm not exactly sure what the theoretical efficiency of applying periodic boundary conditions to a Hoshen-Kopelman algorithm in 3D should be (should it be of the same order of magnitude as bwconncomp?), but it seems that CC2periodic is not as efficient as it should be, and I'm unsure what to do about that. Maybe I should open a new question about that.
tensorisation on 1 Sep 2019 (edited)
@Guillaume
This pretty much seems to be in agreement with what I tested, so I'll probably stay with the loop for now. After testing run times for different parts of my calculation, I'm now trying to make the part that takes the most run time (especially for large L) more efficient -- see my previous comment above.
dpb on 1 Sep 2019
Yeah, it has often been noted that histc is not as efficient as it could be -- I just wish TMW had chosen to work on it instead, or at least not changed the binning behavior when introducing histcounts. This incessant introduction of overlapping functionality with inconsistent syntax and behavior is really a pain and a source of both confusion and error.
As far as "large", a double array of 512^3 is 128 MB. Once upon a time, that would have taxed the largest machines, now "entry-level" is 4GB physical memory. The key item is that the operation is inside a looping construct. However, as you note above, that's still not a significant fraction of the time in the loop so even if it were cut to nil it wouldn't make all that much difference. The need to optimize brings along the need to find out what areas offer real gains overall, not just "peephole" optimization.
Looking at CC2periodic, you could also profile it...I'm not sure which output form you're using, but notice that one branch uses both accumarray and sort...there might be some opportunity there.
Whether or not it would suit your need, I noticed a C source on GitHub that might be of some use if you wanted to write a mex file.
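One way to act on that suggestion (a sketch only; CC2periodic is the File Exchange helper mentioned above, and p and L are assumed to already be set as earlier in the thread):

```matlab
profile on
system = 1*(p >= rand(L, L, L));
label  = CC2periodic(bwconncomp(system, 6), [0,1,1], 'LabelMatrix');
profile off
profile viewer   % inspect which lines inside CC2periodic dominate the run time
```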
Guillaume on 1 Sep 2019
It's not too hard to write your own version of bwconncomp or bwlabel so you could make it wrap around the edges. Whether or not an m file implementation will gain you anything over what you're using now remains to be seen. Certainly, if it were coded as a mex you could see a gain.

Release: R2018b