How to vectorize this for maximum efficiency?
In my calculation I had to go over the non-zero elements of a 3-dimensional array (label). Instead of writing three nested loops over the array's indices (which would be inefficient), I ended up doing something like this:
highest_label = max(max(max(label)));
clusters_size = zeros(1, highest_label);
label_lin = nonzeros(label(:));
for q = 1:length(label_lin)
    clusters_size(label_lin(q)) = clusters_size(label_lin(q)) + 1;
end
Is there a way to get rid of this loop as well, by vectorizing, and make this even more efficient? I tried stuff like:
clusters_size(label_lin)=clusters_size(label_lin)+1;
But that doesn't give a correct result.
To further clarify what I need, I will give an example:
Say:
label_lin=[1;1;2;3;4;5;5;5];
clusters_size=zeros(1,5)
Now I want to count the different values of label_lin in clusters_size (this is what the for loop does), so that:
clusters_size=[2,1,1,1,3]
If I do something like:
clusters_size(label_lin)=clusters_size(label_lin)+1;
I will instead get:
clusters_size=[1,1,1,1,1]
Which is obviously incorrect.
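To see why the single vectorized assignment loses counts, here is a small sketch in plain Python (not MATLAB, just an illustration of the semantics): an assignment through a list of indices writes each distinct slot once, no matter how many times the index repeats, whereas counting needs one increment per occurrence.

```python
# Illustration only: why "clusters_size(label_lin) = clusters_size(label_lin)+1"
# cannot accumulate repeated indices, and what a real count looks like.
from collections import Counter

label_lin = [1, 1, 2, 3, 4, 5, 5, 5]

# The "vectorized assignment" semantics: each distinct slot is written once,
# so every touched bin ends up as old value + 1 regardless of repetitions.
naive = [0] * 5
for q in set(label_lin):        # each index considered once, duplicates lost
    naive[q - 1] += 1
print(naive)                    # [1, 1, 1, 1, 1]

# A true count: one increment per occurrence (what the for loop does).
counts = Counter(label_lin)
clusters_size = [counts[v] for v in range(1, 6)]
print(clusters_size)            # [2, 1, 1, 1, 3]
```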
Answers (1)
dpb
31 Aug 2019
>> histc(label_lin,unique(label_lin))
ans =
2
1
1
1
3
>>
14 Comments
Guillaume
31 Aug 2019
In modern versions of MATLAB, it's recommended to use histcounts over histc:
label_lin=nonzeros(label); %no need to reshape label before calling nonzeros
clusters_size = histcounts(label_lin, [unique(label_lin); Inf]);
However, the deprecated histc is simpler in this case.
Another option is to use accumarray:
[~, ~, id] = unique(nonzeros(label));
clusters_size = accumarray(id, 1);
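The unique/accumarray idiom can be sketched in plain Python (an assumed analogue, not MATLAB): the third output of unique assigns each element the rank of its value among the sorted unique values, and accumarray(id, 1) then adds 1 into the bin for each rank.

```python
# Rough Python analogue of: [~, ~, id] = unique(nonzeros(label));
#                           clusters_size = accumarray(id, 1);
label = [0, 5, 1, 0, 1, 2, 3, 4, 5, 5]    # 0 plays the role of background

nonzero = [v for v in label if v != 0]     # nonzeros(label)
uniq = sorted(set(nonzero))                # unique values, ascending
rank = {v: i for i, v in enumerate(uniq)}  # third output of unique (0-based here)

clusters_size = [0] * len(uniq)
for v in nonzero:                          # accumarray(id, 1): +1 per element
    clusters_size[rank[v]] += 1
print(clusters_size)                       # [2, 1, 1, 1, 3]
```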
tensorisation
31 Aug 2019
Edited: tensorisation, 31 Aug 2019
"...clusters_size=histcounts(label_lin,unique(label_lin)) ... is incorrect. Since histc is not recommended for usage (which is why I used histcounts)..."
Note that in his example Guillaume used [unique(label_lin); Inf] as the binning argument to histcounts, which is why I used the venerable (yet reliable and easily understandable) histc instead. With histcounts you have to remember to append that artificial last edge to keep it from lumping the last elements into the same bin; that is entirely unintuitive and, if forgotten, generally gives the wrong result. Just because it's new doesn't mean it's improved (or even as good as) the previous version it's supposed to replace. TMW can deprecate it all they want, but they can't realistically remove it because there are too many existing cases where it is used, so I'd not worry about it and use whichever is simpler.
As for whether to bin using unique or 1:max(v), that depends entirely on what you want for an answer. The histc()/unique() combination guarantees a bin for each value actually present in the vector, with no zero counts; the alternatives will have as many bins as their particular definition implies, and those counts will contain zeros unless every value in 1:max(v) actually occurs in the vector. Neither is right or wrong; they answer different questions. Your initial posting indicated you wanted the count of repetitions of the elements present in the vector, not a count over the full range 1:max(v) including values that might not be in the vector at all.
If you're so interested in performance, while it isn't probably the answer you really want, writing
highest_label=max(label(:));
will outperform the nested max() for large arrays. unique does introduce some overhead, that is true, but it probably is the binning you really want.
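The flattened-max point can be illustrated outside MATLAB too; this is a plain-Python sketch (an assumed analogue, not the thread's code) showing that one pass over the flattened data and a dimension-by-dimension nested max agree, while the flattened form avoids building intermediate per-slice results.

```python
# Illustration: max over flattened data (like max(label(:))) vs nested maxima
# (like max(max(max(label)))) on a small 2x2x2 stand-in array.
a = [[[3, 7], [2, 9]],
     [[5, 1], [8, 4]]]

flat_max = max(v for plane in a for row in plane for v in row)    # max(label(:))
nested_max = max(max(max(row) for row in plane) for plane in a)   # nested max
print(flat_max, nested_max)   # 9 9
```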
accumarray is a great tool but it is essentially a loop in one line and quite frequently will be outdone in the performance arena with the explicit loop that the JIT compiler can optimize.
Only doing real-case testing for problems of your specific datasets over a large enough sample and size can accurately judge whether there's any real performance advantage of one way versus another. In general, write the most straightforward, easy to understand and debug code you possibly can and only if it is then shown to be too slow worry about optimizing--and then only after profiling has shown that the area of code you're trying to optimize actually is a bottleneck.
Guillaume
1 Sep 2019
Yes, I did write the correct syntax for histcounts. It does say in the doc that the last bin includes both the left and right edge (unlike histc).
You never specified what the label matrix contains, so we used unique to replicate what your loop did. If it is a typical labeling matrix, with integers from 1 to the number of labels and 0 as background, then indeed you don't need unique. In that case, you can also simplify the histcounts call to:
clusters_size = histcounts(label, 'binmethod', 'integers');
clusters_size = clusters_size(2:end); %first bin would be: 0 = background
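A plain-Python sketch of the same idea (assumed equivalent, not MATLAB): when the labels are known to be integers 0..max with 0 as background, you can bin every integer value directly, then drop the background bin, which mirrors histcounts with 'binmethod', 'integers' followed by (2:end).

```python
# Count every integer value 0..max(label), then discard the 0 (background) bin.
label = [0, 0, 1, 1, 2, 3, 0, 3, 3]

counts = [0] * (max(label) + 1)  # one bin per integer value
for v in label:
    counts[v] += 1
clusters_size = counts[1:]       # like clusters_size(2:end) in MATLAB
print(clusters_size)             # [2, 1, 3]
```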
As dpb said, max(label(:)) will be faster than triple max, but since you're on R2018b, you can also use:
highest_label = max(label, 'all');
which is probably even (marginally) faster.
As for getting the most speed out of the code, you'll have to test it for yourself, as this will depend on your MATLAB version. I would have thought accumarray would be the fastest, but possibly histc or histcounts is. However, MathWorks have improved the speed of loops in recent versions, so the loop may be just as fast. Since you don't need unique, the accumarray call could be:
clusters_size = accumarray(nonzeros(label), 1);
Possibly this may be faster:
clusters_size = accumarray(label(:) + 1, 1); %add 1 so that background (0) becomes a valid index
clusters_size = clusters_size(2:end); %remove background count
I doubt you can do faster than accumarray, even using mex since it's already implemented as machine code.
tensorisation
1 Sep 2019
dpb
1 Sep 2019
For such tiny arrays, most of the run time is in the function overhead and differences are highly unlikely to be "statistically significant" in proving anything about overall performance.
In essence, the 'all' argument to max does the (:) operation on the input array internally; it's just a change in subscripting logic and does no reallocation. While I don't have the inclination to spend time testing, my gut feeling is that it may well beat the (:) form, since the function doesn't have to process the reshaped argument, although the newer version still has its own parsing overhead, so it may be a wash.
As G notes, we used unique because that was what your question implied you wanted, and it is a general solution independent of the specific set of data in the example. If your data is such that you have the full population, then granted, there's no reason to use it if you don't need to.
There's a price to be paid for TMW's movement to these new (sometimes more capable) functions built on OO invocations rather than traditional MATLAB procedural code: higher overhead in memory and, often, in performance as well. Convenience and/or conformance to a coding style may come at a price; there is no free lunch!
tensorisation
1 Sep 2019
Edited: tensorisation, 1 Sep 2019
dpb
1 Sep 2019
As noted earlier, loops often beat accumarray even though it's mex'ed; it's a tremendously flexible tool with many options and abilities beyond simple accumulation, and there's a price to be paid for hiding that complexity in a single source line.
And, yes, the same goes for the various histogram functions: they do more than just the bin counts in general, so if speed is the only criterion and you don't need the extra features they offer (plus their function-call overhead), it's not surprising that at some point the straight loop begins to win.
It's why profiling and timing is so difficult to do well -- you have to ensure you're timing something that has meaning for the actual use, not just toy trials.
Guillaume
1 Sep 2019
In my version of matlab, on an array with 100 million elements:
>> l = randi([0, 1000], 1000, 1000, 100);
>> timeit(@() myhist(l))
ans =
0.5904791923
>> timeit(@() histc(l(:), 0:max(l, [], 'all')))
ans =
1.7744696923
>> timeit(@() histcounts(l, 'BinMethod', 'integers'))
ans =
0.5858765923
>> timeit(@() accumarray(l(:)+1, 1))
ans =
1.2688647923
where
function count = myhist(m)
    m = m + 1;  % shift values so the 0 background becomes a valid index
    numvals = max(m, [], 'all');
    count = zeros(1, numvals);
    for idx = 1:numel(m)
        count(m(idx)) = count(m(idx)) + 1;
    end
end
the loop is the same speed as histcounts. I'm surprised that accumarray is so much slower, as I thought it was better optimised, and the venerable histc is slower still. So, yes, it looks like the loop or histcounts is the fastest option. I'd use histcounts as it means fewer lines of code.
As for max, even with that numbers of elements, the difference is negligible so use whichever syntax you prefer.
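For readers outside MATLAB, the same "time a callable" pattern that timeit uses above can be sketched in plain Python (this is not the author's benchmark; data size, the loop_count helper, and all numbers are illustrative assumptions, and absolute timings are not comparable to the MATLAB figures).

```python
# Minimal timing sketch: explicit counting loop vs a built-in counting helper,
# measured with a callable, in the spirit of MATLAB's timeit comparisons above.
import timeit
from collections import Counter

data = [v % 1001 for v in range(100_000)]  # stand-in for randi([0, 1000], ...)

def loop_count(m):
    # one bin per integer 0..max(m), one increment per element
    counts = [0] * (max(m) + 1)
    for v in m:
        counts[v] += 1
    return counts

t_loop = timeit.timeit(lambda: loop_count(data), number=3)
t_counter = timeit.timeit(lambda: Counter(data), number=3)
print(f"loop: {t_loop:.3f}s  Counter: {t_counter:.3f}s")
```

As with the MATLAB results, which variant wins depends on the runtime version and the data, which is the thread's broader point about measuring on your real workload.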
tensorisation
1 Sep 2019
Edited: tensorisation, 1 Sep 2019
dpb
1 Sep 2019
Yeah, it has often been noted that histc is not as efficient as it could be; I just wish TMW had chosen to work on it instead, or at least not changed the binning behavior when introducing histcounts. This incessant introduction of overlapping functionality with inconsistent syntax and behavior is really a pain and a source of both confusion and error.
As far as "large", a double array of 512^3 is 128 MB. Once upon a time, that would have taxed the largest machines, now "entry-level" is 4GB physical memory. The key item is that the operation is inside a looping construct. However, as you note above, that's still not a significant fraction of the time in the loop so even if it were cut to nil it wouldn't make all that much difference. The need to optimize brings along the need to find out what areas offer real gains overall, not just "peephole" optimization.
Looking at CC2periodic you could also profile it...not sure which output form you're using; notice that one branch uses accumarray and sort both...there might be some opportunity there.
Whether it would suit your need or not, I noticed a C source on GitHub that might be of some use if you wanted to write a mex file.
Guillaume
1 Sep 2019
It's not too hard to write your own version of bwconncomp or bwlabel so you could make it wrap around the edges. Whether or not an m file implementation will gain you anything over what you're using now remains to be seen. Certainly, if it were coded as a mex you could see a gain.