Speed performance: Find all y-vector entries that have the same value in an x-vector of equal length.

조회 수: 1 (최근 30일)

Hi Guys,

imagine you have an x-vector

   x = [1 1 2 2 1 1];

as well as an y-vector of equal length.

    y = [1 2 3 4 5 6]

And you want to find out, which values the y-vector has at those indices where the x-vector has the value 1, as well as the y-vector values at those indices where the x-vector has the value 2. So that the solution for this example would be

solution = {[1 2 5 6], [3 4]} 

That is the first cell of 'solution' contains all y-values, that belong to the first unique x-value (1) & the second cell of it contains all y-values, that belong to the second unique x-value (2).

My ideas on how to implement this for vectors of any length are given below.

Does anybody has an idea, how to avoid the for-loop in approach A???

x = randi(10, 1, 100000);
y = randi(5, 1, 100000);
% Approach A:
tic 
xVals = unique(x); % get unique x values
tf = x == xVals';  % get true-or-false logical array (each row specifies where the n-th value of xVals occurs in x).
yVals = cell(1, size(tf,1)); % preallocate yVals
for i = 1:size(tf,1)
    yVals(i) = {y(tf(i,:))}; % Assign all y-values that belong to one xVal-entry to one cell.
end
toc
% Approach B: 
tic
[xs, id] = sort(x); % xsorted
ys = y(id); % sort y the same way x was sorted
[xVals, ia] = unique(xs);
% Divide the ys-vector into sections whose corresponding xs-indices have the same value in the xs vector.
yVals = mat2cell(ys, [1], [diff(ia)', length(ys)-sum(diff(ia))]); 
toc
% Runtimes:
%  x = randi(10, 1, 1000000);
%  y = randi(5, 1, 1000000);
%  A: Elapsed time is 0.100635 seconds.
%  B: Elapsed time is 0.149284 seconds. 
%  x = randi(100, 1, 1000000);
%  y = randi(5, 1, 1000000);
%  A: Elapsed time is 0.886396 seconds.
%  B: Elapsed time is 0.139918 seconds.
% Comparison
% A is faster than B if there are few different x-values (about 10-20). 
% The number of different x-values has high impact on speed of A.
% A is faster than B if x & y-vector are extremely long (> 100000).
% The vector length has low impact on speed of B.
  댓글 수: 7
Star Strider
Star Strider 2018년 4월 20일

My pleasure!

I was surprised that accumarray was slower than the loop, even though it makes for neater code.

I didn’t test ‘Approach B’, since ‘Approach A’ (chosen randomly) was significantly faster than my accumarray call.

Zwithouta
Zwithouta 2018년 4월 20일
편집: Zwithouta 2018년 4월 20일

Enjoy it with caution! Approach A really suffers terribly from increasing the number of unique-values in the x-vector, since it increases the number of needed for-loop iterations. Approach B in contrast shows a really stable performance.

y = randi(5, 1, 1000000);
x = randi(10, 1, 1000000);
Elapsed time is 0.108344 seconds. % A
Elapsed time is 0.195844 seconds. % B
x = randi(100, 1, 1000000);
Elapsed time is 0.877760 seconds. % A
Elapsed time is 0.182726 seconds. % B
x = randi(1000, 1, 1000000);
Elapsed time is 11.379839 seconds.% A
Elapsed time is 0.196772 seconds. % B

댓글을 달려면 로그인하십시오.

답변 (1개)

Paul Shoemaker
Paul Shoemaker 2018년 4월 20일

If you're trying to avoid a FOR loop you can use arrayfun, although I'm not sure it improves speed.

It would look something like this:

x = [1 1 2 2 1 1];
y = [1 2 3 4 5 6];
solution = arrayfun(@(z) y(z==x),unique(x),'uniformoutput',false);

Paul Shoemaker

http://www.matlabinvesting.com

  댓글 수: 1
Zwithouta
Zwithouta 2018년 4월 20일
Thanks for sharing your thoughts on that topic!
Your approach is the fastest for long x-vectors so far, but suffers similar to approach A from high numbers of unique x-values.

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Graphics Object Programming에 대해 자세히 알아보기

제품

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by