Fast method count unique variables

Question

Matthew Hickson 2017년 5월 3일

1
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/338561-fast-method-count-unique-variables

댓글: David Goodmanson 2017년 5월 8일

Ok so given an array of (n,m) dimensions populated with positive integers from 1:m and each row of the array will contain only one of each integer: so no repeats.

For example:

2,1,3,4

3,1,2,4

1,2,3,4

2,1,4,3

I'd like a fast method of creating arrays containing all the positions that the numbers fall into, for example:

One.positions = [1,2]

Two.positions = [1,2,3]

Three.postions = [1,3,4]

Four.positions = [3,4]

I'm doing this for arrays sometimes of sizes = [1e+6,30]; currently I'm using a loop moving column wise with accummarray and if the number exists in that Column it is iteratively added to an array.

Original Code:

% m = array containing sequences of integers
total = size(m,2)
A = (1:total);
intel_pos = cell(total,1);
for i = 1:total
  for j = 1:total
      if sum(A(accumarray(m(:,j),1) > 0) == i) == 1
          intel_pos{i} = [intel_pos{i};j];
      end
  end
end

So obvious areas of improvement can be done.

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

David Goodmanson 2017년 5월 3일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/338561-fast-method-count-unique-variables#answer_265532

편집: David Goodmanson 2017년 5월 5일

MATLAB Online에서 열기

Revised answer. For a 7e6 x 30 matrix this takes about 2.7 sec on my PC.

m8 = uint8(m);
u1 = uint8(1);
ncol = size(m8,2);
nmax = ncol;            % in your case. otherwise nmax = max(max(m))
A = zeros(nmax,ncol);
locations = cell(nmax,1);
% create matrix with j,k element = 1 if integer j is found in column k, 0 otherwise
for k = 1:ncol
  A(m8(:,k),k) = u1;
end
for k = 1:nmax
  locations{k} = find(A(k,:));
end

based on the principle that if you set an element to 1, it doesn't matter if you have set it to 1 a thousand times already.

댓글 수: 2
없음 표시없음 숨기기

Matthew Hickson 2017년 5월 8일

Yeah this is now the fastest answer, I improved it by about 5% by using logical indexing instead of find, and this speed is pretty darn fast so I'm going to leave this answer as accepted. It was twice as fast as my improved version of Sean's method so this is definitely acceptable, thank you

David Goodmanson 2017년 5월 8일

You're welcome, it was a good problem. I realized that I forgot to set A equal to a uint8 matrix so it is inconsistent to use u1 instead of 1 in the for loop. Actually it does not seem to make much difference speedwise if A is double and its elements are set 1 or if A is uint8 and its elements are set to u1.

댓글을 달려면 로그인하십시오.

Answer 2

Guillaume 2017년 5월 3일

1
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/338561-fast-method-count-unique-variables#answer_265506

MATLAB Online에서 열기

No idea if it's faster than your method, this does not need a loop:

m = [2 1 3 4;1 2 3 4;1 2 4 3;2 1 4 3];  %demo data
colindices = repmat(1:size(m, 2), size(m, 1), 1);
out = accumarray(m(:), colindices(:), [], @(x) {unique(x)})

댓글 수: 2
없음 표시없음 숨기기

Sean de Wolski 2017년 5월 3일

Loops are not typically slower. This was true 15 years ago but with advances in the MATLAB execution engine and jit accelerator they are now at parity with vectorized operations much of the time.

Matthew Hickson 2017년 5월 3일

편집: Matthew Hickson 2017년 5월 3일

That may be true and don't get me wrong I use loops all the time but the real hindrance to this was unique(x) which is a slow function. This is the fastest suggestion made that actually completes the task though, but I feel like further improvements could be made and with less memory use.

Replacing {unique(x)} with {A(ismember(1:total,x))} where A = 1:m is faster

댓글을 달려면 로그인하십시오.

Answer 3

Sean de Wolski 2017년 5월 3일

1
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/338561-fast-method-count-unique-variables#answer_265512

편집: Sean de Wolski 2017년 5월 4일

MATLAB Online에서 열기

I'd suspect a simple loop over columns with ismember would be very fast.

EDIT from Comment Clarification

tic
ncol = size(m,2);
nrow = size(m,1);
present = cell(ncol,1);
for ii = 1:ncol
    present{ii} = unique(ceil(find(m==ii)./nrow));
end
toc

OLD

[~,m] = sort(rand(1e6,30),2);
tic
ncol = size(m,2)
present = cell(ncol,1);
for ii = 1:ncol
    present{ii} = find(ismember(1:ncol,m(:,ii)));
end 
toc

This is taking 0.9s on my laptop.

댓글 수: 6
이전 댓글 4개 표시이전 댓글 4개 숨기기

Matthew Hickson 2017년 5월 4일

편집: Matthew Hickson 2017년 5월 4일

MATLAB Online에서 열기

Examining your EDIT, your code successfully satisfies the problem, using both find and especially unique is slow, so I will remove them and see if its faster than the above mentioned answer using accumarray with no loop. At the present moment though it is slower by about 20% for arrays of size 20000 x 20:

I managed to improve the speed by about 2 times using:

A = (1:size(m,2));
for ii = 1:ncol
present{ii} = A(sum((m==ii),1) > 0);
end

For an array 6684672 x 30 in size;

-

The time of my improved version of your method is: 5.425416 seconds.

The time of my improved version of the repmat method in another answer is: 8.857116 seconds.

-

So now I assume sum is optimised so theres no faster way to do that.

Good news is this method is now the fastest suggested and uses considerably less memory than repmat. Any suggestions on further improvements?

David Goodmanson 2017년 5월 5일

Hi Matthew, I have done a revised answer, which is about three times faster than the code listed above.

댓글을 달려면 로그인하십시오.

Fast method count unique variables

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 2
없음 표시없음 숨기기

추가 답변 (2개)

댓글 수: 2
없음 표시없음 숨기기

댓글 수: 6
이전 댓글 4개 표시이전 댓글 4개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

Fast method count unique variables

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 2 없음 표시없음 숨기기

추가 답변 (2개)

댓글 수: 2 없음 표시없음 숨기기

댓글 수: 6 이전 댓글 4개 표시이전 댓글 4개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글 수: 2
없음 표시없음 숨기기

댓글 수: 2
없음 표시없음 숨기기

댓글 수: 6
이전 댓글 4개 표시이전 댓글 4개 숨기기