How to find duplicate values and what they are duplicates of?

Suppose I have an array:
X = [2;5;1;2;2;0;0]
With unique I get the unique values and index vectors, but I also want, for each unique value, the indices of all the elements that are duplicates of it:
[r, ir, ix] = unique(X);
r = [0;1;2;5]
ir = [7;3;5;2]
ix = [3;4;2;3;3;1;1]
But ir only gives me the index of the last occurrence of each value. What I am looking for is:
r = [0;1;2;5]
rIdx = {[6 7]; 3; [1 4 5]; 2}
So the result would contain the unique values and, for each one, the indices where it appears (a cell array, since the groups have different sizes).
Is there any solution to this?
Thank you in advance.
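A minimal sketch of one more approach (not from the thread): the third output of unique already labels every element of X with its group, so accumarray can collect the positions of X for each unique value into a cell array. accumarray does not guarantee the order in which it passes values to the function, which is why each group is sorted.

```matlab
% Example data from the question
X = [2;5;1;2;2;0;0];

% r: sorted unique values; ic: row of r that each element of X maps to
[r, ~, ic] = unique(X);

% Collect the positions 1..numel(X) into one cell per unique value.
% Sorting each group makes the index order independent of accumarray's internals.
rIdx = accumarray(ic, (1:numel(X)).', [], @(v) {sort(v)});
```

With the data from the question, rIdx{1} is [6;7] and rIdx{3} is [1;4;5]; the groups come out as column vectors rather than rows.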

Answers (1)

Matt Fig on 4 Oct 2012
Edited: Matt Fig on 4 Oct 2012
There are many ways to do it. Here is another, just for fun:
X = [2;5;1;2;2;0;0];
Y = arrayfun(@(x) {x find(X==x)},unique(X),'Un',0);
Now Y{1}{1} is the first unique element and Y{1}{2} holds the indices of its locations. Y{2}{1} is the next unique element and Y{2}{2} holds the indices of its locations, and so on. You could also leave out the unique elements and just use:
Y = arrayfun(@(x) find(X==x),unique(X),'Un',0);
Or, if you prefer to include the unique elements but not in a separate cell array, this makes the first element of each cell the unique element of X:
Y = arrayfun(@(x) [x;find(X==x)],unique(X),'Un',0);
Also, my use of ARRAYFUN here can be read as shorthand for the equivalent FOR loops, which are probably faster.
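For comparison, here is a sketch of the FOR loop that the second ARRAYFUN form stands for, with a preallocated cell array; the names u and Y are illustrative. Whether it is actually faster than ARRAYFUN depends on the MATLAB release and the data size, so it is worth timing both on real data.

```matlab
X = [2;5;1;2;2;0;0];

% Same result as Y = arrayfun(@(x) find(X==x), unique(X), 'Un', 0),
% written as an explicit loop over the unique values
u = unique(X);
Y = cell(numel(u), 1);          % preallocate one cell per unique value
for k = 1:numel(u)
    Y{k} = find(X == u(k));     % column of indices where X equals u(k)
end
```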
You might also be interested in this question.

6 Comments

I did try using arrayfun, but then the result is one cell array whose elements have different sizes (which is the subject of another question I asked in a separate thread). My ultimate goal is to find the pairs of duplicates so that I can plot their data and check whether they really are duplicates. Is there a way to use arrayfun and get the result as a matrix?
Yes, but then why not just use the FOR loop? I don't know what you mean that you want to find "the pair of duplicates so then I can plot their data and compare if they actually are duplicates." If they are found by any of the methods shown above then they are actually duplicates; there is no need to doubt...
The raw data I am looking at has many different parameters. If I use arrayfun on, say, a checksum of some Values:
checksum1(iNew) = sum(Values);
array1 = arrayfun(@(z) [z find(checksum1==z)], unique(checksum1), 'Un', 0);
it gives different duplicates than a checksum of time stamps or the length of some vector:
checksum2(iNew) = sum(Dates);
array2 = arrayfun(@(z) [z find(checksum2==z)], unique(checksum2), 'Un', 0);
So I really do not know which arrayfun result is reliable, and that is why I want to plot some data for each person that may be a duplicate (I have about 200k files). I have been working on this for a long time but have not been able to find a solution.
anti-arrayfun() traitor!
LOL, Sean. We have seen this discussion before!
Sim,
I would expect array1 and array2 to give different results if Values and Dates are different! This has nothing to do with the reliability of ARRAYFUN. The function does exactly what you tell it to do, which is what we mean when we ask whether a function is reliable.
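If the goal is to flag files that look like duplicates under both measures at once, one option (an assumption about what is needed, not something proposed in the thread) is to treat each file's pair of checksums as a single key and group with unique(...,'rows'). The checksum values below are made-up placeholders standing in for checksum1 and checksum2.

```matlab
% Hypothetical per-file checksums (placeholders, not real data)
checksum1 = [10 20 10 30 10];
checksum2 = [ 5  7  5  7  9];

% Each row is one file's pair of checksums, used as a combined key
keys = [checksum1(:), checksum2(:)];
[uKeys, ~, ic] = unique(keys, 'rows');

% Group file numbers by key, then keep only keys shared by more than one file
groups = accumarray(ic, (1:numel(ic)).', [], @(v) {sort(v)});
dupGroups = groups(cellfun(@numel, groups) > 1);
```

Here files 1 and 3 agree on both checksums, so dupGroups holds the single group [1;3]; file 5 matches file 1 on checksum1 only and is not flagged.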
Like I said, a FOR loop is probably faster, but here is one way to get the result as a matrix, padding the shorter rows with NaN:
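X = [2;5;1;2;2;0;0];
% Each cell holds one row: [unique value, indices where it occurs]
Y = arrayfun(@(x) [x,find(X==x).'],unique(X),'Un',0);
% Pad every row with NaN up to the length of the longest row
maxLen = max(cellfun('length',Y));
padded = cellfun(@(x) [x,nan(1,maxLen-length(x))],Y,'Un',0);
M = cell2mat(padded)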
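A loop-free sketch (not from the thread) that builds the same NaN-padded matrix without ARRAYFUN or CELLFUN: sortrows orders the positions of X by group, a running count gives each position its column, and sub2ind places the positions into a preallocated NaN matrix. The names ic, order, occ and Idx are illustrative.

```matlab
X = [2;5;1;2;2;0;0];

% u: sorted unique values; ic: group number of each element of X
[u, ~, ic] = unique(X);
n = numel(X);

% Order the positions 1..n by group, then by position within each group
[~, order] = sortrows([ic, (1:n).']);
sortedIc = ic(order);

% Column of each position inside its row: 1 for the first occurrence, 2 for the next, ...
counts = accumarray(ic, 1);
starts = cumsum([1; counts(1:end-1)]);
occ = (1:n).' - starts(sortedIc) + 1;

% Place the positions into a NaN-padded matrix, then prepend the unique values
Idx = nan(numel(u), max(counts));
Idx(sub2ind(size(Idx), sortedIc, occ)) = order;
M = [u, Idx]
```

For this X, each row of M is the unique value followed by its indices: [0 6 7 NaN; 1 3 NaN NaN; 2 1 4 5; 5 2 NaN NaN].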

Asked by Sim on 4 Oct 2012
