Unique function not deleting duplicate rows.

Question

0 votes

M.mat

I attached my matrix "M", and here is my code:

[trash,idx] = unique(M,'rows');
pleb=M(idx,:)
gg=sort(pleb)

When inspecting gg, we see that there are still duplicate rows.

I've also tried to do it in different ways, for example:

[~, III, ~] = unique(M,'first','rows'); %removing double points
III = sort(III);
pleb = M(III,:);
gg=sort(pleb);

But these either delete non-duplicate data or delete too little data.

What am I doing wrong?


Stephen23 on 4 May 2015

Edited: Stephen23 on 4 May 2015

"What am I doing wrong": not clicking on both buttons to attach the data: you need to first click Choose file and then Attach file. Please try attaching your data again.

luc on 4 May 2015

I attached it the right way now.


Answer 1

Stephen23 on 4 May 2015

Edited: Stephen23 on 4 May 2015

3 votes

It is likely that the data are floating point and that they are not actually equal, which confuses many beginners and people not used to working with numeric data. Although the values displayed in the command window might look the same, floating-point values can differ in the last bits of their significand, so an exact equality test (which is what unique performs) treats them as different values.
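A minimal sketch of this effect, using a small made-up matrix X rather than the attached M: the first two rows print identically, but the exact test in unique keeps both, while uniquetol (available since R2015a) compares within a tolerance. The tolerance 1e-12 is an illustrative assumption; uniquetol scales it by the magnitude of the data, so a suitable value depends on what the numbers represent.

```matlab
% Made-up example: rows 1 and 2 differ only by round-off in the first column
X = [0.3, 1; 0.1 + 0.2, 1; 2, 5];
disp(X)                                            % rows 1 and 2 look the same

% Exact comparison: all 3 rows are treated as distinct
exactRows = unique(X, 'rows');

% Tolerance-based comparison: rows 1 and 2 are merged, leaving 2 rows
approxRows = uniquetol(X, 1e-12, 'ByRows', true);
```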

To understand more about this topic, read these:

http://matlab.wikia.com/wiki/FAQ#Why_is_0.3_-_0.2_-_0.1_.28or_similar.29_not_equal_to_zero.3F

http://www.mathworks.com/matlabcentral/answers/41536-why-is-2-24-10-22-4-not-equal-to-0

http://www.mathworks.com/matlabcentral/answers/69-why-does-1-2-3-1-3-not-equal-zero

http://www.mathworks.com/matlabcentral/answers/57444-faq-why-is-0-3-0-2-0-1-not-equal-to-zero

Alternatively, if the data are strings, trailing spaces are often overlooked by users...
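For text data, a hypothetical cell array of labels shows the trailing-space case: strtrim removes leading and trailing whitespace, so applying it before unique makes the two spellings of the same label compare equal.

```matlab
% Hypothetical labels: the same word stored with and without a trailing space
labels = {'alpha'; 'alpha '; 'beta'};

n1 = numel(unique(labels));            % 'alpha' and 'alpha ' are different strings
n2 = numel(unique(strtrim(labels)));   % whitespace removed before comparing
```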


Stephen23 on 4 May 2015

Edited: Stephen23 on 5 May 2015


@luc: Your comment makes it clear that there is significant confusion about what unique and sort actually do. Reading the documentation would help you.

The function unique (when used with the 'rows' option) keeps one copy of each distinct row and removes any repeated copies. Have a look at this simple example:

>> A = [0,1,2; 1,1,1; 0,1,2; 3,3,3]
A =
     0     1     2
     1     1     1
     0     1     2
     3     3     3
>> unique(A,'rows')
ans =
     0     1     2
     1     1     1
     3     3     3

This is the correct and documented behavior of unique: it has removed the repeated instance of the row 0,1,2 because this row occurs twice in the matrix A, but the two rows that consist of repeated values are still in the output. When we run unique(M,'rows') using your matrix M, we get an output matrix that is the same size as M, because none of the rows are duplicated.
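The index outputs of unique make the same point more directly. In this sketch on the small matrix A above, ic maps every row of A to its row in the result, so counting how often each index occurs (here with accumarray) shows which rows are repeated. If, as stated, all rows of M are unique, every count for M would be 1.

```matlab
A = [0,1,2; 1,1,1; 0,1,2; 3,3,3];
[C, ia, ic] = unique(A, 'rows');
% C lists each distinct row once, sorted; C is equal to A(ia,:)
% ic maps every row of A to a row of C, so A is equal to C(ic,:)

counts = accumarray(ic, 1);                 % occurrences of each row of C in A
repeatedRows = C(counts > 1, :);            % rows that occur more than once
allRowsUnique = numel(ia) == size(A, 1);    % false here, since 0,1,2 occurs twice
```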

The function sort, when given a matrix M, sorts each column of M independently, as this simple example using the matrix A from above shows:

>> sort(A)
ans =
     0     1     1
     0     1     2
     1     1     2
     3     3     3

Note that the rows are not the same after being sorted: each column has been sorted independently, so there is no guarantee that the output rows are the same as the input rows (and in your case they are not).

Here are a few things we have learned:

All of the rows of the matrix M are already unique, so unique(M,'rows') removes nothing (it only sorts the rows).
sort(M) produces rows that are not the same as the rows beforehand, so there is no reason why they should be unique either.

You might like to try using sortrows instead of sort...

>> sortrows(A)
ans =
   0     1     2
   0     1     2
   1     1     1
   3     3     3

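gives quite a different output from sort: each row is kept intact, so identical rows end up next to each other (like the two 0,1,2 rows here). Applying sortrows to your M makes it clear that none of its rows are duplicates.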
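To check which rows survive each kind of sorting, ismember with the 'rows' option tests whether every output row also occurs in the input. A minimal sketch using the matrix A from above:

```matlab
A = [0,1,2; 1,1,1; 0,1,2; 3,3,3];
S = sort(A);        % every column sorted on its own
R = sortrows(A);    % whole rows reordered, each row kept intact

% Which output rows are also rows of A?
inA_sort     = ismember(S, A, 'rows');   % rows 0,1,1 and 1,1,2 were never in A
inA_sortrows = ismember(R, A, 'rows');   % every row of R comes from A
```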

And it is also easy to show that none of the rows is made up of repeated values, just in case this was what you were hoping to locate:

>> min(max(abs(diff(M,2)),[],2))
ans =
  0.5343

If any single row had only the same value repeated then this result would be zero.
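One caveat about the check above: diff(X,2) without a dimension argument computes the second-order difference along the first dimension, i.e., down the columns. To compare the values within each row, the dimension can be given explicitly, as in diff(X,1,2). A sketch on the small matrix A from above, where rows 2 and 4 consist of a single repeated value:

```matlab
A = [0,1,2; 1,1,1; 0,1,2; 3,3,3];

% First-order differences between neighbouring elements within each row
D = diff(A, 1, 2);

% A row made of one repeated value has only zero differences
isConstantRow = all(D == 0, 2);

% Largest within-row step; the minimum over rows is zero if any row is constant
rowSpread = max(abs(D), [], 2);
smallestSpread = min(rowSpread);
```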

If you are trying to achieve something else, you need to tell us exactly what that operation is, describe the output that you need, and even give some examples...

luc on 4 May 2015

Thanks Stephen,

I think the sorting part had me confused.

Nice explanation, I learned something.

:)

Stephen23 on 5 May 2015

@luc: I'm glad to be able to help!


Answer 2

Titus Edelhofer on 4 May 2015


1 vote

Hi Luc,

I don't see duplicate data, but the data change sign. Take the last 4 rows of pleb:

4558   -4.1355   -2.0906
4558   -4.1355    2.0906
4558    4.1355   -2.0906
4558    4.1355    2.0906

They look similar, but all 4 are different, as long as -2.0906 is different from 2.0906 ;-).

The same holds for the other 4-row blocks.

If you take the abs of the data, then the story is different.
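A short sketch of the abs point, using the four rows of pleb quoted above. Whether sign-mirrored rows should count as duplicates is an assumption that only the asker can confirm.

```matlab
% The last four rows of pleb quoted above: identical up to sign
P = [4558, -4.1355, -2.0906;
     4558, -4.1355,  2.0906;
     4558,  4.1355, -2.0906;
     4558,  4.1355,  2.0906];

nSigned = size(unique(P, 'rows'), 1);        % the signs make every row distinct
nAbs    = size(unique(abs(P), 'rows'), 1);   % all four rows coincide after abs
```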

Titus


Stephen23 on 4 May 2015

Edited: Stephen23 on 4 May 2015

@luc: There is no reason why those rows would be removed, as

all rows of M are already unique
sort(M) sorts each column independently, so there is no reason why these rows should be unique (or removed) either.

You need to actually describe what you are trying to achieve.

Titus Edelhofer on 4 May 2015

Edited: Titus Edelhofer on 4 May 2015


Indeed. As I wrote in my comment, if you sort while keeping rows as rows, i.e., using

sortrows(M)

then you will see that there are no duplicate rows.
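Because sortrows keeps rows intact, identical rows end up adjacent, so comparing each row with the next one is enough to detect duplicates. A sketch using the small matrix A from Stephen's comment (M would be substituted for the real data):

```matlab
A = [0,1,2; 1,1,1; 0,1,2; 3,3,3];
R = sortrows(A);

% Element k of sameAsNext is true when row k of R equals row k+1
sameAsNext = all(diff(R, 1, 1) == 0, 2);
hasDuplicateRows = any(sameAsNext);        % true: 0,1,2 appears twice in A
```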


Answer 3

John D'Errico on 4 May 2015

Edited: John D'Errico on 4 May 2015


1 vote

There are NO equal rows. I checked. They are different in sign. There are no rows that are even that close to each other, although the nearest neighbor is not uniformly close.

The check that I made was to find, for each row, the row closest to it in distance, i.e., its nearest neighbor. There ARE no essentially zero distances.

The overall closest pair of points are 1.7291 units apart.

Mu = unique(M,'rows');
D = ipdm(Mu,'subset','smallestfew','limit',1)
D =
  (87,95)         1.7291
D = ipdm(Mu,'subset','nearest')
D =
   (2,1)          4.1811
   (1,2)          4.1811
  (13,3)          4.1811
  (14,4)          4.1811
   (6,5)          4.1811
   (5,6)          4.1811
   (8,7)          4.1811
   (7,8)          4.1811
  (15,9)          4.1811
  (16,10)         4.1811
  (17,11)         4.1811
  (18,12)         4.1811
   (3,13)         4.1811
   (4,14)         4.1811
   (9,15)         4.1811
  (10,16)         4.1811
  (11,17)         4.1811
  (12,18)         4.1811
  (26,25)         4.1811
  (25,26)         4.1811
  (28,27)         4.1811
  (27,28)         4.1811
  (35,29)         4.1811
  (36,30)         4.1811
  (37,31)         4.1811
  (38,32)         4.1811
  (19,33)         4.1811
  (20,34)         4.1811
  (29,35)         4.1811
  (30,36)         4.1811
  (31,37)         4.1811
  (32,38)         4.1811
  (21,39)         4.1811
  (22,40)         4.1811
  (23,41)         4.1811
  (24,42)         4.1811
  (33,47)         4.1811
  (53,47)         3.3826
  (55,47)         3.3826
  (34,48)         4.1811
  (54,48)         3.3826
  (56,48)         3.3826
  (43,49)         4.1811
  (44,50)         4.1811
  (45,51)         4.1811
  (46,52)         4.1811
  (39,53)         4.1811
  (47,53)         3.3826
  (40,54)         4.1811
  (48,54)         3.3826
  (41,55)         4.1811
  (42,56)         4.1811
  (58,57)         4.1811
  (57,58)         4.1811
  (60,59)         4.1811
  (59,60)         4.1811
  (69,61)         4.1811
  (70,62)         4.1811
  (71,63)         4.1811
  (72,64)         4.1811
  (49,65)         4.1811
  (91,65)         3.3826
  (50,66)         4.1811
  (92,66)         3.3826
  (51,67)         4.1811
  (93,67)         3.3826
  (52,68)         4.1811
  (94,68)         3.3826
  (61,69)         4.1811
  (62,70)         4.1811
  (63,71)         4.1811
  (64,72)         4.1811
  (79,77)         1.7291
  (81,77)         1.7291
  (80,78)         1.7291
  (82,78)         1.7291
  (77,79)         1.7291
  (78,80)         1.7291
  (73,83)         4.1811
  (74,84)         4.1811
  (75,85)         4.1811
  (76,86)         4.1811
  (95,87)         1.7291
  (96,88)         1.7291
  (97,89)         1.7291
  (98,90)         1.7291
  (65,91)         3.3826
  (83,91)         4.1811
 (105,91)         3.3826
  (66,92)         3.3826
  (84,92)         4.1811
 (106,92)         3.3826
  (67,93)         3.3826
  (85,93)         4.1811
 (107,93)         3.3826
  (68,94)         3.3826
  (86,94)         4.1811
 (108,94)         3.3826
  (87,95)         1.7291
 (101,95)         1.7291
  (88,96)         1.7291
 (102,96)         1.7291
  (89,97)         1.7291
 (103,97)         1.7291
  (90,98)         1.7291
 (104,98)         1.7291
  (99,101)        4.1811
 (100,102)        4.1811
 (109,105)        4.1811
 (110,106)        4.1811
 (111,107)        4.1811
 (112,108)        4.1811
 (113,109)        4.1811
 (114,110)        4.1811
 (115,111)        4.1811
 (116,112)        4.1811
 (121,117)        4.1811
 (122,118)        4.1811
 (123,119)        4.1811
 (124,120)        4.1811
 (117,121)        4.1811
 (118,122)        4.1811
 (119,123)        4.1811
 (120,124)        4.1811
 (126,125)        4.1811
 (125,126)        4.1811
 (133,127)        1.7291
 (134,128)        1.7291
 (135,129)        1.7291
 (136,130)        1.7291
 (132,131)        4.1811
 (131,132)        4.1811
 (127,133)        1.7291
 (128,134)        1.7291
 (129,135)        1.7291
 (130,136)        1.7291
 (139,137)        1.7291
 (140,138)        1.7291
 (137,139)        1.7291
 (138,140)        1.7291
 (145,141)        4.1811
 (149,141)        3.3826
 (146,142)        4.1811
 (150,142)        3.3826
 (147,143)        4.1811
 (151,143)        3.3826
 (148,144)        4.1811
 (152,144)        3.3826
 (153,145)        4.1811
 (154,146)        4.1811
 (155,147)        4.1811
 (156,148)        4.1811
 (141,149)        3.3826
 (165,149)        4.1811
 (142,150)        3.3826
 (166,150)        4.1811
 (143,151)        3.3826
 (167,151)        4.1811
 (144,152)        3.3826
 (168,152)        4.1811
 (159,157)        3.3826
 (177,157)        4.1811
 (160,158)        3.3826
 (178,158)        4.1811
 (157,159)        3.3826
 (179,159)        4.1811
 (158,160)        3.3826
 (180,160)        4.1811
 (169,161)        4.1811
 (170,162)        4.1811
 (171,163)        4.1811
 (172,164)        4.1811
 (181,165)        4.1811
 (182,166)        4.1811
 (183,167)        4.1811
 (184,168)        4.1811
 (161,169)        4.1811
 (162,170)        4.1811
 (163,171)        4.1811
 (164,172)        4.1811
 (174,173)        4.1811
 (173,174)        4.1811
 (176,175)        4.1811
 (175,176)        4.1811
 (189,177)        4.1811
 (190,178)        4.1811
 (191,179)        4.1811
 (192,180)        4.1811
 (193,185)        4.1811
 (194,186)        4.1811
 (195,187)        4.1811
 (196,188)        4.1811
 (185,193)        4.1811
 (186,194)        4.1811
 (187,195)        4.1811
 (188,196)        4.1811
 (198,197)        4.1811
 (197,198)        4.1811
 (200,199)        4.1811
 (199,200)        4.1811
 (205,201)        4.1811
 (206,202)        4.1811
 (207,203)        4.1811
 (208,204)        4.1811
 (201,205)        4.1811
 (202,206)        4.1811
 (203,207)        4.1811
 (204,208)        4.1811
 (210,209)        4.1811
 (209,210)        4.1811
 (212,211)        4.1811
 (211,212)        4.1811
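ipdm, used above, is not part of base MATLAB (it is available from the File Exchange). As a sketch of the same nearest-neighbor check with base MATLAB only, the pairwise Euclidean distances between rows can be built from the Gram matrix. The point set X is made up for illustration; for the real data it would be replaced by unique(M,'rows'). This builds an n-by-n matrix, so it is only practical for moderate numbers of rows.

```matlab
% Made-up point set; each row is one point
X = [0 0; 3 4; 0 1; 10 10];
n = size(X, 1);

% Squared Euclidean distances between all pairs of rows
G  = X * X';
sq = diag(G);
D2 = bsxfun(@plus, sq, sq') - 2 * G;

D2(1:n+1:end) = Inf;            % ignore each row's distance to itself
D = sqrt(max(D2, 0));           % clip tiny negative round-off before sqrt

[nnDist, nnIdx] = min(D, [], 2);  % nearest neighbor of every row
closest = min(nnDist);            % distance of the overall closest pair
```

A closest distance that is essentially zero would indicate rows that are equal up to round-off; a clearly positive value, like the 1.7291 reported above, means no such rows exist.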


Sean de Wolski on 4 May 2015


First, your screenshot is too small to see.

Second, here is a good exercise to reveal the small differences in floating point. Run this:

>> format hex

Then rerun the command. See! They're different, even if just by a little.
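The same comparison can be made without changing the display format: num2hex returns the hexadecimal bit pattern of a double, so two values that print identically can be seen to differ. The values 0.3 and 0.1 + 0.2 are an illustrative pair, not taken from M.

```matlab
a = 0.3;
b = 0.1 + 0.2;
disp([a, b])               % both display as 0.3000 in the default format

ha = num2hex(a);           % '3fd3333333333333'
hb = num2hex(b);           % '3fd3333333333334'
isSame = strcmp(ha, hb);   % false: the last hex digit differs
```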

luc on 4 May 2015

Hey Sean,

You can click on the screenshot to enlarge it.

But I think Stephen solved my problem: the sort function sorts each column independently, not the rows as a whole.

Thanks guys!


Answer 4

Robert on 17 Oct 2018

0 votes

If anyone encounters truly duplicate rows in the output of unique, as I did, this may be caused by NaN values in your data, which unique treats as distinct. See this question for more info.
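A sketch of the NaN behavior with a made-up matrix, followed by one possible workaround. The workaround assumes Inf never occurs in the real data, since it uses Inf as a stand-in for NaN while comparing rows.

```matlab
A = [1, NaN; 1, NaN; 2, 3];
nRows = size(unique(A, 'rows'), 1);   % NaN ~= NaN, so both [1 NaN] rows are kept

% Workaround sketch (assumes Inf never occurs in the data):
% substitute Inf for NaN, find unique rows, then index back into A
B = A;
B(isnan(B)) = Inf;
[~, ia] = unique(B, 'rows', 'stable');  % first occurrences, original order
C = A(ia, :);                           % keeps the original NaN values
```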
