I am currently working with large data sets, in the range of 500k-1M rows in any given n-by-3 matrix.
I want to know how to sift through the rows of the matrix to see whether any rows contain the same values.
For example: [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7]
I want to remove the second [1 2 3] row, so that the result is [1 2 3; 2 3 4; 3 4 5; 4 5 6; 5 6 7]
Can anyone help me with this?

Accepted Answer

Star Strider on 2 Feb 2016


The unique function can help here:
A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7];
[Au,ia,ic] = unique(A, 'rows', 'stable');
RowIdxFreq = accumarray(ic, 1)
RowIdxFreq =
2
1
1
1
1
The ‘RowIdxFreq’ variable holds the number of occurrences of each unique row. Here, row #1 occurs twice.
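As a minimal sketch of how the three outputs of unique relate to each other, using the same example matrix (the name ‘dupRows’ is introduced here only for illustration):

```matlab
A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7];
[Au, ia, ic] = unique(A, 'rows', 'stable');

% Au is A with duplicate rows removed, in order of first appearance
disp(Au)

% ia indexes the first occurrence of each unique row, so Au equals A(ia,:)
sameAsFirstOccurrences = isequal(Au, A(ia,:));

% ic maps every row of A to its row in Au, so A equals Au(ic,:)
canRebuildA = isequal(A, Au(ic,:));

% Rows of A that are not first occurrences are the duplicates
% (dupRows is a hypothetical name, not part of the original answer)
dupRows = setdiff((1:size(A,1))', ia);
```

With this input, ‘dupRows’ contains 5, the position of the second [1 2 3] row, and ‘Au’ is already the matrix with that row removed.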

7 Comments

lsutiger1 on 2 Feb 2016
Star Strider,
I did see that command in the documentation, and it is helpful. What I don't get from it is 1) where the second instance occurs and 2) a way to delete that row from the matrix without leaving zeros in its place.
lsutiger1 on 2 Feb 2016
Would just assigning the result,
X = unique(A, 'rows', 'stable')
return the matrix without those rows?
Stephen23 on 2 Feb 2016
Yes. Try it and see.
To find and delete the rows that are repeated, this works:
A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7; 3 4 5];
[Au,ia,ic] = unique(A, 'rows', 'stable');
RowIdxFreq = accumarray(ic, 1)
Repeats = find(RowIdxFreq > 1);
RowsToDelete = [];
for k1 = 1:length(Repeats)
    RepeatedRows{k1} = find(ic == Repeats(k1));
    RowsToDelete = [RowsToDelete; RepeatedRows{k1}(2:end)];
end
A(RowsToDelete,:) = []; % ‘A’ With Repeated Rows Deleted
The ‘Repeats’ assignment finds the unique rows that occur more than once in the matrix, and ‘RepeatedRows’ is a cell array that contains, for each of them, the indices of the rows of ‘A’ where it occurs. ‘RowsToDelete’ accumulates every occurrence after the first, then the ‘A’ assignment after the loop uses it to delete all of them at once.
It is not necessary to keep the ‘RepeatedRows’ data in a cell array. I did so here because I wanted to be certain the code was doing what I wanted it to.
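For matrices in the 500k-1M row range, growing ‘RowsToDelete’ inside a loop may be slow. A possible loop-free sketch uses the second output of unique, which indexes the first occurrence of each row; every row not listed there is a later repeat. The mask name ‘keep’ is introduced here only for illustration:

```matlab
A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7; 3 4 5];
[~, ia] = unique(A, 'rows', 'stable');

keep = false(size(A,1), 1);   % logical mask, one entry per row of A
keep(ia) = true;              % mark the first occurrence of each row
RowsToDelete = find(~keep);   % indices of the later repeats
A(~keep, :) = [];             % delete all repeats at once
```

For this input ‘RowsToDelete’ is [5; 7], the same rows the loop selects, and the remaining ‘A’ matches what unique(A, 'rows', 'stable') returns directly.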
lsutiger1 on 2 Feb 2016
Edited: lsutiger1 on 2 Feb 2016
Using the unique function is now not working. I have a cell array in which each row holds a string of letters followed by coordinates, e.g. [C 1 1 1], which I created by
X = [atom_names num2cell(atomPosition_flat)];
This gives me a cell array (nx4).
I try to use unique to find where the repeated rows are,
atomPositions = unique(X,'rows','stable');
But get this error: Input A must be a cell array of strings.
Using num2str on the atomPosition_flat matrix (nx3) turns it into an nx33 char.
Without having your matrix to experiment with, I can only guess.
See if adding a cell reference (the ‘{}’ brackets) works:
atomPositions = unique(X{:},'rows','stable');
If you have a relatively ‘uncomplicated’ cell array, that should work. If unique still has problems, you might have to use sprintf to convert the numbers to strings before you do the operations in my code. (I assume ‘atom_names’ are already strings.)
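One caveat: X{:} expands into a comma-separated list, so every cell is passed to unique as a separate argument, and that call is unlikely to work as written. A possible workaround, sketched here with small made-up data standing in for the real ‘atom_names’ and ‘atomPosition_flat’, is to turn the names into integer codes, run unique on an all-numeric matrix, and use the returned row indices to pick rows of the cell array. The name ‘nameIdx’ is hypothetical:

```matlab
% Made-up sample data, assumed to have the same layout as in the thread
atom_names = {'C'; 'H'; 'C'; 'O'};
atomPosition_flat = [1 1 1; 2 2 2; 1 1 1; 1 1 1];
X = [atom_names num2cell(atomPosition_flat)];

% Third output of unique gives an integer code for each distinct name
[~, ~, nameIdx] = unique(atom_names);

% Find first occurrences of each (name code, x, y, z) combination
[~, ia] = unique([nameIdx atomPosition_flat], 'rows', 'stable');

% Keep only those rows of the mixed cell array, in original order
atomPositions = X(ia, :);
```

Here the third row repeats the first (same name and coordinates) and is dropped, while the ‘O’ atom at the same coordinates is kept because its name differs.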
lsutiger1 on 2 Feb 2016
Edited: lsutiger1 on 2 Feb 2016
The matrix is a 2520x3 matrix, and yes, atom_names is a 2520x1 vector of strings. I tried converting it to strings using num2str, which did not work, because then I got a "dimension mismatch" error when num2str converted my matrix into a 2520x33 char. Will try to use sprintf.
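The size problem arises because num2str applied to a whole matrix returns one character row per matrix row rather than one string per element. A rough sketch of the sprintf route, again with made-up data in place of the real arrays, builds one text key per row and runs unique on the resulting cell array of strings. The name ‘keys’ and the '%.15g' format are assumptions, and coordinates that differ only by floating-point noise would still count as distinct:

```matlab
% Made-up sample data, assumed to have the same layout as in the thread
atom_names = {'C'; 'H'; 'C'; 'O'};
atomPosition_flat = [1 1 1; 2 2 2; 1 1 1; 1 1 1];
X = [atom_names num2cell(atomPosition_flat)];

n = size(X, 1);
keys = cell(n, 1);            % one text key per row of X
for k = 1:n
    % Name followed by the three coordinates, e.g. 'C 1 1 1'
    keys{k} = sprintf('%s %.15g %.15g %.15g', X{k,1}, X{k,2}, X{k,3}, X{k,4});
end

% unique accepts a cell array of strings; ia indexes first occurrences
[~, ia] = unique(keys, 'stable');
atomPositions = X(ia, :);
```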


Asked: 2 Feb 2016

Edited: 2 Feb 2016
