I am currently working with large data sets, in the range of 500k to 1M rows in any given matrix (nx3). I want to know how to sift through the rows of the matrix to see whether any rows are identical. For example, given

[1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7]

I want to remove the second [1 2 3] row, so that the result is

[1 2 3; 2 3 4; 3 4 5; 4 5 6; 5 6 7]

Can anyone help me with this?

The unique function can help here:

A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7];
[Au,ia,ic] = unique(A, 'rows', 'stable');
RowIdxFreq = accumarray(ic, 1);                                  % RowIdxFreq = [2; 1; 1; 1; 1]

The ‘RowIdxFreq’ variable holds the number of occurrences of each unique row. Here, row #1 ([1 2 3]) appears twice.
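A minimal sketch extending the code above, for anyone who also needs to know where the repeats are: pairing the counts with ‘Au’ lists which rows repeat, and any row index of ‘A’ that is not in ‘ia’ is a later copy. The names ‘DupRows’ and ‘LaterCopies’ are illustrative, not from the original answer.

```matlab
% Example matrix from the question
A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7];

% Au: unique rows, in order of first appearance
% ia: index in A of each unique row's first occurrence
% ic: for each row of A, the row of Au it matches
[Au, ia, ic] = unique(A, 'rows', 'stable');
RowIdxFreq = accumarray(ic, 1);

% Unique rows that occur more than once
DupRows = Au(RowIdxFreq > 1, :);

% Rows of A that are not first occurrences, i.e. the later copies
LaterCopies = setdiff((1:size(A,1))', ia);
```

Here ‘DupRows’ is [1 2 3] and ‘LaterCopies’ is 5, the position of the second [1 2 3].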

Find similar values in a matrix

lsutiger1, 2 Feb 2016

Star Strider,

I did see that function in the documentation, and it is helpful. What I don't get from it is 1) where the second instance occurs and 2) a way to delete that row from the matrix without leaving 0's in its place.

lsutiger1, 2 Feb 2016

Would simply assigning

X = unique(A, 'rows', 'stable')

return the matrix without the repeated rows?

Stephen23, 2 Feb 2016

Yes. Try it and see.
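A minimal sketch confirming this with the matrix from the question: with 'stable', the unique rows come back in the order they first appear, which is the same as indexing ‘A’ with the first-occurrence indices ‘ia’. The name ‘X2’ is only for the comparison.

```matlab
A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7];

% 'stable' keeps unique rows in order of first appearance, so X is A
% with every later copy of a repeated row removed
[X, ia] = unique(A, 'rows', 'stable');

% Equivalent: keep only the first occurrences, indexed via ia
X2 = A(ia, :);
```

Without 'stable', unique returns the rows sorted rather than in their original order.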

Star Strider, 2 Feb 2016


To find which rows are repeated and delete them, this works:

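A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7; 3 4 5];
[Au,ia,ic] = unique(A, 'rows', 'stable');
RowIdxFreq = accumarray(ic, 1)                                  % Occurrences Of Each Unique Row
Repeats = find(RowIdxFreq > 1);                                 % Unique Rows That Occur More Than Once
RowsToDelete = [];
RepeatedRows = cell(numel(Repeats), 1);                         % Preallocate
for k1 = 1:numel(Repeats)
    RepeatedRows{k1} = find(ic == Repeats(k1));                 % Indices In ‘A’ Of Every Occurrence
    RowsToDelete = [RowsToDelete; RepeatedRows{k1}(2:end)];     % Keep The First, Mark The Rest
end
A(RowsToDelete,:) = [];                                         % ‘A’ With Repeated Rows Deleted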

The ‘Repeats’ assignment finds which unique rows occur more than once (as indices into ‘Au’), and ‘RepeatedRows’ is a cell array that holds, for each of them, the indices in ‘A’ of all its occurrences. ‘RowsToDelete’ collects every occurrence after the first, and the ‘A’ assignment after the loop uses it to delete all of them at once.

It is not necessary to keep the ‘RepeatedRows’ data in a cell array. I did so here because I wanted to be certain the code was doing what I intended.
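A loop-free sketch of the same deletion, assuming only that the first occurrence of each row should be kept. This is an alternative using setdiff, not the method in the answer above: the ‘ia’ output of unique already lists the first occurrences, so every other row index is a duplicate. ‘RowsToDelete’ is reused as a name for readability.

```matlab
A = [1 2 3; 2 3 4; 3 4 5; 4 5 6; 1 2 3; 5 6 7; 3 4 5];

% ia holds the index of the first occurrence of each unique row
[~, ia] = unique(A, 'rows', 'stable');

% Every row index not in ia is a later copy of an earlier row
RowsToDelete = setdiff((1:size(A,1))', ia);

% Delete all later copies at once
A(RowsToDelete, :) = [];
```

For this matrix ‘RowsToDelete’ is [5; 7], matching the loop version.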

lsutiger1, 2 Feb 2016

Edited: lsutiger1, 2 Feb 2016


The unique function is now not working. I have a cell array whose rows consist of a string of letters followed by coordinates, e.g. [C 1 1 1], which I created with

X = [atom_names num2cell(atomPosition_flat)];

This gives me a cell array (nx4).

I tried to use unique to find the repeated rows:

atomPositions = unique(X,'rows','stable');

but I get this error: Input A must be a cell array of strings.

Using num2str on the nx3 atomPosition_flat matrix turns it into an nx33 char array.

Star Strider, 2 Feb 2016


Without having your matrix to experiment with, I can only guess.

See if adding a cell reference (the ‘{}’ brackets) works:

atomPositions = unique(X{:},'rows','stable');

If you have a relatively ‘uncomplicated’ cell array, that should work. If unique still has problems, you might have to use sprintf to convert the numbers to strings before you do the operations in my code. (I assume ‘atom_names’ are already strings.)
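A minimal sketch of the sprintf route, assuming ‘atom_names’ is an n-by-1 cell array of strings and ‘atomPosition_flat’ is an n-by-3 numeric matrix (the sample values below are made up). As far as I know, unique with 'rows' does not accept a mixed cell array, so each row is turned into a single text key and unique is called on those keys without 'rows'. The names ‘keys’ and the '%.17g' format are illustrative choices; 17 significant digits is enough to tell distinct double values apart.

```matlab
% Hypothetical sample data standing in for the real variables
atom_names = {'C'; 'H'; 'C'; 'O'};
atomPosition_flat = [1 1 1; 0 1 2; 1 1 1; 1 1 1];
X = [atom_names num2cell(atomPosition_flat)];

% Build one text key per row: the name followed by the three coordinates
n = size(atomPosition_flat, 1);
keys = cell(n, 1);
for k = 1:n
    keys{k} = sprintf('%s %.17g %.17g %.17g', atom_names{k}, atomPosition_flat(k, :));
end

% keys is a cell array of strings, which unique accepts directly
[~, ia] = unique(keys, 'stable');
atomPositions = X(ia, :);
```

Here rows 1 and 3 share the key 'C 1 1 1', so ‘atomPositions’ keeps rows 1, 2 and 4 of ‘X’.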

lsutiger1, 2 Feb 2016

Edited: lsutiger1, 2 Feb 2016

The matrix is 2520x3, and yes, atom_names is a 2520x1 vector of strings. I tried converting the coordinates to strings using num2str, but that did not work: num2str turned my matrix into a 2520x33 char array, which gave me a "dimension mismatch" error. I will try sprintf.
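An alternative sketch that avoids converting numbers to text at all, under the same assumptions (‘atom_names’ is an n-by-1 cell array of strings, ‘atomPosition_flat’ is n-by-3 numeric, sample values made up): replace each name with an integer code from the third output of unique, then run unique with 'rows' on a purely numeric matrix. The name ‘nameCode’ is illustrative.

```matlab
% Hypothetical sample data standing in for the real variables
atom_names = {'C'; 'H'; 'C'; 'O'};
atomPosition_flat = [1 1 1; 0 1 2; 1 1 1; 1 1 1];
X = [atom_names num2cell(atomPosition_flat)];

% Replace each name with an integer code (same name -> same code)
[~, ~, nameCode] = unique(atom_names);

% Every row is now numeric, so unique(..., 'rows') works directly
[~, ia] = unique([nameCode atomPosition_flat], 'rows', 'stable');
atomPositions = X(ia, :);
```

Because the comparison stays numeric, this also sidesteps the num2str padding problem described above.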
