Find the desired row in the matrix

Question

Chenglin Li on 24 Oct 2022

Direct link to this question:

https://kr.mathworks.com/matlabcentral/answers/1833643-find-the-desired-row-in-the-matrix

Commented: Jan on 25 Oct 2022

matrix.xlsx

Hello! I have a matrix in which the first three columns are the x, y, and z coordinates of the points, the fourth column is the sum of the first three columns, and the fifth column is the product of the first three columns. I want to extract the indices of the rows whose (sum, product) pair occurs only once in the matrix. For example, the sum of the first row is 90 and the product is 26040; this pair is unique, so I extract the row. For rows 10 and 22, only the sum is the same but the product is different, so they are extracted separately. For rows 55 and 56, both the sum and the product are the same, so only one row of data needs to be extracted.

Can anyone help me with this? I'm completely new to MATLAB. I would be grateful.

3 Comments

Chenglin Li on 24 Oct 2022

Well, thank you for your answer. Thank you very much!

Jan on 24 Oct 2022

@Rik: I thought of unique or histcounts also, but did not find a solution. Please check my answer. I'd be glad to see a less fiddly solution.


Answer 1

Jan on 24 Oct 2022

Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/1833643-find-the-desired-row-in-the-matrix#answer_1081953

Edited: Jan on 24 Oct 2022


While removing duplicate rows is easy using unique(x, 'rows'), I did not find a built-in function to identify the rows that occur only once.
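
For comparison, here is a minimal sketch of what the unique(x, 'rows') call does, using made-up data (the names idxFirst and OnePerPair are only illustrative). It keeps one representative of each repeated pair, which seems to be what the question describes for rows 55 and 56, whereas the code in this answer drops every row whose pair is repeated:

```matlab
% Hypothetical example data: columns 4:5 hold the (sum, product) pair.
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];

% 'rows' compares whole rows; 'stable' keeps the original order.
% idxFirst(k) is the first row of M in which the k-th distinct pair appears.
[~, idxFirst] = unique(M(:, 4:5), 'rows', 'stable');
OnePerPair = M(idxFirst, :);   % rows 1, 2, 4 and 5 of M
```

Which of the two behaviours is wanted depends on how the question is read, so treat this as an alternative rather than a correction.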

If the data set is small (some hundreds of rows), a nested loop is fine:

% Keep only the rows of M whose values in columns 4:5 occur exactly once.
% Assuming that M is your matrix:
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];
%    ^ just some test data
off = false;         % Slightly faster
A   = M(:, 4:5);     % Columns used for comparison
nA  = size(A, 1);    % Number of rows
T   = true(nA, 1);
for iA = 1:nA
  if T(iA)           % If not excluded already
     d = all(A(iA, :) == A, 2);
     if sum(d) > 1   % More than 1 occurrence found
        T(d) = off;  % Mark all occurrences
     end
  end
end
Result = M(T, :)     % Keep only the rows that occur exactly once
Result = 3×5
    0.8932    0.9660    0.0837    2.0000    3.0000
    0.0193    0.2833    0.5621    4.0000    2.0000
    0.6671    0.1563    0.3340    1.0000    5.0000

A small speed-up (most likely; test this using tic/toc) is to compare the two columns separately:

A4  = M(:, 4);       % Columns used for comparison
A5  = M(:, 5);       % Columns used for comparison
nA  = size(A4, 1);   % Number of rows
T   = true(nA, 1);
for iA = 1:nA
  if T(iA)           % If not excluded already
     d = (A4(iA) == A4 & A5(iA) == A5);
     if sum(d) > 1   % More than 1 occurrence found
        T(d) = off;  % Mark all occurrences
     end
  end
end

The cost of these loops grows as O(n^2), so doubling the size of the input makes the processing take about 4 times longer. This gets very slow for huge data sets, e.g. with millions of rows. In that case, sort the rows first and compare neighbours:

% Keep only the rows of M whose values in columns 4:5 occur exactly once:
[A, idx]    = sortrows(M(:, 4:5));
nextEq      = [true; any(diff(A, 1, 1), 2)];
ini         = strfind(nextEq.', [true, false]);
nextEq(ini) = false;                 % Also un-mark the 1st occurrence of each repeated row
T           = false(size(A, 1), 1);  % Pre-allocation, TRUE or FALSE doesn't matter
T(idx)      = nextEq;                % Original order
Result      = M(T, :)
Result = 3×5
    0.8932    0.9660    0.0837    2.0000    3.0000
    0.0193    0.2833    0.5621    4.0000    2.0000
    0.6671    0.1563    0.3340    1.0000    5.0000
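
To make the sorting approach easier to follow, here is a step-by-step sketch on the six example pairs. The names A0 and isStart are only illustrative, and the find(...) expression is swapped in for the strfind call as a documented alternative that I assume to be equivalent:

```matlab
% Example pairs taken from columns 4:5 of the test matrix.
A0 = [1,2; 2,3; 1,2; 4,2; 1,5; 1,2];

[A, idx] = sortrows(A0);                    % equal rows become neighbours
isStart  = [true; any(diff(A, 1, 1), 2)];   % true where a new group of equal rows starts

% A group start that is followed by a non-start has at least one duplicate,
% so it must be un-marked as well.
ini = find(isStart(1:end-1) & ~isStart(2:end));
isStart(ini) = false;

T = false(size(A, 1), 1);
T(idx) = isStart;                           % back to the original row order
disp(find(T).')                             % rows 2, 4 and 5 occur exactly once
```

After sorting, only the first row of each group is marked, and groups with more than one member lose that mark too, so the remaining marks belong to pairs that occur once.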

1 Comment

Chenglin Li on 25 Oct 2022

Thank you very much, this program has helped me a lot and gave me an idea for the next step!


Answer 2

Rik on 25 Oct 2022

Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/1833643-find-the-desired-row-in-the-matrix#answer_1082688


Inspired by the answer and comment by Jan, I gave it a try as well. However, at least for this size, the answers from Jan are faster. Perhaps the functions I use would scale better, but I did not test that.

Perhaps accumarray would have a better performance than histcounts. If this is really a bottleneck in your code, you could try that.
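
A minimal sketch of that accumarray variant, untested for speed and using the same made-up matrix (the variable names are only illustrative):

```matlab
% Hypothetical example data: columns 4:5 hold the values to compare.
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];
A = M(:, 4:5);

% indA: first row of A for each distinct pair; indB: group number of every row of A.
[~, indA, indB] = unique(A, 'rows', 'stable');

% accumarray adds 1 for every row that belongs to a group,
% so counts(k) is the number of rows in group k.
counts = accumarray(indB, 1);

Result = M(indA(counts == 1), :);   % rows whose pair occurs exactly once
```

Whether this is actually faster than histcounts would need to be measured on the real data.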

% Assuming that M is your matrix:
M = [rand(6, 3), [1,2; 2,3; 1,2; 4,2; 1,5; 1,2]];
%    ^ just some stuff
% Confirm the results match:
Jan_v1(M) , Rik(M)
ans = 3×5
    0.7516    0.4447    0.5350    2.0000    3.0000
    0.0193    0.3578    0.2807    4.0000    2.0000
    0.1917    0.4433    0.3679    1.0000    5.0000
ans = 3×5
    0.7516    0.4447    0.5350    2.0000    3.0000
    0.0193    0.3578    0.2807    4.0000    2.0000
    0.1917    0.4433    0.3679    1.0000    5.0000
% do warmup rounds first (only needed online), then test the timing for
% each implementation
for n=1:3,timeit(@()Jan_v1(M));timeit(@()Jan_v2(M));timeit(@()Jan_v3(M));timeit(@()Rik(M));end
timeit(@()Jan_v1(M)),timeit(@()Jan_v2(M)),timeit(@()Jan_v3(M)),timeit(@()Rik(M))
ans = 1.5764e-05
ans = 1.1022e-05
ans = 1.9246e-05
ans = 6.7721e-05
function output=Rik(M)
% Return the rows of the matrix for which the entries in the 4th and 5th column are unique.
% First create a temporary matrix that only contains the relevant columns.
A = M(:, 4:5);
% indA contains indices to A to create the unique list
% indB contains indices to the unique list to get back to A
% We need to use 'stable' to avoid sorting.
[~,indA,indB] = unique(A,'rows','stable');
% Count how often every index occurs
counts = histcounts(indB,0.5:(0.5+max(indB))); % bin edges 0.5:1:max(indB)+0.5 (0.5 to 4.5 for this example)
RowsWithOneOccurrence = indA(counts==1);
output = M(RowsWithOneOccurrence,:);
end
function Result=Jan_v1(M)
off = false;         % Slightly faster
A   = M(:, 4:5);     % Columns used for comparison
nA  = size(A, 1);    % Number of rows
T   = true(nA, 1);
for iA = 1:nA
  if T(iA)           % If not excluded already
     d = all(A(iA, :) == A, 2);
     if sum(d) > 1   % More than 1 occurrence found
        T(d) = off;  % Mark all occurrences
     end
  end
end
Result = M(T, :);    % Keep only the rows that occur exactly once
end
function Result=Jan_v2(M)
off = false;         % Slightly faster
A4  = M(:, 4);       % Columns used for comparison
A5  = M(:, 5);       % Columns used for comparison
nA  = size(A4, 1);   % Number of rows
T   = true(nA, 1);
for iA = 1:nA
  if T(iA)           % If not excluded already
     d = (A4(iA) == A4 & A5(iA) == A5);
     if sum(d) > 1   % More than 1 occurrence found
        T(d) = off;  % Mark all occurrences
     end
  end
end
Result = M(T, :);     % Keep only the rows that occur exactly once
end
function Result=Jan_v3(M)
% Keep only the rows of M whose values in columns 4:5 occur exactly once:
[A, idx]    = sortrows(M(:, 4:5));
nextEq      = [true; any(diff(A, 1, 1), 2)];
ini         = strfind(nextEq.', [true, false]);
nextEq(ini) = false;                 % Also un-mark the 1st occurrence of each repeated row
T           = false(size(A, 1), 1);  % Pre-allocation, TRUE or FALSE doesn't matter
T(idx)      = nextEq;                % Original order
Result      = M(T, :);
end

2 Comments

Chenglin Li on 25 Oct 2022

Thank you. I'll try again. Thank you very much indeed

Jan on 25 Oct 2022


Thanks, @Rik, for this comparison. While my loop versions have some speed advantages for tiny inputs, they are far too slow for large data. With

n  = 1e6;
M1 = [rand(n, 3), randi([0, 1000], n, 2)];  % Few repeated values
M2 = [rand(n, 3), randi([0, 10], n, 2)];    % Many repeated values
for n=1:1, timeit(@()Jan_v3(M1));timeit(@()Rik(M1));end
timeit(@()Jan_v3(M1))
timeit(@()Rik(M1))
timeit(@()Jan_v3(M2))
timeit(@()Rik(M2))

Sorry, I hesitate to post the timings online, because they vary from run to run by 25%! The difference between the 2 functions is smaller than this deviation between runs. My conclusion: both are almost equally fast.
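
One way to put a number on that run-to-run variation is to repeat the timeit measurement a few times and look at the spread. This is only a sketch with a stand-in workload (sortrows on random data); the names f, nRep and t are hypothetical, and the workload would be replaced by the functions being compared:

```matlab
% Stand-in workload (hypothetical): sorting the rows of a random integer matrix.
A = randi([0, 10], 1e5, 2);
f = @() sortrows(A);

nRep = 5;                     % number of independent timeit measurements
t    = zeros(nRep, 1);
for k = 1:nRep
    t(k) = timeit(f);         % timeit already returns the median of several calls
end

% Report the median time and the relative spread between the measurements.
fprintf('median: %.3g s, spread: %.0f %%\n', ...
    median(t), 100 * (max(t) - min(t)) / median(t));
```

If the spread is larger than the difference between two candidates, the measurement cannot tell them apart.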

function output=Rik(M)
A = M(:, 4:5);
[~,indA,indB] = unique(A,'rows','stable');
counts = histcounts(indB,0.5:(0.5+max(indB))); % bin edges 0.5:1:max(indB)+0.5
RowsWithOneOccurrence = indA(counts==1);
output = M(RowsWithOneOccurrence,:);
end
function Result=Jan_v3(M)
[A, idx]    = sortrows(M(:, 4:5));
nextEq      = [true; diff(A(:, 1)) | diff(A(:, 2))];
% nextEq      = [true; any(diff(A, 1, 1), 2)];
ini         = strfind(nextEq.', [true, false]);
nextEq(ini) = false;                 % Also un-mark the 1st occurrence of each repeated row
T           = false(size(A, 1), 1);  % Pre-allocation, TRUE or FALSE doesn't matter
T(idx)      = nextEq;                % Original order
Result      = M(T, :);
end
