
How can I merge similar rows in a matrix based on the values in the first three columns?

I have a very big matrix with 4 columns. The first three columns are coordinates of a point in a discrete 3D space, and the last column is the weight of that point. For example:
A = [1,1,1,0.2; 1,1,2,0.9; 1,2,1,1.2; ...]
Some of the coordinates, however, are duplicates with different weights. For example I might have:
A = [1,1,1,0.2; 1,1,1,2.3; 1,1,2,-0.3; ...]
What I want to achieve is to remove the duplicate coordinates, and use the mean of their weights as the weight for that coordinate. For example, after this operation, the last example will become:
A_new = [1,1,1,1.25; 1,1,2,-0.3; ...]
I have already written code that works:
A_new = unique(A(:,1:3),"rows");
A_new = [A_new zeros(length(A_new),1)];
for i = 1:length(A_new)
    coord = A_new(i,1:3);
    dups = A(all(A(:,1:3)==coord,2), 4);
    A_new(i,4) = mean(dups);
end
But it is very slow for a large matrix (e.g., 1000000 rows). Can I optimize this code in any way?
Thank you in advance.
Shayan

Accepted Answer

Cris LaPierre on 2 Jan 2022
Use groupsummary. Group by the first 3 columns, and use 'mean' to determine the value of the fourth. I find it easier to use on tables, so I convert A to a table first.
A = [1,1,1,0.2; 1,1,1,2.3; 1,1,2,-0.3];
A = array2table(A);
B = groupsummary(A,1:3,'mean',4)
B = 2×5 table
    A1    A2    A3    GroupCount    mean_A4
    __    __    __    __________    _______
    1     1     1         2          1.25
    1     1     2         1          -0.3
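If a numeric matrix in the original four-column layout is needed rather than a table, the result can be converted back. This is a minimal sketch, assuming the default variable names A1 to A4 that array2table derives from the variable name A (as in the output above); T and A_new are illustrative names only:

```matlab
A = [1,1,1,0.2; 1,1,1,2.3; 1,1,2,-0.3];
T = array2table(A);                   % variables are named A1..A4
B = groupsummary(T,1:3,'mean',4);     % adds GroupCount and mean_A4
% Keep the three coordinate columns and the averaged weight
A_new = B{:,{'A1','A2','A3','mean_A4'}};
disp(A_new)
```

Brace indexing with a list of variable names returns the selected columns as an ordinary numeric array, so GroupCount is simply left out.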

More Answers (1)

Voss on 2 Jan 2022
Generate some random data mimicking your situation:
[X,Y,Z] = ndgrid(1:2,1:3,1:2);
A = [X(:) Y(:) Z(:) rand(numel(X),1)];
A(:,3) = 1;
disp(A);
    1.0000    1.0000    1.0000    0.5270
    2.0000    1.0000    1.0000    0.5825
    1.0000    2.0000    1.0000    0.3314
    2.0000    2.0000    1.0000    0.3005
    1.0000    3.0000    1.0000    0.4200
    2.0000    3.0000    1.0000    0.1138
    1.0000    1.0000    1.0000    0.1304
    2.0000    1.0000    1.0000    0.1349
    1.0000    2.0000    1.0000    0.4907
    2.0000    2.0000    1.0000    0.8794
    1.0000    3.0000    1.0000    0.3126
    2.0000    3.0000    1.0000    0.2244
Use a loop like yours but comparing indices:
[A_new,~,ii] = unique(A(:,1:3),'rows');
A_new = [A_new zeros(size(A_new,1),1)];
for i = 1:size(A_new,1)
    A_new(i,4) = mean(A(ii == i,4));
end
disp(A_new);
    1.0000    1.0000    1.0000    0.3287
    1.0000    2.0000    1.0000    0.4111
    1.0000    3.0000    1.0000    0.3663
    2.0000    1.0000    1.0000    0.3587
    2.0000    2.0000    1.0000    0.5899
    2.0000    3.0000    1.0000    0.1691
Or do the same thing with arrayfun():
[A_new,~,ii] = unique(A(:,1:3),'rows');
A_new(:,end+1) = arrayfun(@(i)mean(A(ii == i,4)),1:size(A_new,1));
disp(A_new);
    1.0000    1.0000    1.0000    0.3287
    1.0000    2.0000    1.0000    0.4111
    1.0000    3.0000    1.0000    0.3663
    2.0000    1.0000    1.0000    0.3587
    2.0000    2.0000    1.0000    0.5899
    2.0000    3.0000    1.0000    0.1691
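Both variants above still evaluate ii == i once per unique row, so the work grows with the number of rows times the number of groups. One possible way to avoid that scan, offered here only as a sketch and not something proposed or timed in this thread, is to let accumarray do the grouping from the same index vector ii. The names U and w are illustrative:

```matlab
% Small example with duplicate coordinates
A = [1,1,1,0.2; 1,1,1,2.3; 1,1,2,-0.3];
% ii(k) is the row of U that A(k,1:3) maps to
[U,~,ii] = unique(A(:,1:3),'rows');
% Sum the weights per group, then divide by the group sizes
w = accumarray(ii,A(:,4)) ./ accumarray(ii,1);
A_new = [U w];
disp(A_new)
```

Dividing the per-group sum by the per-group count gives the mean without passing a function handle to accumarray.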
2 Comments
Shayan Taheri on 2 Jan 2022
Thank you very much for your suggestion. This method was definitely cleaner than my code, though it wasn't much different in terms of speed. I ran it on an array of 454000 rows and the processing time was 409 seconds. The other solution, based on groupsummary, took 14 seconds.
Voss on 3 Jan 2022
Good to know. I wasn't sure either of these ways would be much different than what you had in terms of speed.
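For anyone who wants to repeat this kind of comparison on their own machine, a rough timing sketch with tic and toc might look like the following. The data are synthetic and the size n and the grid range are arbitrary assumptions, so the numbers will not match the timings reported above:

```matlab
% Synthetic data: n rows on a small grid so that coordinates repeat
n = 5e4;                              % hypothetical size, adjust as needed
A = [randi(20,n,3) rand(n,1)];
T = array2table(A);

% Table-based approach
tic
B = groupsummary(T,1:3,'mean',4);
tTable = toc;

% Loop over the group indices returned by unique
tic
[U,~,ii] = unique(A(:,1:3),'rows');
w = zeros(size(U,1),1);
for i = 1:size(U,1)
    w(i) = mean(A(ii == i,4));
end
tLoop = toc;

fprintf('groupsummary: %.3f s, loop: %.3f s\n',tTable,tLoop);
```

A single tic/toc pair is noisy; repeating each measurement a few times gives a more reliable picture.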

Release: R2021a