I have an integer matrix A (nA x c) with an even number of columns (i.e. mod(c,2) == 0) and unique rows.
How can I efficiently (with a speed- and memory-optimized function "symmetricRows") find the "symmetric" row pairs iA1 and iA2 of matrix A, where rows iA1 and iA2 are "symmetric" if the first half of each row equals the second half of the other:
all(A(iA1,1:end/2) == A(iA2,end/2+1:end) & A(iA1,end/2+1:end) == A(iA2,1:end/2),2) == true
Example:
A = [1 1 1 1;
2 2 2 2;
1 2 3 4;
4 3 2 1;
2 2 3 3;
3 4 1 2;
3 3 2 2]
[iA1, iA2] = symmetricRows(A)
iA1 =
1
2
3
5
iA2 =
1
2
6
7
Typical size of matrices A: nA ~ 1e4-1e6, c ~ 60 - 120
The problem is motivated by the pre-processing of a large dataset, where "symmetric" rows are irrelevant from the point of view of a user-defined distance metric.

Accepted Answer

Michal on 11 Feb 2020
Edited: Michal on 11 Feb 2020

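Here is the best solution I have found so far:
% d(i,j) is true when the left half of row i equals the right half of row j
d = ~pdist2(A(:,1:end/2), A(:,end/2+1:end));
% keep pairs that match in both directions, listing each pair once (iA1 <= iA2)
[iA1, iA2] = find(triu(d & d.'));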
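For comparison, here is a minimal sketch of a different technique that never builds the nA-by-nA distance matrix. It is not the method from the answer above: it swaps the two halves of every row and looks the result up with ismember(...,'rows'). The function name symmetricRowsLookup is hypothetical, and the sketch assumes, as the question states, that A has unique rows and an even number of columns.

```matlab
function [iA1, iA2] = symmetricRowsLookup(A)
% Hypothetical helper (the name is an assumption, not from the thread).
% Assumes A has unique rows and an even number of columns.
h = size(A, 2) / 2;
% Swap the two halves of every row.
B = [A(:, h+1:end), A(:, 1:h)];
% Row i is "symmetric" to row loc(i) exactly when B(i,:) equals A(loc(i),:).
% Because the rows of A are unique, each row has at most one partner.
[tf, loc] = ismember(B, A, 'rows');
% Report each pair once, with iA1 <= iA2 (self-symmetric rows included).
idx = (1:size(A, 1)).';
keep = tf & (loc >= idx);
iA1 = idx(keep);
iA2 = loc(keep);
end
```

On the 7 x 4 example from the question this should give iA1 = [1;2;3;5] and iA2 = [1;2;6;7]. The extra memory is roughly one more copy of A plus whatever ismember needs internally, so it may scale to the larger sizes mentioned, though it has not been timed here.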

Comments: 4

the cyclist on 11 Feb 2020
I tried an approach similar to the one you posted, but after some testing I realized that it will not work for arrays of size about A ~ [4e4,120]: you hit the (default) maximum array size limit. Since you said you need a solution for an array 50 times larger than that, I knew it was a problem.
Also, my preliminary estimate of the theoretical run time was about 9 hours. I'm not sure about that, and I am still thinking about it.
Maybe the reason you are not getting "relevant" answers is simply that you have posted a very challenging problem.
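The memory limit mentioned above can be estimated with a few lines of arithmetic. This is a rough sketch that only counts the full nA-by-nA pdist2 result, assuming 8 bytes per element for double input and 4 bytes for single input. Temporaries such as d.' and the logical masks add to this.

```matlab
% Rough size of the full nA-by-nA distance matrix, in GB (1 GB = 1e9 bytes).
% Assumption: the output class follows the input class (double or single).
nA = [4e4, 1e5, 1e6];
gbDouble = nA.^2 * 8 / 1e9;   % 12.8, 80, 8000
gbSingle = nA.^2 * 4 / 1e9;   % 6.4, 40, 4000
```

Under these assumptions a 4e4-row matrix already needs about 12.8 GB in double precision, and nA = 1e6 would need terabytes, so the full-matrix approach cannot cover the upper end of the stated size range.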
Michal on 11 Feb 2020
On my PC (with 64 GB RAM), when matrix A is of class "single", I am able to process an A ~ [1e5,1e2] matrix in reasonable time (about 30 seconds).
I think the only useful solution will be based on some kind of processing in chunks, but I have no idea what type of in-loop processing would be best in my case.
But yes, you are right, the problem is very challenging...
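One possible shape for the chunked processing mentioned above is sketched here. It applies the same pdist2 test to one block of rows at a time, so only a blockSize-by-nA matrix is held in memory. The function name symmetricRowsChunked and the blockSize parameter are hypothetical, and A is assumed to be single or double, as pdist2 requires. The total amount of work is still quadratic in nA, so this eases the memory problem but not the run time.

```matlab
function [iA1, iA2] = symmetricRowsChunked(A, blockSize)
% Hypothetical helper; blockSize is an assumed tuning parameter.
% Assumes A is single or double with an even number of columns.
nA = size(A, 1);
h = size(A, 2) / 2;
L = A(:, 1:h);        % left halves
R = A(:, h+1:end);    % right halves
iA1 = [];
iA2 = [];
for first = 1:blockSize:nA
    blk = first:min(first + blockSize - 1, nA);
    % m(i,j): left half of block row i equals right half of row j,
    % and right half of block row i equals left half of row j.
    m = ~pdist2(L(blk, :), R) & ~pdist2(R(blk, :), L);
    [r, c] = find(m);
    r = r(:) + first - 1;   % block-local row index -> global row index
    c = c(:);               % force columns (find returns rows for 1-row m)
    keep = c >= r;          % same role as triu in the full-matrix version
    iA1 = [iA1; r(keep)];   %#ok<AGROW>
    iA2 = [iA2; c(keep)];   %#ok<AGROW>
end
end
```

The pairs come out grouped by block rather than in the order produced by find on the full matrix, so apply sortrows([iA1, iA2]) afterwards if a fixed order matters.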
the cyclist on 11 Feb 2020
Yeah, I should have mentioned that I did my testing on MATLAB Online, so it's probably not the most powerful platform. :-)
Michal on 11 Feb 2020
Yes, definitely, MATLAB Online is not the proper way to run any memory- or CPU-intensive task at all ... :)


Release: R2019b

Asked: 10 Feb 2020

Commented: 11 Feb 2020
