Hello, I'd like to use MATLAB's pdist function, whose output array has length N*(N-1)/2, with N = 340000. MATLAB runs out of memory just preallocating this array. Could anyone suggest a solution, please? Kind regards, Winn


Oleg Komarov on 24 Sep 2014
As per the reference you gave in http://www.mathworks.co.uk/matlabcentral/answers/156028#comment_239068, you can process the data in blocks and keep partial sums. Either way, you will need to use for loops.
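A minimal sketch of block processing with partial sums, assuming the quantity you need can be built from running totals; here, purely for illustration, that quantity is the mean pairwise Euclidean distance. The data `X` is random stand-in data and `blockSize` is a hypothetical tuning parameter. Each block of rows is compared only with rows from its own start onward, so every pair i < j is counted exactly once and no N*(N-1)/2 array is ever allocated.

```matlab
% Block processing with partial sums: accumulate the mean pairwise Euclidean
% distance without ever storing all N*(N-1)/2 distances.
% X and blockSize are illustrative stand-ins, not the original data.
X = rand(1000, 3);              % N points, one per row
N = size(X, 1);
blockSize = 128;                % hypothetical; size it to the memory available

totalSum = 0;                   % partial sum of distances so far
totalCount = 0;                 % number of pairs (i, j), i < j, visited so far
sqNorms = sum(X.^2, 2);         % squared norm of every point

for startRow = 1:blockSize:N
    rows = startRow:min(startRow + blockSize - 1, N);
    cols = startRow:N;          % earlier columns pair with earlier rows, already counted
    % Squared distances via |a|^2 + |b|^2 - 2*a.b for this block vs. later rows
    D2 = bsxfun(@plus, sqNorms(rows), sqNorms(cols).') - 2 * (X(rows, :) * X(cols, :).');
    D = sqrt(max(D2, 0));       % clamp tiny negative round-off before sqrt
    mask = bsxfun(@lt, rows.', cols);   % keep only pairs with i < j
    totalSum = totalSum + sum(D(mask));
    totalCount = totalCount + nnz(mask);
end

meanDist = totalSum / totalCount;
fprintf('pairs visited: %d, mean distance: %.4f\n', totalCount, meanDist);
```

Peak memory is about blockSize-by-N values per block, so for N = 340000 the block size controls how much RAM each iteration needs.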


Accepted Answer

Sean de Wolski on 24 Sep 2014


That's N*(N-1)/2 ≈ 5.78 × 10^10 elements for a single column: about 462 gigabytes as doubles, and still 57.8 gigabytes even at one byte per element. And that's just the end result; there will surely be some large intermediate arrays as well.
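A quick check of the storage arithmetic for one column holding every pairwise distance of N = 340000 points. The byte total depends on the element class; this uses decimal gigabytes (1 GB = 10^9 bytes).

```matlab
% Storage for one column holding every pairwise distance of N points.
N = 340000;
numPairs = N * (N - 1) / 2;              % 57,799,830,000 pairs
classes = {'double', 'single', 'uint8'};
bytesPerElement = [8, 4, 1];             % element size of each class in bytes
gb = numPairs * bytesPerElement / 1e9;   % decimal gigabytes
for k = 1:numel(classes)
    fprintf('%-6s: %6.1f GB\n', classes{k}, gb(k));
end
```

That works out to roughly 462 GB as double, 231 GB as single and 57.8 GB as uint8, before counting any intermediate arrays.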
I'd recommend downsampling or chunking up the calls.
Do you need every pairwise distance? Are you looking for something specific? What's the end goal?
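A minimal sketch of the downsampling option: keep a random subset of the points and compute every pairwise distance within it. The data `X` and the subset size `sampleSize` are hypothetical stand-ins. The row loop fills a condensed vector in the same order pdist uses, (1,2), (1,3), ..., (2,3), ..., so the sketch runs without the Statistics Toolbox.

```matlab
% Downsampling: keep a random subset of the points and compute every
% pairwise distance within that subset.
% X and sampleSize are illustrative; choose sampleSize so that
% sampleSize*(sampleSize-1)/2 distances fit comfortably in memory.
X = rand(5000, 3);                       % stand-in for the full point set
sampleSize = 500;                        % hypothetical subset size
idx = randperm(size(X, 1), sampleSize);  % distinct random row indices
S = X(idx, :);

% Condensed distance vector in pdist order: (1,2), (1,3), ..., (2,3), ...
m = size(S, 1);
d = zeros(m * (m - 1) / 2, 1);
k = 0;                                   % number of entries filled so far
for i = 1:m - 1
    n = m - i;                           % number of pairs (i, j) with j > i
    diffs = bsxfun(@minus, S(i + 1:m, :), S(i, :));
    d(k + 1:k + n) = sqrt(sum(diffs.^2, 2));
    k = k + n;
end
```

With the Statistics Toolbox installed, pdist(S) should return the same values (as a row vector). Whether a subsample is acceptable depends on how sensitive the downstream index is to the points left out.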


Win co on 24 Sep 2014
It's very kind of you to have answered. In fact, I'd like to compute the Tau index to evaluate the quality of a clustering. You can find more details on this index in the tutorial for the Fuzzy Clustering Toolbox (cf. p. 18). This index needs all distances dij between pairs of points (Mi, Mj) with i < j.
Sean de Wolski on 24 Sep 2014
Does that require every pairwise distance to live at once or can you step through keeping just a few relevant ones?
Adam on 24 Sep 2014
I imagine you'll have speed issues with that too, unless you downsample a little, which is one of the options Sean suggests.
Win co on 24 Sep 2014
Edited: Win co on 24 Sep 2014
Let me clarify with an example: given a set of points {M1, M2, M3, M4, M5} (N = 5), I have to compute all distances dij between pairs of points (Mi, Mj) with i < j, so in this case:
(M1,M2)
...
(M1,M5)
(M2,M3)
...
(M2,M5)
(M3,M4)
(M3,M5)
(M4,M5)
In total, the distance array has N*(N-1)/2 = 10 elements. Coming back to my problem: I now have N = 340000 points, and if I downsample them, I don't see how to compute all the distances as in the example. P.S. Every pairwise distance is required.
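The pair ordering in this example is the same one pdist uses for its output vector. A short sketch that enumerates the pairs with nchoosek and maps each pair (i, j) to its position k in that vector; `pairIndex` is a hypothetical helper name, not a MATLAB function. The formula counts the pairs that start with rows 1..i-1, then adds j - i.

```matlab
% Enumerate every pair (i, j) with i < j for N points, in the same order
% as the list above, and map each pair to its position k in the
% condensed distance vector of length N*(N-1)/2.
N = 5;
pairs = nchoosek(1:N, 2);              % [1 2; 1 3; 1 4; 1 5; 2 3; ...; 4 5]
numPairs = size(pairs, 1);             % N*(N-1)/2 = 10

% Pairs starting with rows 1..i-1 come first: (i-1)*N - (i-1)*i/2 of them.
% Then (i, j) is the (j - i)-th pair starting with row i.
pairIndex = @(i, j) (i - 1) .* N - (i - 1) .* i / 2 + (j - i);

k = pairIndex(pairs(:, 1), pairs(:, 2));
disp([pairs k]);                       % third column runs 1 to 10
```

For N = 340000 you would not build `pairs` with nchoosek (it would have about 5.78 × 10^10 rows); the index formula alone tells you where any block of rows starts and ends in the full vector, which is what chunked processing needs.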
Sean de Wolski on 25 Sep 2014
I understand that every pairwise distance is required. But does every pairwise distance need to exist in memory at the same time? Or can you calculate a few of them, extract what you need (max/min/range/whatever), and then discard them before proceeding?
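A minimal sketch of the "calculate a few, keep what you need, discard the rest" idea, assuming the needed information reduces to running summaries; the minimum, maximum and pair count here are illustrative choices, and `X` is random stand-in data. Only one row's worth of distances exists at any time.

```matlab
% Stream through the distances one row at a time, keeping only running
% summaries (min, max, count) instead of all N*(N-1)/2 values.
X = rand(1000, 3);          % stand-in data; one point per row
N = size(X, 1);

dMin = Inf;
dMax = -Inf;
count = 0;
for i = 1:N - 1
    % Distances from point i to every later point j > i
    d = sqrt(sum(bsxfun(@minus, X(i + 1:N, :), X(i, :)).^2, 2));
    dMin = min(dMin, min(d));
    dMax = max(dMax, max(d));
    count = count + numel(d);
    % d is overwritten on the next iteration, so memory stays O(N)
end
fprintf('%d pairs, min %.4f, max %.4f\n', count, dMin, dMax);
```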
Win co on 25 Sep 2014
Not at the same time. In fact, I've found one solution:
1) Given an m-by-n matrix X,
2) compute pdist2 between each item of X and the rest, then save the results to files, so in total I have m*(m-1)/2 files on my hard drive.
I can share the code if someone's interested.
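A sketch of a row-by-row scheme along these lines, assuming one file per row that holds the distances from point i to the later points j > i (the poster's actual code was not shared, so this is not their implementation). The folder and the file-name pattern `dist_row_%06d.mat` are hypothetical, and `X` is random stand-in data.

```matlab
% Sketch of the row-by-row approach: distances from each point i to the
% later points j > i are computed and saved to a separate .mat file.
% The folder and the file-name pattern are hypothetical.
X = rand(200, 4);                    % m-by-n data matrix (stand-in)
m = size(X, 1);
outDir = tempname;                   % fresh, unique temporary folder
mkdir(outDir);

for i = 1:m - 1
    % 1-by-(m-i) row of distances from X(i,:) to X(i+1:m,:); with the
    % Statistics Toolbox this is pdist2(X(i,:), X(i+1:m,:))
    d = sqrt(sum(bsxfun(@minus, X(i + 1:m, :), X(i, :)).^2, 2)).';
    save(fullfile(outDir, sprintf('dist_row_%06d.mat', i)), 'd');
end
```

In this layout the m - 1 files together hold m*(m-1)/2 distances. When they are no longer needed, rmdir(outDir, 's') removes the folder and its contents in MATLAB.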
Sean de Wolski on 25 Sep 2014
That's a good idea. Do you even need the files, though? They'll take up a ton of space. Could you just gather the information you need from the data and write only the results or pairs you care about?
You might want to look into doing this in parallel with a parfor loop; it could help speed things up. However, writing the files will likely be the bottleneck, and that is a hardware limitation, not a software one.
Win co on 25 Sep 2014
I'm doing my computation on a cluster server, and I'll delete these files once I've finished comparing the distances across all the files. I can give a simple example of how my code works if you want. Sincere thanks for taking the time to look at my problem.


More Answers (1)

Adam on 24 Sep 2014


Depending on how far over the memory limit you are, you could try converting your data to single before passing it to pdist. That should halve the memory.
I don't know offhand whether pdist is overloaded for integer types. If it is, you could also use them, depending on the level of accuracy you require.


Thanks for your response. But even the preallocation below runs MATLAB out of memory:
dist=zeros(N*(N-1)/2,1);
Well, yes, but that creates an array of doubles. You can try pre-allocating:
dist = zeros(N*(N-1)/2,1, 'single')
or even
dist = zeros(N*(N-1)/2,1, 'uint8')
but the latter option assumes pdist works on uint8 data and that you really don't care much about accuracy!
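A sketch of the accuracy trade-off, comparing distances stored as single and as uint8 against double. The uint8 case assumes a hypothetical rescaling step (`scale`) that maps the distances onto 0..255 before conversion, since uint8 holds only integers; the data `X` is random stand-in data.

```matlab
% How much accuracy is lost when distances are stored as single or uint8.
% The uint8 version assumes distances are rescaled to 0..255 first
% (a hypothetical quantisation step; uint8 cannot hold fractional values).
X = rand(300, 3);
N = size(X, 1);
mask = triu(true(N), 1);                 % pairs with i < j
D2 = bsxfun(@plus, sum(X.^2, 2), sum(X.^2, 2).') - 2 * (X * X.');
d = sqrt(max(D2(mask), 0));              % reference distances in double

dSingle = single(d);                     % 4 bytes each
scale = 255 / max(d);
dUint8 = uint8(d * scale);               % 1 byte each, rounded to nearest integer

errSingle = max(abs(double(dSingle) - d));
errUint8 = max(abs(double(dUint8) / scale - d));
fprintf('max error: single %.2e, uint8 %.2e\n', errSingle, errUint8);
```

With this rescaling the uint8 error is bounded by half a quantisation step, max(d)/510, while single keeps roughly seven significant digits.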


Category: Matrix Indexing

Asked: 24 Sep 2014

Commented: 25 Sep 2014
