Error using pdist2mex Error in kmeans>distfun

Tripoli Settou on 18 Mar 2018
Commented: Walter Roberson on 2 Apr 2021
Hi. We want to represent our data (3233477*256) with a bag of visual words (BoW), which uses k-means clustering to extract the visual words. When we choose K=5000, this error appears:
Error using pdist2mex
Requested 3233477x5000 (120.5GB) array exceeds maximum array size preference. Creation of arrays greater than this limit may take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more information.
Error in kmeans>distfun (line 747)
D = pdist2mex(X,C,'sqe',[],[],[]);
Error in kmeans/loopBody (line 445)
D = distfun(X, C, distance, 0, rep, reps);
Error in internal.stats.parallel.smartForReduce (line 136)
reduce = loopbody(iter, S);
Error in kmeans (line 335)
ClusterBest = internal.stats.parallel.smartForReduce(...
Error in BOWHistogram (line 12)
[idx,c,sumd,D2] = kmeans(double(Tab_Feature_Data),NumClust);
What can I do to fix the error? Please advise me.
  Comments: 2
Rik on 18 Mar 2018
Can you split the array into smaller parts? Unless you get a 120GB contiguous block of memory, you can't use this method. I'm not familiar enough with what you want to do to suggest a real solution.
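The 120.5GB figure in the error message can be reproduced with a little arithmetic: kmeans asks pdist2mex for one double-precision distance per (row, centroid) pair. A minimal check, where the variable names are only illustrative:

```matlab
% Size of the n-by-k distance matrix that kmeans tried to allocate.
nRows = 3233477;          % rows in the feature matrix (from the error message)
k = 5000;                 % number of clusters requested
bytesPerDouble = 8;       % a double occupies 8 bytes

sizeGB = nRows * k * bytesPerDouble / 2^30;   % MATLAB reports units of 2^30 bytes
fprintf('%.1f GB\n', sizeGB);                 % prints 120.5 GB
```

The size depends only on the number of rows and K, not on the 256 feature columns, so reducing either of those two is what shrinks the allocation.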
Tripoli Settou on 18 Mar 2018
How can I do that? I need to extract the visual words from all of the data. How can I split it and still get the visual words for all of it?
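One possible way to split the work, which is not something proposed in this thread, is to learn the centroids on a random subsample and then assign every row to its nearest centroid block by block, so that only a small distance matrix exists at any time. The sketch below uses small synthetic data; `nSample` and `chunkSize` are hypothetical tuning values, and centroids learned from a subsample are an approximation of what kmeans would find on the full data.

```matlab
rng(0);                          % reproducible synthetic example
X = rand(20000, 16);             % stand-in for Tab_Feature_Data
k = 50;                          % stand-in for NumClust
nSample = 5000;                  % hypothetical subsample size
chunkSize = 4000;                % hypothetical number of rows per block

% 1) Learn the visual words (centroids) on a random subset of the rows.
sampleIdx = randperm(size(X, 1), nSample);
[~, C] = kmeans(double(X(sampleIdx, :)), k);

% 2) Assign all rows in blocks; only a chunkSize-by-k matrix is held at once.
n = size(X, 1);
idx = zeros(n, 1);               % nearest-centroid index for each row
minDist = zeros(n, 1);           % squared distance to that centroid
for first = 1:chunkSize:n
    last = min(first + chunkSize - 1, n);
    D = pdist2(double(X(first:last, :)), C, 'squaredeuclidean');
    [minDist(first:last), idx(first:last)] = min(D, [], 2);
end

% 3) Count how often each visual word occurs (here over all rows).
bow = accumarray(idx, 1, [k 1]);
```

For per-image histograms, the same `accumarray` call would be applied to the `idx` entries belonging to each image rather than to all rows at once.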


Accepted Answer

Bernhard Suhm on 25 Mar 2018
You could try converting your large input data into a tall array (maybe as simple as t = tall(double(Tab_Feature_Data))) and then passing that tall array to kmeans. Note, though, that only some kmeans options are available with tall arrays; see https://www.mathworks.com/help/stats/tall-array-support-usage-notes-and-limitations.html
  Comments: 4
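A minimal sketch of this suggestion on small synthetic data, assuming a release in which kmeans accepts tall arrays. The data and cluster count are placeholders, and only three outputs are requested because, as noted later in this thread, the tall version does not return the fourth.

```matlab
rng(0);                          % reproducible synthetic example
X = rand(5000, 16);              % stand-in for Tab_Feature_Data
k = 10;                          % stand-in for NumClust

t = tall(double(X));             % wrap the in-memory matrix as a tall array
[idx, C, sumd] = kmeans(t, k);   % tall kmeans: first three outputs only
idx = gather(idx);               % bring the cluster indices back into memory
```

If the Parallel Computing Toolbox is installed, tall-array operations may start a parallel pool; without it they run in the local MATLAB session.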
Bernhard Suhm on 26 Mar 2018
Your laptop will be tied up unless you leverage multiple cores, and doing that also requires the Parallel Computing Toolbox. For datasets as large as yours, you will need more powerful hardware or patience with the execution time. The doc page I pointed you to does note that the 'tall' version of kmeans returns only the first three output variables.
Tripoli Settou on 26 Mar 2018
So you advise me to use the Parallel Computing Toolbox to leverage multiple cores, right? Also, I need the fourth output variable (D2) for the next step:
[idx,c,sumd,D2] = kmeans(double(Tab_Features),NumClust);
And can I use the Parallel Computing Toolbox without the tall version of kmeans, that is, the regular kmeans with the Parallel Computing Toolbox? Would that work?
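The thread does not answer this question. As far as the kmeans documentation describes it, the regular (non-tall) kmeans accepts an 'Options' structure from statset with 'UseParallel' set to true, and that setting distributes the replicates across workers, so it needs 'Replicates' greater than 1 and would not by itself shrink the distance array. The fourth output is also an n-by-k matrix, the same shape as the array that failed to allocate. A small sketch on synthetic data, assuming the Parallel Computing Toolbox is installed:

```matlab
rng(0);                                  % reproducible synthetic example
X = rand(2000, 16);                      % small stand-in for Tab_Features
k = 20;                                  % small stand-in for NumClust

opts = statset('UseParallel', true);     % assumes Parallel Computing Toolbox
[idx, C, sumd, D] = kmeans(X, k, ...
    'Replicates', 4, ...                 % replicates are what run in parallel
    'Options', opts);

disp(size(D));                           % n-by-k: one distance per row and centroid
```

At the original problem size, D alone would again be 3233477-by-5000, so any later step that needs it would have to consume the distances in blocks rather than store the whole matrix.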


More Answers (1)

carpcarp carpcarp on 2 Apr 2021
  Comments: 2
carpcarp carpcarp on 2 Apr 2021
You can check whether you have created your own .m file named kmeans. If so, please delete it and run your program again.
If you don't know where kmeans.m is, you can download an application called Everything to search for it.
My English level is limited, so I can only describe it simply. (from translator)
Walter Roberson on 2 Apr 2021
(User points out that there can be problems if you accidentally have your own kmeans.m instead of using MATLAB's)
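A shadowing kmeans.m can also be found from inside MATLAB, without an external search tool, using the built-in which function:

```matlab
% List every kmeans on the MATLAB path; the first entry is the one that runs.
which -all kmeans

% Same information as a cell array, e.g. for checking programmatically.
paths = which('kmeans', '-all');
disp(paths);
```

If the first entry is not inside the Statistics and Machine Learning Toolbox folder, a user-defined file is taking precedence over MATLAB's kmeans.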
