How can I use the pdist2 function for big data?

I want to implement k-means in MATLAB. My data set is a 9,000,000-by-1 matrix. When I used Euclidean distance to find the distances between points, I got the following error:
Error using pdist2mex
Out of memory. Type HELP MEMORY for your options.
Error in pdist2 (line 343)
D = pdist2mex(X',Y',dist,additionalArg,smallestLargestFlag,radius);
Error in k_means_new (line 38)
dist = pdist2(d,centroids,distance); % distance between all data points and centroids
I am running MATLAB on Windows 8 with the following configuration:
RAM: 8 GB
CPU: Intel Core i5-3230M
Could you please help me?
Thanks in advance.

2 Comments

What are size(d) and size(centroids)?
mina movahed on 30 Apr 2016
Edited: mina movahed on 30 Apr 2016
size(d) = 9000000 * 1
size(centroids) = 240

Answers (2)

Image Analyst on 30 Apr 2016

0 votes

Chances are you don't need all of that in memory at the same time. What are you really trying to do? Find the two points farthest from each other, for example? If so, a simple double for loop that stores only the max distance (one value) instead of an 18 gigapixel array would work. Or you might be able to get what you need by taking a subsample of your original 9 million element array. So tell us the big picture: what are you really trying to accomplish, so we can advise you on a better, less memory-intensive approach?

1 Comment

mina movahed on 2 May 2016
First of all, sorry, I did not see your comment. As Walter said, it is better to rewrite the algorithm so that it does not need as much memory. I want to implement some data mining algorithms in MATLAB and then analyze the data.

Walter Roberson on 30 Apr 2016

0 votes

Why are you bothering with Euclidean distance between one-dimensional objects? That is the same as abs() of the difference between them:
abs(bsxfun(@minus, d, centroids(:).'))
This is only going to be 9000000 * 240 entries, each of 8 bytes, which is only 17.28 gigabytes. An additional working storage of 9000000 * 8 bytes (72 megabytes) would also be required. Just make sure your swap space is set large enough to hold the array, and set your preferences to not prevent large arrays. It should probably only take 5 or 6 hours to compute.
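The storage figures above can be checked with a few lines of arithmetic. This is only a sketch: the sizes come from the question, and the variable names are made up for illustration.

```matlab
% Estimate the size of the full distance matrix before trying to allocate it.
nPoints        = 9000000;   % size(d,1), as stated in the question
nCentroids     = 240;       % number of centroids, as stated in the question
bytesPerDouble = 8;

fullBytes = nPoints * nCentroids * bytesPerDouble;  % whole 9000000-by-240 matrix
tempBytes = nPoints * bytesPerDouble;               % one column of working storage

fprintf('Full matrix:     %.2f GB\n', fullBytes / 1e9);  % prints 17.28 GB
fprintf('Working storage: %.0f MB\n', tempBytes / 1e6);  % prints 72 MB
```

With 8 GB of RAM, the full matrix is more than twice the physical memory, which is why the array would have to live mostly in swap.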

6 Comments

mina movahed on 30 Apr 2016
Edited: mina movahed on 30 Apr 2016
First of all, thanks a lot for your response. Before running this function, I changed the virtual memory settings as follows:
Initial size: 4577 MB, Maximum size: 10240 MB
Should I change them again? And how can I set the preferences to not prevent large arrays? Could you please guide me?
You will probably need about 20 gigabytes of virtual memory.
Preferences -> Workspace, and at the bottom uncheck "Limit the maximum array size to a percentage of RAM".
Remember that using virtual memory may be very slow. It will work eventually. Typically it is more productive to rewrite the algorithm to not need as much memory.
Image Analyst on 30 Apr 2016
Are you going to answer my question and tell us what you're really after so we can see if this approach of having one gigantic array in memory all at the same time is the best approach?
For example, if the task was to find the distance to the closest centroid, then:
sorted_cent = sort(centroids);
nearest_cent_idx = interp1(sorted_cent, 1:length(sorted_cent), d, 'nearest', 'extrap');
dist_to_nearest = abs(d - sorted_cent(nearest_cent_idx));
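Note that the index returned by the snippet above refers to the sorted centroid list, not to the original order. A minimal sketch of mapping it back, using small made-up data and assuming the centroids are distinct (interp1 requires unique sample points); the names order and sorted_idx are introduced here for illustration:

```matlab
% Small stand-in data (hypothetical values, not from the thread).
d         = [0.1; 2.4; 7.9; 5.2; -3.0];
centroids = [5; 1; 8];                 % deliberately unsorted

% Sort once and remember where each sorted centroid came from.
[sorted_cent, order] = sort(centroids(:));

% Position of the nearest centroid within the sorted list.
sorted_idx = interp1(sorted_cent, (1:numel(sorted_cent)).', d, 'nearest', 'extrap');

% Translate back to an index into the original centroids vector.
nearest_cent_idx = order(sorted_idx);
dist_to_nearest  = abs(d - centroids(nearest_cent_idx));
```

This needs only a few vectors the same length as d, so no 9000000-by-240 matrix is ever formed.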
mina movahed on 2 May 2016
Thanks a lot. I will try this and let you know if it works. The task is to implement k-means, so I need to find the distances between all points and the centroids.
For k-means you do not need to retain those distances; you only need to figure out which centroid is closest. That takes the long-term storage requirement down by a factor of length(centroids).
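One way to act on this is to process the data in blocks, keeping only the label of the closest centroid for each point. The following is a rough sketch of a single assignment-and-update pass, not the poster's k_means_new code; the stand-in data, blockSize, and all variable names are assumptions made for illustration.

```matlab
% One assignment-and-update pass of 1-D k-means, processed in blocks so that
% only a blockSize-by-k distance matrix exists at any time.
rng(0);                              % reproducible stand-in data
d         = rand(50000, 1);          % stand-in for the 9,000,000-by-1 data
centroids = linspace(0, 1, 240).';   % stand-in for the 240 centroids
k         = numel(centroids);
blockSize = 10000;                   % hypothetical value; tune to available RAM

labels = zeros(numel(d), 1);
for first = 1:blockSize:numel(d)
    last  = min(first + blockSize - 1, numel(d));
    block = d(first:last);
    % In 1-D, Euclidean distance is just the absolute difference.
    distBlock = abs(bsxfun(@minus, block, centroids.'));
    % Keep only the index of the closest centroid; distBlock is discarded.
    [~, labels(first:last)] = min(distBlock, [], 2);
end

% Update step: mean of the points assigned to each centroid.
sums     = accumarray(labels, d, [k 1]);
counts   = accumarray(labels, 1, [k 1]);
newCentroids = centroids;            % empty clusters keep their old position
nonEmpty = counts > 0;
newCentroids(nonEmpty) = sums(nonEmpty) ./ counts(nonEmpty);
```

With a block of 10000 points and 240 centroids, each temporary distance matrix takes about 19.2 MB, and the only full-length array kept besides the data is the labels vector.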

Asked: 29 Apr 2016

Commented: 2 May 2016
