Append data to matfile using parallel method
Hi:
I have a lot of data that needs to be saved into a test.mat file. Below is my test code:
x = rand(10000,1);
save('test.mat','x');                      % create the file with the first variable
for i = 1:100
    name = ['va_', num2str(i)];
    eval([name, ' = rand(10000,1);']);     % create a variable named va_<i>
    save('test.mat', name, '-append');     % append it to the existing MAT-file
end
The problem is that this is only test code. In my real situation:
1. The number of variables is very large (up to va_10000).
2. Each 'va_i' is very large (up to 2e6-by-1).
Because of this, although I have upgraded my drive to a 960 EVO SSD, the saving time is still very long.
Is there any way to change the code to save in parallel, so that I can reduce the time spent saving?
Thanks!
Yu
Comments: 6
Walter Roberson
13 Sep 2018
Is the size and data type of each variable the same?
Is the data likely to be compressible?
Answers (2)
Steven Lord
13 Sep 2018
Consider writing each variable to a different file in such a way that when you want to use them later on you can construct a datastore using that collection of files and make a tall array from the datastore.
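A minimal sketch of that idea, assuming every variable is a numeric column vector and that each file stores it under the same name. The folder `va_files`, the variable name `data`, and the small sizes are illustrative choices, not part of the suggestion above; `'UniformRead'` requires a release of fileDatastore that supports it.

```matlab
% Sketch: one MAT-file per variable, then a tall array over all of them.
outDir = fullfile(tempdir, 'va_files');      % hypothetical output folder
if exist(outDir, 'dir'), rmdir(outDir, 's'); end
mkdir(outDir);

nVars = 4;       % small stand-ins for the real sizes (up to 10000 and 2e6)
n = 1000;
for i = 1:nVars
    data = rand(n, 1);
    % Zero-padded names keep the files in numeric order when listed.
    save(fullfile(outDir, sprintf('va_%05d.mat', i)), 'data');
end

% Each read returns the column vector stored as 'data' in one file.
readFcn = @(f) getfield(load(f, 'data'), 'data');
ds = fileDatastore(fullfile(outDir, '*.mat'), ...
    'ReadFcn', readFcn, 'UniformRead', true);

t = tall(ds);                 % tall column vector spanning all files
m = gather(mean(t));          % evaluation is deferred until gather
```

With this layout no single file ever has to be appended to, and later analysis can stream through the files instead of loading everything at once.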
Walter Roberson
13 Sep 2018
You cannot write to a mat file in parallel. If writing in parallel to a mat file is a requirement then your problem cannot be solved.
If computation of the items is expensive, then do the computation in parallel, writing to different mat files (though potentially one per parallel core rather than one per variable.) Afterwards, merge the files together in a serial loop.
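The compute-in-parallel, merge-serially pattern could look roughly like the sketch below. It writes one file per variable for simplicity (the answer notes that one per worker is also possible), uses `rand` as a stand-in for the expensive computation, and the folder and file names are hypothetical. `matfile` is used inside the `parfor` loop because calling `save` directly in the loop body is not permitted.

```matlab
% Sketch: compute and write part files in parallel, then merge serially.
workDir = fullfile(tempdir, 'va_parts');     % hypothetical working folder
if exist(workDir, 'dir'), rmdir(workDir, 's'); end
mkdir(workDir);

nVars = 8;       % small stand-ins for the real sizes
n = 1000;
partFiles = arrayfun(@(i) fullfile(workDir, sprintf('part_%05d.mat', i)), ...
    1:nVars, 'UniformOutput', false);

parfor i = 1:nVars
    result = rand(n, 1);                     % stand-in for the expensive step
    m = matfile(partFiles{i}, 'Writable', true);
    m.data = result;                         % each worker writes its own file
end

% Serial merge: only the client ever touches the combined file.
mergedFile = fullfile(workDir, 'test.mat');
x = rand(n, 1);
save(mergedFile, 'x');                       % create the file first
for i = 1:nVars
    part = load(partFiles{i}, 'data');
    s = struct(sprintf('va_%d', i), part.data);   % one field named va_<i>
    save(mergedFile, '-struct', 's', '-append');
end
```

Building a one-field struct and saving it with `-struct` gives the merged file the original `va_<i>` variable names without `eval`.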
With the data not being compressible, either write in binary or else use the -v7.3 option together with -nocompression so that the output is not compressed.
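Both options can be sketched as follows. The file names are placeholders; the `'-nocompression'` flag is only accepted together with `'-v7.3'` and is only available in newer MATLAB releases, so treat that line as an assumption about your version.

```matlab
% Sketch: two ways to avoid spending time compressing incompressible data.
x = rand(10000, 1);

% Option 1: MAT-file, version 7.3, with compression switched off.
matFile = fullfile(tempdir, 'test_nocomp.mat');    % hypothetical file name
save(matFile, 'x', '-v7.3', '-nocompression');

% Option 2: raw binary. The size and type are not stored in the file,
% so the reader must know them (here: a column of doubles).
binFile = fullfile(tempdir, 'test_raw.bin');       % hypothetical file name
fid = fopen(binFile, 'w');
fwrite(fid, x, 'double');
fclose(fid);

% Read the binary file back to check the round trip.
fid = fopen(binFile, 'r');
y = fread(fid, Inf, 'double');
fclose(fid);
```

Raw binary has the least overhead but loses the variable names, so a naming scheme or a small index file is needed to find each variable again.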
Comments: 2
Walter Roberson
13 Sep 2018
Overall saving time might not increase if calculating the arrays is expensive. If generating a variable takes longer, on average, than saving one, then using parfor for the calculation and merging afterwards can potentially save time.
Another approach in the case where calculations are expensive is to use a pollable data queue to calculate results in parallel and send them back to the client process to do the saving.
If variables are generated, on average, faster than one variable can be saved, then you are probably bandwidth-limited in writing to the SSD, and increasing the number of simultaneous writers will not increase the bandwidth.
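The pollable data queue approach mentioned above might be sketched like this, assuming Parallel Computing Toolbox is available. Workers compute and `send` results, and the client is the only process that writes to the file. The message layout (a struct with `name` and `data` fields), the 60-second timeout, and the output file name are illustrative choices, and `rand` stands in for the real computation.

```matlab
% Sketch: workers compute, the client polls a queue and does all the saving.
pool = gcp;                                  % start or reuse a parallel pool
q = parallel.pool.PollableDataQueue;

nVars = 4;       % small stand-ins for the real sizes
n = 1000;
outFile = fullfile(tempdir, 'test_queue.mat');     % hypothetical file name
if exist(outFile, 'file'), delete(outFile); end

% Each task computes one variable and sends it back with its name.
task = @(i) send(q, struct('name', sprintf('va_%d', i), 'data', rand(n, 1)));
for i = 1:nVars
    futures(i) = parfeval(pool, task, 0, i); %#ok<SAGROW>
end

% Results arrive in completion order, not necessarily in index order.
for k = 1:nVars
    [msg, ok] = poll(q, 60);                 % wait up to 60 s for a result
    if ~ok
        error('Timed out waiting for a result from the workers.');
    end
    s = struct(msg.name, msg.data);          % one field named va_<i>
    if k == 1
        save(outFile, '-struct', 's');       % first result creates the file
    else
        save(outFile, '-struct', 's', '-append');
    end
end
```

Because saving happens on the client while the workers keep computing, this only helps when computation, not disk bandwidth, is the bottleneck.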