Mapreduce on parallel cluster - Database or disk full - How to control storage of intermediate files?
I want to calculate several statistics (spectra, correlation functions, etc.) for ~400 files with 6e6 doubles per file, and afterwards average over all files to get average spectra, correlation functions, etc. To make things fast, I am trying to use mapreduce on a parallel cluster. This works like a charm as long as there are relatively few files (~100), but with a larger number of files I get this error message:
Error using parallel.mapreduce.KeyValueOutputStore/addmulti (line 63)
Error in adding keys and values.
Error in Analysis20190708>Analysis (line 115)
addmulti(intermKVStore, {'StatNames'}, {Stats});
Error in parallel.internal.pool.deserialize>@(data,info,intermKVStore)Analysis(data,Parameters,info,intermKVStore)
Error in mapreduce (line 116)
outds = execMapReduce(mrcer, ds, mapfun, reducefun, parsedStruct);
Error in Analysis20190708 (line 72)
outDS = mapreduce(ds, mapper, @reduceAnalysis,inpool);
Caused by:
The database /tmp/filename/TaskOutput7.db is full. (database or disk is full)
The message occurs after around 50% of the map phase is done, or later if I reduce the size of the result vectors (for example, by sampling the spectra more coarsely). I checked with an admin, and /tmp indeed has very limited free space.
The question is now: how do I tell MATLAB to store these intermediate(?) files in a different location with more storage?
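A minimal diagnostic sketch that may help narrow this down: it prints the temporary folder reported by the client and by each worker in the pool. It assumes, without confirmation from the error message itself, that the TaskOutput*.db stores are created under the folder returned by tempdir on each worker. The variable names (pool, workerTmp, workerEnv) are illustrative only.

```matlab
% Diagnostic sketch: report the temporary folder seen by the client and by each worker.
% Assumption: the mapreduce intermediate stores (TaskOutput*.db) live under tempdir.
pool = gcp;                           % current pool, or start the default one
fprintf('Client tempdir: %s\n', tempdir);

spmd
    workerTmp = tempdir;              % evaluated on every worker
    workerEnv = getenv('TMPDIR');     % empty if the variable is not set
end

% workerTmp and workerEnv are Composite objects; index them per worker.
for k = 1:pool.NumWorkers
    fprintf('Worker %d: tempdir = %s, TMPDIR = "%s"\n', k, workerTmp{k}, workerEnv{k});
end
```

If the workers report /tmp, one thing to try is starting MATLAB and the workers with the TMPDIR environment variable pointing at a larger volume, for example by exporting it in the cluster job script. Whether mapreduce honours this in a given release is an assumption that should be checked against the documentation.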