What's the best command to write to a file inside a parfor loop?

Chris on 22 Mar 2012
Edited: Walter Roberson on 1 Aug 2022
I want to write results to a file within a parfor loop and ideally append the results. What is the best command that can handle multiple writes potentially at the same time?
  Comments: 2
Geoff on 22 Mar 2012
This is why MATLAB needs a mutex / critical section facility. =(


Accepted Answer

Edric Ellis on 23 Mar 2012
Inside PARFOR, you can still access the current task and get its ID, which will be unique. My "Worker Object Wrapper" has an example of how you could use that to open a file on each worker with a persistent file handle, so that each worker can write its own file when executing a PARFOR loop.
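As a rough illustration of the task-ID idea, the sketch below names one output file per worker after the ID returned by getCurrentTask. It is not the Worker Object Wrapper example itself: it reopens the file in append mode on every iteration instead of keeping a persistent handle, and the folder and file names (outDir, results_worker%d.txt) are made up for the example.

```matlab
% Sketch: one output file per worker, named after the current task ID.
outDir = tempname;              % hypothetical scratch folder for this example
mkdir(outDir);

parfor idx = 1:100
    t = getCurrentTask();       % empty when the loop body runs on the client
    if isempty(t)
        workerId = 0;
    else
        workerId = t.ID;
    end
    fname = fullfile(outDir, sprintf('results_worker%d.txt', workerId));
    fid = fopen(fname, 'at');   % append; only this worker writes to this file
    if fid == -1
        error('example:openFailed', 'Could not open %s', fname);
    end
    fprintf(fid, 'Iteration: %d\n', idx);
    fclose(fid);
end
```

Because no two workers share a file name, no locking is needed; the cost is that the per-worker files have to be merged afterwards.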
  Comments: 2
TOSA2016 on 10 Jul 2019
Edited: TOSA2016 on 10 Jul 2019
Thanks, Edric, for your response. I am just going to add some extra lines to it. In newer releases, the Worker Object Wrapper is available as parallel.pool.Constant. This is the code I used, from the MathWorks website (please click here), to print line by line in parfor.
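clc
clear
% Open one temporary text file per worker; fclose runs when c is cleared.
c = parallel.pool.Constant(@() fopen(tempname(pwd), 'wt'), @fclose);
spmd
    A = fopen(c.Value); % fopen(fileID) returns the file name; A is a Composite
end
parfor idx = 1:1000
    fprintf(c.Value, 'Iteration: %d\n', idx);
end
clear c; % Closes the temporary files.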
With this, you will end up with several text files (one per worker) with temporary names that MATLAB chooses for them. Now, you would probably want to combine them into a single text file. This is what I did, inspired by the code posted (please click here).
% A{i} is the full path of the temporary file written by worker i.
nFiles = length(A);
for i = 1:nFiles
    movefile(A{i}, sprintf('%d.txt', i));
end
fileout = 'OneFatFile.txt';
fout = fopen(fileout, 'w');
for cntfiles = 1:nFiles
    fin = fopen(sprintf('%d.txt', cntfiles));
    while ~feof(fin)
        textLine = fgetl(fin);
        if ischar(textLine) % fgetl returns -1 at end of file
            fprintf(fout, '%s\n', textLine);
        end
    end
    fclose(fin);
end
fclose(fout);
% You can then delete the unnecessary files with the following loop.
fclose('all');
for i = 1:nFiles
    delete(sprintf('%d.txt', i));
end
For combining the txt files, I did not use the recommendations posted (please click here) since I faced several issues running the code on our cluster. It would be appreciated if someone can suggest a way to get rid of the while loop in the posted code.
Peng Li on 2 Jun 2020
You can take advantage of an spmd block to merge these temporary files as well. The variable A (a Composite) stores the file names. For example:
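spmd
    % Inside spmd, A is this worker's own file name. The temporary files
    % have no extension, so the file type is given explicitly.
    tblLab = readtable(A, 'FileType', 'text', 'ReadVariableNames', false);
end
tbl = vertcat(tblLab{:});
writetable(tbl, 'OneFatFile.txt');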
You can also delete the temporary files within the spmd block, if you want.


More Answers (4)

Jason Ross on 22 Mar 2012
Multiple writes to the same file are a quick route to a corrupt file. You need to come up with a plan to assemble the file where only one write is done at a time. For example, you could write a collection of small files that are independently named and then have code that concatenates the data into the one result file.
The tempname function returns a unique name, which you can combine with other information, such as the hostname, lab index, or time opened, to build the filename.
When you are dealing with files you also need to make sure to pay attention to the return codes of fopen, fclose, etc. Duplicate filenames, read-only filesystems and full filesystems happen, and you should think about how you will handle these conditions when they occur.
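A minimal sketch of both points, assuming a timestamp suffix is a good enough extra component for the file name (the 'example:openFailed' identifier and the written text are placeholders): build the name from tempname, then check what fopen and fclose return instead of assuming they succeeded.

```matlab
% Build a unique file name from tempname plus a timestamp.
fname = sprintf('%s_%s.txt', tempname, datestr(now, 'yyyymmddTHHMMSS'));

% fopen returns -1 on failure; the second output explains why.
[fid, msg] = fopen(fname, 'wt');
if fid == -1
    error('example:openFailed', 'Could not open %s: %s', fname, msg);
end

nWritten = fprintf(fid, 'result %d\n', 42);   % number of bytes written

% fclose returns 0 on success and -1 otherwise.
status = fclose(fid);
if status ~= 0
    warning('example:closeFailed', 'Could not close %s', fname);
end
```

How to react to a failure (retry, pick another name, abort the run) depends on the application; the point is only that the failure gets noticed.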
  Comments: 2
Jeremy on 18 Jul 2014
I know this is an old thread...
Why couldn't I do something like
FileIDResults = -1;
while FileIDResults == -1
    FileIDResults = fopen('projects_results.txt', 'a');
end
fprintf(FileIDResults,....)
fclose(FileIDResults)
Wouldn't each worker then loop until it grabbed access to the file, lock the others out while it did its fprintf, then free the file back up when it closed?
Edric Ellis on 21 Jul 2014
While that might work, you're somewhat at the mercy of the operating system as to whether it gives you exclusive write access; plus your results will come out in a non-deterministic order.



Jill Reese on 22 Mar 2012
If you are able to run your code inside an spmd block instead of via parfor, then you will be able to use the labindex variable to create a unique file name for each worker to write to. That is the best option.
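A small sketch of the spmd approach, under the assumption that the iterations can be split round-robin across workers (outDir and the results_lab%d.txt pattern are invented for the example). In recent releases labindex and numlabs have newer names (spmdIndex and spmdSize), so adjust to the release in use.

```matlab
outDir = tempname;              % hypothetical scratch folder for this example
mkdir(outDir);

spmd
    % labindex is unique per worker, so each worker gets its own file.
    fname = fullfile(outDir, sprintf('results_lab%d.txt', labindex));
    fid = fopen(fname, 'wt');
    if fid == -1
        error('example:openFailed', 'Could not open %s', fname);
    end
    % Worker k handles iterations k, k+numlabs, k+2*numlabs, ...
    for k = labindex:numlabs:100
        fprintf(fid, 'Iteration: %d\n', k);
    end
    fclose(fid);
end
```

Unlike parfor, the work split here is explicit, so each file's contents are deterministic for a given pool size.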

Konrad Malkowski on 22 Mar 2012
Are you planning on writing data from multiple workers to a single file within PARFOR? If so, then there are no commands that will allow you to do it.

Fernando García-García on 6 Dec 2014
Edited: Fernando García-García on 6 Dec 2014
Hello everyone,
Well, I'm planning to do what you said, Konrad. What if I do the following?
filename='myfile.txt';
parfor i=1:N
    % do something very time-consuming, like hours or so
    while checkIfOpen(filename)
        pause(1); % I don't mind waiting for just 1 second
    end
    fileID=fopen(filename,'a+');
    fprintf(fileID,...); % write whatever, very quick as it's not much data
    fclose(fileID);
end

function isOpen=checkIfOpen(filename)
    % Note: fopen('all') lists only the files opened by the calling MATLAB process.
    isOpen=false;
    openIDs=fopen('all');
    for j=1:numel(openIDs)
        filenameTEMP=fopen(openIDs(j)); % name of the file behind this ID
        idxStrFind=strfind(filenameTEMP,filename);
        if ~isempty(idxStrFind) % non empty
            if idxStrFind(end)==numel(filenameTEMP)-numel(filename)+1
                % found at the end of the entire path
                isOpen = true;
                break;
            end
        end
    end
end
Note 1: I don't mind if the writing is not in deterministic order.
Note 2: The processing time for each task is long (it varies randomly from iteration to iteration, somewhere in the range of minutes) compared to the very brief write operation (milliseconds), so I would never have expected two workers to try to write to the file at the same time by sheer coincidence, but it did occur! Should have bought a lottery ticket, hehehe.
Note 3: Code corrected.
Note 4: I'm not sure how to actually check if this code behaves as expected.
  Comments: 7
Edric Ellis on 1 Aug 2022
@Paul Safier a couple of things to note: if you're interested in the WorkerObjectWrapper - this ended up in the product as parallel.pool.Constant. Also of interest might be parallel.pool.DataQueue which lets you send data from workers back to the client, and let the client amalgamate / write to a file / whatever.
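A hedged sketch of the DataQueue route: workers send their results to the client, and the client is the only process that touches the file. The output file name is made up, and the short pause before fclose is a precaution I am assuming is reasonable, in case some callbacks are still pending when the loop returns.

```matlab
outFile = [tempname '.txt'];    % hypothetical output file for this example
fid = fopen(outFile, 'wt');
if fid == -1
    error('example:openFailed', 'Could not open %s', outFile);
end

q = parallel.pool.DataQueue;
% The callback runs on the client each time a worker sends a message.
afterEach(q, @(msg) fprintf(fid, '%s\n', msg));

parfor idx = 1:100
    send(q, sprintf('Iteration: %d', idx));
end

pause(0.5);                     % precaution: let pending callbacks run
fclose(fid);
```

Lines arrive in whatever order the workers finish, so the file order is not deterministic; include the iteration index in each message if the order matters later.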
Bruno Luong on 1 Aug 2022
Edric, what is the advantage/disadvantage of using DataQueue/AfterEach vs fetchNext?

