MATLAB Answers

Writing error in parfor loop

Asked by J T
on 5 Jun 2015
Latest activity Commented on by Walter Roberson
on 8 Jun 2015
Hi everyone,
I have a script that randomly removes and re-predicts links in networks 100 times. Of course, the bigger the networks are, the longer it takes, so I turned the 100 iterations into a parfor loop. It works fine for smaller networks, but with the big ones I get the error
A write error occurred while sending to worker xy.
I assumed it's due to memory problems, but I can see in the Task Manager that the error occurs before my PC's memory is used up. Also, it says write error... So does anybody have an idea what this error message refers to? I'm using Windows 7 Pro 64-bit and MATLAB R2014a, and I have 196 GB of RAM (I know, quite a lot (:).
Any help would be appreciated! Thanks a lot, Josephine

  3 Comments

Adam
on 5 Jun 2015
Is your memory anywhere close to being used up when the error occurs? It may be that it tries to allocate one large chunk of memory and fails outright, rather than allocating up to what is available and then crashing.
I imagine you'd need to show some code to get a better idea, though. It is hard to tell how much has to be copied to each of the workers from just what you say. If it is a lot, then remember that the memory requirement is effectively multiplied by the number of workers if some large amount of data must be copied to all of them.
I have been caught out by this numerous times. What seems like a modest memory requirement suddenly explodes when it has to be replicated across 8 parallel workers because I got my code slightly wrong and it was copying more than it needed to each worker.
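To illustrate the point about per-worker copies, here is a minimal sketch, assuming a hypothetical matrix `A` standing in for the network data (the names `A`, `colSum`, and `total` are illustrative and not from the original script). When the loop body indexes `A` with the loop variable, parfor treats it as a sliced variable and sends each worker only the pieces it needs. When the body uses `A` as a whole, it is a broadcast variable and the full matrix is sent to every worker.

```matlab
% Hypothetical stand-in for the large network data (kept small on purpose).
n = 100;
A = rand(n);

% Size of what a broadcast would send to EACH worker.
info = whos('A');
fprintf('A occupies %.3f MB\n', info.bytes / 2^20);

% Sliced: A(:, k) is indexed by the loop variable, so only the needed
% columns travel to each worker.
colSum = zeros(1, n);
parfor k = 1:n
    colSum(k) = sum(A(:, k));
end

% Broadcast: A is used as a whole, so every worker receives a full copy.
total = zeros(1, n);
parfor k = 1:n
    total(k) = sum(A(:)) + k;
end
```

Checking `info.bytes` for each variable used inside the loop gives a rough lower bound on what has to be sent before the first iteration can start.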
J T
on 5 Jun 2015
Dear Adam, thanks a lot for the fast answer! It depends on the run: sometimes the memory used goes up to 180 GB before the crash, but mostly only to 140 GB. The code is for as yet unpublished material, so give me a minute, I need to think about what I may write about it...
J T
on 5 Jun 2015
Dear Adam, I'm sorry, I'm not sure what I can post, so I will have to wait until my supervisor is back on Monday. But maybe you could be so kind as to comment on the following: the code runs without parfor (just so slowly that it will probably take years) and needs about 100 GB. So if two workers each need that much, I would need 200 GB, and of course I get the error. But I tried running the parfor with only one worker, and even that does not work. I tried printing something inside the parfor, and it seems it doesn't even enter the first iteration. Is it possible that the transfer alone needs so much memory? Thanks! Josephine

2 Answers

Answer by Thomas Koelen on 5 Jun 2015

Try writing a function that writes the data, then call this function in your parfor loop.

  1 Comment

J T
on 5 Jun 2015
Dear Thomas, so simplified, you mean I should do parfor 1:x; a=f(x); where f=x**2 or so, instead of parfor 1:x; a=x**2;? I'm already calling the function that does the major work like this. I have now put the rest in a function as well, but unfortunately I get the same error.
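For reference, the two forms described in this comment could be written in valid MATLAB roughly as follows. This is only a sketch: the names `n`, `a`, `b`, and `f` are illustrative, MATLAB uses `^` rather than `**` for powers, and a parfor loop needs an explicit loop variable.

```matlab
n = 10;

% Inline form: the computation is written directly in the loop body.
a = zeros(1, n);
parfor k = 1:n
    a(k) = k^2;
end

% Function form: the same computation behind a function handle.
% feval avoids parfor mistaking f(k) for indexing into a sliced variable.
f = @(x) x^2;
b = zeros(1, n);
parfor k = 1:n
    b(k) = feval(f, k);
end
```

Moving the loop body into a function does not by itself change which variables have to be sent to the workers, which is consistent with the same error appearing in both forms.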


Answer by Walter Roberson
on 5 Jun 2015

The data needs to be transferred even to a single worker, so yes, you can run out of memory even with one worker.
Have a look at the Worker Object Wrapper, as it might help in your situation.
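A rough sketch of how the wrapper might be used is below. It rests on several assumptions: that `WorkerObjWrapper.m` from the File Exchange is on the path, that its constructor accepts a function handle plus a cell array of arguments and exposes the result as `.Value` (check the File Exchange page for the exact interface), and that the data lives in a file the workers can read. The file name `network.mat`, the variable `A` inside it, and the `nnz` call are hypothetical placeholders.

```matlab
% Assumption: WorkerObjWrapper.m (File Exchange) is on the path and a
% parallel pool is already open. 'network.mat' is a hypothetical file
% containing a variable A.
w = WorkerObjWrapper(@load, {'network.mat'});  % each worker runs load itself

nRuns = 100;
result = zeros(1, nRuns);
parfor k = 1:nRuns
    data = w.Value;              % struct returned by load on this worker
    result(k) = nnz(data.A);     % placeholder for the real per-run work
end
```

The idea is that each worker builds its own copy of the data once instead of receiving it from the client for the loop. Each worker still holds a full copy, so the total memory use still scales with the number of workers.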

  2 Comments

J T
on 8 Jun 2015
Dear Walter, thanks for your answer! Is there a more detailed description of that worker object wrapper? I'm new to everything about parallelization, and although it seems pretty simple to use, I guess I'm using it wrong, because it opens several matlabpools where I didn't ask for them to be opened, and that of course does not work.
Walter Roberson
on 8 Jun 2015
The documentation is the File Exchange contribution page and the code itself, which is available there. It works using handle objects.
There is one part of it that invokes "spmd" as part of the initialization. That suggests that you should open your parfor pool first before using the worker object wrapper.
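One way to make sure a pool exists before the wrapper is constructed is sketched below, assuming R2013b or later, where `gcp` and `parpool` are available (earlier releases used `matlabpool` instead). The worker count of 2 is an arbitrary example value.

```matlab
% Reuse the current pool if there is one; otherwise open one explicitly.
pool = gcp('nocreate');        % returns [] instead of starting a pool
if isempty(pool)
    pool = parpool(2);         % 2 workers is an arbitrary example value
end
fprintf('Pool has %d workers\n', pool.NumWorkers);

% Construct the wrapper only after this point, so that its internal spmd
% block runs on the pool opened above rather than triggering a new one.
```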
