Matlab parfor saves and loads temporary variables during execution!
The parallel pool implementation appears to use, in effect, the normal save and load channels when it makes the copies that are passed to the workers. This is really bad: if a class implements saveobj(), the state of an object can be modified during execution, which, depending on the circumstances, can lead to "unexplained" crashes.
Let's say I have a class whose objects need a large amount of temporary data during execution. When an object is saved, I would naturally want to get rid of this temporary data, and a convenient way to do that is to write a saveobj that clears the temporary data within the class. If, however, the object is saved and reloaded during execution, as happens with parfor, problems arise.
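A minimal sketch of the kind of class described above, with hypothetical names (`LossyHolder`, `N`, `TempData`): saveobj discards the scratch data and there is no loadobj, so any copy produced by a save/load round trip comes back with `TempData` empty. In MATLAB the classdef must live in its own file, `LossyHolder.m`.

```matlab
% LossyHolder.m: hypothetical class illustrating the problem.
classdef LossyHolder
    properties
        N = 3          % small persistent parameter
        TempData       % large scratch data needed only during execution
    end
    methods
        function obj = LossyHolder(n)
            if nargin > 0
                obj.N = n;
            end
            obj.TempData = magic(obj.N);   % stand-in for expensive scratch data
        end
        function obj = saveobj(obj)
            % Discard the scratch data so that saved files stay small.
            % With no matching loadobj, the data is simply gone after reload.
            obj.TempData = [];
        end
    end
end
```

On the client, `h = LossyHolder(4)` holds 16 elements in `h.TempData`. If parfor broadcasts `h` through the save/load machinery, as the answers in this thread describe, then `parfor k = 1:2, sz(k) = numel(h.TempData); end` would be expected to leave `sz` as `[0 0]`, and any worker code that indexes into `h.TempData` would fail.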
Why is the copy not made internally?
Comments: 1
Accepted Answer
Matt J
21 Jul 2018
Edited: Matt J
21 Jul 2018
I can't say authoritatively why, but it seems to me that this is only a danger if you are unaware of this behavior of parfor. If you are aware of it, you would surely write a loadobj() method that restores the temporary data when the object is reloaded. That would also be better than broadcasting copies of the temporary data to all the workers, which is done serially and would take much more time.
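A minimal sketch of that suggestion, assuming the temporary data can be recomputed deterministically from the persistent state. The class name `RestorableHolder`, its properties, and the `magic`-based placeholder computation are hypothetical. saveobj returns a struct holding only the persistent state, and the static loadobj rebuilds the full object, scratch data included, whenever it is reloaded.

```matlab
% RestorableHolder.m: hypothetical class whose saveobj drops scratch data
% and whose loadobj rebuilds it.
classdef RestorableHolder
    properties
        N = 3          % small persistent parameter
        TempData       % scratch data, derivable from N
    end
    methods
        function obj = RestorableHolder(n)
            if nargin > 0
                obj.N = n;
            end
            obj.TempData = RestorableHolder.buildTempData(obj.N);
        end
        function s = saveobj(obj)
            % Save only the persistent state, not the scratch data.
            s.N = obj.N;
        end
    end
    methods (Static)
        function obj = loadobj(s)
            % Reconstruct the object, including its scratch data, from the saved state.
            obj = RestorableHolder(s.N);
        end
        function T = buildTempData(n)
            % Placeholder for an expensive but deterministic computation.
            T = magic(n);
        end
    end
end
```

The design trades some recomputation on each reload (for example, once per worker) for not serializing the large array at all.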
Incidentally, have you verified that saveobj() is triggered in this scenario just as in an ordinary call to save()?
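One way to check this empirically is a probe class that prints a message from each method. The class `SaveProbe` is hypothetical and exists only for this test.

```matlab
% SaveProbe.m: hypothetical class that reports when saveobj/loadobj run.
classdef SaveProbe
    properties
        Value = 0
    end
    methods
        function obj = saveobj(obj)
            % Runs wherever the object is serialized (e.g. on the client).
            fprintf('saveobj called (Value = %g)\n', obj.Value);
        end
    end
    methods (Static)
        function obj = loadobj(obj)
            % Runs wherever the object is deserialized; returns it unchanged.
            fprintf('loadobj called\n');
        end
    end
end
```

With a pool open, running `p = SaveProbe; parfor k = 1:2, v(k) = p.Value; end` and seeing the saveobj message in the client's Command Window would indicate that the parfor broadcast calls saveobj. The loadobj message is printed on the workers and may not be relayed to the client, so its absence proves little.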
Comments: 7
Matt J
28 Jul 2018
Edited: Matt J
29 Jul 2018
I don't think throwing away temporary information is straying too far from the original purpose of the save() method, as long as it is documented in the class.
Except that you want your class to work with parallel pools. Parallel pools, and possibly other parts of MATLAB, expect load() to be the one-to-one inverse of save(). The burden is on you to meet those interface requirements; they will not adapt to your class documentation.
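Whether a class meets that requirement can be checked with a direct round trip. A minimal sketch; the helper name `isSaveLoadInverse` is hypothetical, and the check relies on `isequal` to compare the original with the reloaded copy, so for classes where that comparison is not meaningful you would substitute your own.

```matlab
function tf = isSaveLoadInverse(obj)
% isSaveLoadInverse  Hypothetical helper: saves OBJ to a temporary MAT-file,
% loads it back, and reports whether the reloaded copy equals the original.
    f = [tempname '.mat'];
    cleanup = onCleanup(@() delete(f));   % remove the temp file on exit
    save(f, 'obj');
    S = load(f);
    tf = isequal(obj, S.obj);
end
```

For a class whose saveobj discards data without a matching loadobj, this check returns false, which is exactly the case that causes trouble under parfor.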
In some types of class design, save() throws away state information that can't be recovered. Actually, that type of design is forced by the MATLAB ClassificationXXX classes: if one wants to wrap them, there is no alternative to throwing away unrecoverable state information from the class object.
Well, no: the alternative is not to call compact() in your saveobj method and to accept the extra disk space this consumes. Then you will have no conflict with parallel pools or other MATLAB toolboxes.
I can appreciate, however, that storing multiple copies of the same data on disk is unappealing. But there are still options that avoid it without breaking the one-to-one correspondence of save/load. One way is to store objects that share common data together in an array. Then your saveobj/loadobj pair can do something like this:
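classdef myclass
    properties
        data   % large data shared by all elements of the object array
    end
    methods
        function s = saveobj(objArray)
            % Store the shared data once, then strip it from every element.
            s.shareddata = objArray(1).data;
            [objArray.data] = deal([]);
            s.objArray = objArray;
        end
    end
    methods (Static)
        function objArray = loadobj(s)
            % Put the shared data back into every element of the array.
            objArray = s.objArray;
            [objArray.data] = deal(s.shareddata);
        end
    end
end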
Unfortunately, it doesn't look like ClassificationXXX classes let you do this, but that might be what ClassificationEnsemble classes are intended for.
More Answers (1)
Edric Ellis
23 Jul 2018
The workers executing the body of any parfor loop are separate MATLAB processes, so the only reliable way to send variables that exist on the client to those workers is to do something equivalent to calling save on the client and then load on the workers. (The same procedure is used, but no files are created on disk.)
I must admit, it's not clear to me what the benefit is of writing a saveobj that cannot be reversed by a loadobj. (Also, have you considered using Transient properties in your class?)
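A minimal sketch of the Transient alternative, with hypothetical names (`CachedModel`, `N`, `Cache`, `getCache`). A Transient property is not saved, so it comes back with its default value after `load` or a transfer to a worker; here the scratch data is rebuilt lazily on first use, and no custom saveobj/loadobj pair is needed.

```matlab
% CachedModel.m: hypothetical handle class using a Transient cache.
classdef CachedModel < handle
    properties
        N = 3                  % persistent parameter, saved normally
    end
    properties (Transient, Access = private)
        Cache = []             % never saved; [] again after load or transfer
    end
    methods
        function obj = CachedModel(n)
            if nargin > 0
                obj.N = n;
            end
        end
        function c = getCache(obj)
            % Rebuild the scratch data on first use (placeholder computation).
            if isempty(obj.Cache)
                obj.Cache = magic(obj.N);
            end
            c = obj.Cache;
        end
    end
end
```

Each worker then pays the rebuild cost once, the first time it calls `getCache`, instead of receiving a serialized copy of the cache.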