Error using gop -> Error detected on worker N -> Error during serialization
I'm receiving an error when running distributed code on a cluster:
Error using gop (line 75)
Error detected on worker 5.
Error during serialization
Error in gplus (line 24)
y = gop(@plus, x, labTarget);
Error in hamiltonian>(spmd body) (line 654)
H=gplus(H,1); if build_aniso, Q=gplus(Q,1); end
Error in hamiltonian>(spmd) (line 611)
spmd
Error in hamiltonian (line 611)
spmd
Error in relaxation (line 111)
[L0,Q]=hamiltonian(assume(spin_system,'labframe'));
Error in decoherence_naphthalene (line 53)
R=relaxation(spin_system);
The code within the spmd block is:
spmd
    % Localize the problem at the nodes
    partition=codistributor1d.defaultPartition(nterms);
    codistrib=codistributor1d(1,partition,[nterms 1]);
    local_terms=getLocalPart(codistributed((1:nterms)',codistrib));
    % Preallocate the local Hamiltonian
    H=mprealloc(spin_system,1);
    if build_aniso
        Q=cell(5,5);
        for m=1:5
            for k=1:5
                Q{k,m}=mprealloc(spin_system,1);
            end
        end
    end
    % Build the local Hamiltonian
    for n=local_terms'
        % Compute operator from the specification
        if descr.S(n)==0
            oper=operator(spin_system,descr.opL(n),{descr.L(n)},operator_type);
        else
            oper=operator(spin_system,[descr.opL(n),descr.opS(n)],{descr.L(n),descr.S(n)},operator_type);
        end
        % Add to relevant local arrays
        H=H+descr.H(n)*oper;
        if build_aniso
            for m=1:5
                for k=1:5
                    if abs(descr.T(n,k)*descr.phi(n,m))>spin_system.tols.inter_cutoff
                        Q{k,m}=Q{k,m}+descr.T(n,k)*descr.phi(n,m)*oper;
                    end
                end
            end
        end
    end
    % Collect the result
    H=gplus(H,1); if build_aniso, Q=gplus(Q,1); end
end
Matlab seems to have no problems starting and connecting to the pool of workers:
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
Starting...
Starting parallel pool (parpool) using the 'local' profile ... connected to 12 workers.
[decoherence_naphthalene > create ] Spinach root directory determined to be /home/******/kec30/spinach_1.4.2114
Some background: I'm running this through an SGE queuing system, although I get the same problem when running interactively. The error always names "worker N", where N is less than the number of cores in the parpool.
I'm running these simulations on Matlab R2014a (8.3.0.532) 64-bit (glnxa64) using code provided in Spinach 1.4.2114. It's a 10-spin system (4^n matrix elements), and the code appears to start on my laptop but quickly maxes out my RAM (8 GB, about 4 GB available for Matlab). On the cluster, I've tried reducing the memory requirement dramatically by going from the full system (10-spin coherences) to a greatly reduced one (3-spin coherences, with a distance cutoff), which is probably too few, but I still encounter the same problem. Since this seems to be more of a Matlab error than a Spinach error, I thought I'd ask here for advice.
Edric Ellis
8 Aug 2014
What is the type of 'H'? Can it be saved to a MAT file successfully (this is required as the communication system uses the same mechanism when transferring data)? Also, I note that Q is a cell array - I would not expect gplus(Q,1) to succeed as that's trying to add together each worker's version of 'Q' (but I think that would give you a different error message).
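The two checks suggested in this comment could be sketched as below. This is only a rough diagnostic under stated assumptions, not a confirmed fix: `collect_on_lab1` and `can_save` are hypothetical helper names (neither is part of Spinach or of MATLAB), and the sketch assumes that H and every cell of Q are ordinary numeric or sparse arrays of the same size on all workers. It prints the class and size of H on each worker, tests whether H can be written to a MAT file, and then sums Q one cell at a time rather than passing the whole cell array to gplus.

```matlab
function [H, Q] = collect_on_lab1(H, Q)
% COLLECT_ON_LAB1  Hypothetical helper (not part of Spinach or of MATLAB).
% Intended to be called inside the spmd block in place of the gplus line.
% H is the local Hamiltonian; Q is a cell array of local arrays, or {}.

    % Report what this worker is about to transfer
    info = whos('H');
    fprintf('Worker %d: H is %s, %.1f MB\n', ...
            labindex, info.class, info.bytes/2^20);

    % Check that H can be written to a MAT file on this worker
    if ~can_save(H)
        error('collect_on_lab1:notSaveable', ...
              'H on worker %d could not be saved to a MAT file.', labindex);
    end

    % Sum H across workers; the result is returned on worker 1
    H = gplus(H, 1);

    % Sum Q one cell at a time instead of passing the whole cell array
    for idx = 1:numel(Q)
        Q{idx} = gplus(Q{idx}, 1);
    end
end

function ok = can_save(x) %#ok<INUSD>
% Try to write x to a temporary MAT file in the default format.
    tmp_file = [tempname '.mat'];
    ok = false;
    try
        save(tmp_file, 'x');
        % save may only warn and skip the variable, so inspect the file
        contents = whos('-file', tmp_file);
        ok = any(strcmp({contents.name}, 'x'));
    catch err
        fprintf('save failed: %s\n', err.message);
    end
    if exist(tmp_file, 'file')
        delete(tmp_file);
    end
end
```

Inside the spmd block, the final line would then become something like `if build_aniso, [H,Q]=collect_on_lab1(H,Q); else, H=collect_on_lab1(H,{}); end`. The save call sits in a separate function because calling save directly in an spmd body is a transparency violation.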