Why does innerjoin not work in parfor?
While debugging a parfor loop, I found that the call to innerjoin (lines 10-12 below) causes the problem. The code runs fine with a plain for-loop but fails with parfor. Why does this happen? I use innerjoin to randomly sample 'id' (one of the variables in my data) and merge the sample back with the original dataset (dta2 here). Any ideas or solutions? Please let me know if anything needs clarifying.
parpool(4)
N_boot = 5;
coeff_out2 = zeros(N_boot,N_coef);
parfor i = 1:N_boot
    dta2 = dta;
    decisions2 = unique(dta2.decision_id);
    Ndecisions2 = size(decisions2,1);
    sampled_id01 = randsample(decisions2,Ndecisions2,true);
    sampled_id2 = dataset2table(mat2dataset(sampled_id01));
    sampled_id2.Properties.VariableNames{1} = 'decision_id';
    resample_dta = innerjoin(sampled_id2,dta2,'Keys','decision_id');
    resample_dta = table2array(resample_dta);
    result1 = mean(resample_dta(:,1:4));
    coeff_out2(i,:) = result1;
end
Answers (2)
Edric Ellis
8 May 2018
(Cross-posted from an identical question on Stack Overflow.)
Unfortunately, innerjoin uses the inputname function, and that is what causes the "transparency violation" error. There is a simple workaround: wrap the call to innerjoin in an anonymous function, like so:
% Define the wrapper outside the loop so the parfor body never calls innerjoin directly.
innerjoinFcn = @(varargin) innerjoin(varargin{:});
parfor ...
    ...
    resample_dta = innerjoinFcn(sampled_id2,dta2,'Keys','decision_id');
end
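For reference, here is a minimal self-contained sketch of the wrapper workaround. The data are made up to stand in for the asker's dta (the columns d1 to d3 and their value ranges are assumptions for illustration), and randi replaces randsample so that the sketch does not need Statistics and Machine Learning Toolbox. The table is built with explicit 'VariableNames', as in the other answer in this thread. Treat it as a sketch under those assumptions, not as the asker's actual pipeline.

```matlab
% Hypothetical stand-in for the asker's data: 50 rows, ids between 1 and 9.
decision_id = randi([1 9], 50, 1);
d1 = randi([-10 10], 50, 1);
d2 = randi([-2 2], 50, 1);
d3 = randi([0 255], 50, 1);
dta = table(decision_id, d1, d2, d3);

N_boot = 5;
N_coef = width(dta);                 % decision_id, d1, d2, d3
coeff_out2 = zeros(N_boot, N_coef);

% Wrapper defined outside the loop, so the parfor body does not
% call innerjoin directly.
innerjoinFcn = @(varargin) innerjoin(varargin{:});

parfor i = 1:N_boot
    ids = unique(dta.decision_id);
    % Draw as many ids as there are distinct ids, with replacement.
    decision_id = ids(randi(numel(ids), numel(ids), 1));
    sampled = table(decision_id, 'VariableNames', {'decision_id'});
    resampled = innerjoinFcn(sampled, dta, 'Keys', 'decision_id');
    % Column means of the resampled rows; dimension 1 is given
    % explicitly so a single-row result is handled the same way.
    coeff_out2(i, :) = mean(table2array(resampled), 1);
end
```

Each row of coeff_out2 then holds the column means for one bootstrap draw.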
Comments: 0
Walter Roberson
5 May 2018
I can get further:
decision_id = randi([1 9], 50, 1);
d1 = randi([-10 10], 50, 1);
d2 = randi([-2 2], 50, 1);
d3 = randi([0 255], 50, 1);
dta = table(decision_id, d1, d2, d3);
N_coef = 4;
cp = gcp('nocreate');
if isempty(cp); parpool(4); end
N_boot = 5;
coeff_out2 = zeros(N_boot,N_coef);
parfor i = 1:N_boot
    dta2 = dta;
    decisions2 = unique(dta2.decision_id);
    Ndecisions2 = size(decisions2,1);
    decision_id = randsample(decisions2,Ndecisions2,true);
    sampled_id2 = table(decision_id, 'VariableNames', {'decision_id'});
    resample_dta = innerjoin(sampled_id2,dta2,'Keys','decision_id');
    resample_dta = table2array(resample_dta);
    result1 = mean(resample_dta(:,1:4));
    coeff_out2(i,:) = result1;
end
This version now fails at the innerjoin call rather than at the earlier table conversion.
The conversion to a table ran into problems when no variable names were supplied at construction time. That could hypothetically be explained if the variable names themselves are not guaranteed to be the same on the workers, because by default a table takes the name of the variable being converted as its column name.
We could hypothesize that something similar is happening with innerjoin.
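The default naming behaviour mentioned above can be seen outside parfor with a small sketch. The variable x and the name 'decision_id' are only illustrative; the sketch shows how column names are chosen and does not by itself prove what goes wrong on the workers.

```matlab
x = (1:3)';

% Column is named after the workspace variable passed in.
t1 = table(x);

% Input is an expression, so there is no variable name to reuse
% and the default name Var1 is assigned.
t2 = table((1:3)');

% Explicit name: nothing has to be inferred from the caller.
t3 = table(x, 'VariableNames', {'decision_id'});

names1 = t1.Properties.VariableNames;   % contains 'x'
names2 = t2.Properties.VariableNames;   % contains 'Var1'
names3 = t3.Properties.VariableNames;   % contains 'decision_id'
```

Supplying 'VariableNames' explicitly, as in the loop above, removes the dependence on the caller's variable name.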
I am not sure how to fix it yet, as I am still trying to work out the intent of the code, especially what should happen when multiple table entries share the same key.
Or is it safe to assume that the decision_id values are unique? If so, the call to unique would seem to be redundant.
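To make the duplicate-key question concrete, here is a small sketch of what innerjoin does when the key repeats on both sides. The data are invented for illustration. Every matching row on the left is paired with every matching row on the right, so an id that is sampled twice and appears twice in the data contributes four rows.

```matlab
% Hypothetical data: id 1 appears twice, id 2 once.
decision_id = [1; 1; 2];
d1 = [10; 20; 30];
dta = table(decision_id, d1);

% A draw in which id 1 was sampled twice and id 2 not at all.
sampled = table([1; 1], 'VariableNames', {'decision_id'});

resampled = innerjoin(sampled, dta, 'Keys', 'decision_id');

% 2 sampled rows x 2 matching data rows = 4 rows, all with id 1.
nRows = height(resampled);
```

Whether that repetition is the intended resampling scheme depends on what the rows sharing a decision_id represent, which is the open question here.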
Comments: 3
Walter Roberson
5 May 2018
Right, but to do this efficiently I need to know whether decision_id is unique in dta and, if it is not, what sampling with it is supposed to mean.