Parfor loops indexing into table rows

Andrew McCauley on 20 Jul 2022
Commented: Bruno Luong on 20 Jul 2022
Typically the most time-consuming part of my data analysis can be boiled down to "do thing to row of table for all rows of table", so it seemed pretty ideal for parfor looping (and is) but I'm wondering if there is a better way than the workaround I've been using.
Indexing seems problematic - my usual approach to table indexing is table.columnname(row) but this leads to an error: "Error: Unable to classify the variable 'tableParfor' in the body of the parfor-loop. For more information, see Parallel for Loops in MATLAB, "Solve Variable Classification Issues in parfor-Loops"."
The same thing happens if I try table{row, columnname}, and as far as I can tell from the docs on tables I'm kinda out of options for normal indexing at this point.
I assumed my usual approach failed because this page says that indexing in the form of a.b(c) fails:
Variable A on the left is not sliced; variable A on the right is sliced:
A.q(i,12) (not sliced)    A(i,12).q (sliced)
But the right side indexing is not valid for tables. I'm not really sure why table{row, column} doesn't work. But I did find a workaround (making a temporary one-row table and always indexing into that) that does work but seems suboptimal. Still cuts down on time for a lot of my scripts but I still think
If anyone can shed some light or improve this code, I've made a simplified version of what my actual scripts generally do with parfor loops.
tableParfor = table('Size', [100 4], 'VariableTypes', {'double', 'double', 'double', 'double'}, 'VariableNames', {'first', 'second', 'third', 'final'});
for rows = 1:100
    for columns = 1:3
        tableParfor.(columns)(rows) = rand(1);
    end
end
a = 1.5;
b = 2.6;
c = 6.4;
% random broadcast variables
parfor cT = 1:height(tableParfor)
    % tableParfor.final(cT) = a*tableParfor.first(cT) + b*tableParfor.second(cT) + c*tableParfor.third(cT);
    % my usual syntax, this doesn't work with parfor
    % tableParfor{cT, 'final'} = a*tableParfor{cT, 'first'} + b*tableParfor{cT, 'second'} + c*tableParfor{cT, 'third'};
    % alternative syntax, this doesn't work with parfor
    % tableParfor(cT).final = a*tableParfor(cT).first + b*tableParfor(cT).second + c*tableParfor(cT).third;
    % my attempt to get something like what the docs recommend, but is invalid syntax for tables
    rowTable = tableParfor(cT, :);
    rowTable.final = a*rowTable.first + b*rowTable.second + c*rowTable.third;
    tableParfor(cT, :) = rowTable;
    % this workaround works, but adds two extra lines to the code and I think the extra creation of rowTable for each worker chews up memory
end

Accepted Answer

Edric Ellis on 20 Jul 2022
There's a few things conspiring against you here. Firstly, parfor analysis doesn't understand how to "slice" table data using variable names, but you can use variable indices, i.e. tableParfor{cT,4} = ... is allowed.
Secondly, you're trying to use tableParfor as a "sliced input/output", which further constrains what you're allowed to do - in particular the "fixed form of indexing" constraint stops you accessing different variables of your sliced row directly.
Your workaround (extract a slice, operate, put it back) would be my first choice, despite its awkwardness. The following is almost certainly a worse option since it duplicates and then broadcasts the input data table, but it does work:
inTable = tableParfor;
parfor cT = 1:height(tableParfor)
    tableParfor{cT, 4} = a*inTable{cT, 'first'} + b*inTable{cT, 'second'} + c*inTable{cT, 'third'};
end
Note that in that example, inTable gets broadcast, and so all indexing restrictions are removed, and I can use the variable-name indexing.
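If the extract/operate/put-back workaround is kept, one way to tidy it up is to move the per-row work into a function, so the loop body is a single sliced assignment. This is only a sketch: processRow is a hypothetical helper name, not something from the thread, and the table is filled with array2table purely to keep the example short. It assumes a release that allows local functions at the end of a script (R2016b or later).

```matlab
% Sketch only: processRow is a hypothetical helper introduced for illustration.
tableParfor = array2table(rand(100, 3), 'VariableNames', {'first', 'second', 'third'});
tableParfor.final = zeros(100, 1);
a = 1.5; b = 2.6; c = 6.4;   % broadcast constants

parfor cT = 1:height(tableParfor)
    % The table is indexed only as tableParfor(cT, :), on both sides,
    % which is the same fixed form the original workaround uses.
    tableParfor(cT, :) = processRow(tableParfor(cT, :), a, b, c);
end

function row = processRow(row, a, b, c)
    % Inside the function, row is an ordinary one-row table, so
    % variable-name indexing is not subject to the parfor restrictions.
    row.final = a*row.first + b*row.second + c*row.third;
end
```

A one-row temporary still exists on each iteration, so this does not change the memory behaviour of the workaround; it only keeps the name-based indexing out of the parfor body.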
  4 Comments
Andrew McCauley on 20 Jul 2022
Each row will at least have an entry in a column that holds cells of a vector of event timings, often around 1000x1. Often each row will have a vector of the rate of event timings over time, which can be as much as 200000x1 if I don't downsample it, but at least some thousands x 1 even if I do. 1673 bytes per row overhead is not really a concern.
Bruno Luong on 20 Jul 2022
OK I see now that looks big.


More Answers (1)

Bruno Luong on 20 Jul 2022
Edited: Bruno Luong on 20 Jul 2022
This works, but I'm not sure it is what you want.
IMO a table is not a well-suited data structure for calculation. A simple raw numerical array is.
EDIT: corrected typos
tableParfor = table('Size', [100 4], 'VariableTypes', {'double', 'double', 'double', 'double'}, 'VariableNames', {'first', 'second', 'third', 'final'});
for rows = 1:100
    for columns = 1:3
        tableParfor.(columns)(rows) = rand(1);
    end
end
a = 1.5;
b = 2.6;
c = 6.4;
% random broadcast variables
for cT = 1:height(tableParfor)
    rowTable = tableParfor{cT, :};
    rowTable(4) = a*rowTable(1) + b*rowTable(2) + c*rowTable(3);
    tableParfor(cT,:) = num2cell(rowTable);
    % this workaround works, but adds two extra lines to the code and I think the extra creation of rowTable for each worker chews up memory
end
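One possible reading of the "raw numerical array" suggestion, for the columns that are purely numeric, is to copy them out of the table before the loop and write the result back afterwards. This is a sketch under that assumption, not code from the thread: the temporary vectors first, second, third and final and the count n are names introduced here for illustration.

```matlab
% Sketch only: the temporary vectors and n are introduced for illustration.
tableParfor = array2table(rand(100, 3), 'VariableNames', {'first', 'second', 'third'});
a = 1.5; b = 2.6; c = 6.4;          % broadcast constants

% Pull the numeric columns out of the table before the loop.
n      = height(tableParfor);
first  = tableParfor.first;
second = tableParfor.second;
third  = tableParfor.third;
final  = zeros(n, 1);

parfor cT = 1:n
    % Plain vectors indexed by the loop variable are sliced variables,
    % so no table indexing happens inside the parfor body.
    final(cT) = a*first(cT) + b*second(cT) + c*third(cT);
end

tableParfor.final = final;          % write the result back in one step
```

The trade-off is one extra copy of each extracted column on the client. For tables that mix strings, doubles and cells, only the columns the loop actually needs would be extracted this way.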
  2 Comments
Andrew McCauley on 20 Jul 2022
Edited: Andrew McCauley on 20 Jul 2022
Thanks Bruno - I don't really see how that improves my existing workaround, and I think you've made a typo with "rowFinal(4) = a*rowTable(1) + b*rowTable(2) + c*rowTable(3);", your solution replaces the existing columns with zeros (which would be bad), and also I assume you mean that tables are not well-suited to calculation.
I agree, but my data requires each row to have strings, doubles and cells (among other things, for names of cells recorded from, some constant related to the recording, and raw data traces respectively), so raw numerical array is not possible - cell array works fine, but I don't think would be any better and is much more cumbersome for indexing.
Typically within a loop, I do curve fits, several functions etc etc all on the contents of one row of a table (which may be doubles or cells), and if it's time-consuming enough to justify the overhead, I'll turn that into a parfor loop. Basically, I'm hoping I can find a way to not have to create a temporary row and just slice into the row of the table I'm operating on for individual elements.
Bruno Luong on 20 Jul 2022
The table is a beast of an OOP class with all kinds of overloaded indexing. You are already lucky to be able to extract rows in parallel as sliced data.


Release

R2020a
