Efficiently populating an array without for loops

Rachel on 27 May 2012
Hi Everyone,
I have a list of data with 10,000,000 rows and 3 columns. The columns correspond to the shape, size, and color of an object, which is indexed with a number. There are 100 shapes, 100 sizes, and 50 colors.
I want to create a matrix (100x100x50) that essentially stores the count of each object type, kind of like a histogram for unique objects.
My code below is too slow to run because of the nested for-loops. Does anyone know of a way to complete the same operation using direct matrix operations? It seems these comparisons should be relatively fast, but they are extremely slow in MATLAB the way I am doing it.
ObjectTypes = zeros(100,100,50);
for Shape=1:100
    for Size=1:100
        for Color=1:50
            ObjectTypes(Shape,Size,Color) = size(MyData(MyData(:,1) == Shape & MyData(:,2) == Size & MyData(:,3) == Color),1);
        end
    end
end

Accepted Answer

Geoff on 27 May 2012
Hah... So an alternative in O(N) time, making a single pass over the rows...
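ObjectTypes = zeros(100,100,50);   % preallocate the count array
for n = 1:size(MyData,1)
    row = MyData(n, [1,2,3]);      % [shape, size, color] of object n
    % Increment the counter for this combination
    ObjectTypes(row(1),row(2),row(3)) = ObjectTypes(row(1),row(2),row(3)) + 1;
end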
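As a sanity check on the single-pass idea, here is a minimal sketch on a small made-up data set. The sizes in `dims`, the random data, and the variable name `counts` are assumptions for illustration only. It compares one cell of the result against a brute-force scan.

```matlab
% Hypothetical small problem: 4 shapes, 3 sizes, 2 colors, 1000 objects
rng(0);
dims = [4 3 2];
MyData = [randi(dims(1), 1000, 1), randi(dims(2), 1000, 1), randi(dims(3), 1000, 1)];

% Single pass: each row increments exactly one counter
counts = zeros(dims);
for n = 1:size(MyData, 1)
    s = MyData(n, 1); z = MyData(n, 2); c = MyData(n, 3);
    counts(s, z, c) = counts(s, z, c) + 1;
end

% Brute-force check for one cell: scan all rows for a single combination
expected = nnz(MyData(:,1) == 2 & MyData(:,2) == 3 & MyData(:,3) == 1);
assert(counts(2, 3, 1) == expected)
assert(sum(counts(:)) == size(MyData, 1))   % every row counted exactly once
```

The loop touches each row once, so the cost grows with the number of rows rather than with rows times the number of shape/size/color combinations.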
  1 Comment
Rachel on 27 May 2012
This worked very well and quickly ... thanks so much for your help!


More Answers (2)

Geoff on 27 May 2012
Yeah that's searching through your data an awful lot every time you do the == comparisons. The way I do this kind of thing when populating a matrix from database results is to have the data sorted by two variables, and then use diff and find to get the data ranges.
So start with this:
MyData = sortrows(MyData);
Grab out the begin and end index for each group of values in column one.
% Partition by shape
begin1 = [1; 1+find(diff(MyData(:,1)))];
end1 = [begin1(2:end)-1; size(MyData,1)];
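To see what these two lines produce, here is a small sketch on a made-up sorted column. The values in `col` and the names `beginIdx` and `endIdx` are hypothetical, chosen only for illustration.

```matlab
% Hypothetical sorted first column: three groups (values 1, 2, and 5)
col = [1; 1; 1; 2; 2; 5];

% diff is nonzero wherever the value changes, so find(diff(col)) gives
% the last row of every group except the final one
beginIdx = [1; 1 + find(diff(col))];           % first row of each group
endIdx   = [beginIdx(2:end) - 1; numel(col)];  % last row of each group

disp([beginIdx, endIdx])   % one row per group: [1 3], [4 5], [6 6]
```

Each row of `[beginIdx, endIdx]` is the inclusive row range of one group, which is what the loops iterate over.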
Now you can combine these into a loop variable, so each time through the loop will give you a 2x1 vector containing the start and end range. You do the same thing again with column 2. Finally I use accumarray to count up all the colours for a given size and shape:
% Assumes ObjectTypes = zeros(100,100,50) has already been created
% Process the Shape partitions
for r1 = [begin1, end1]'
    Shape = MyData(r1(1), 1);  % Single Shape
    % Partition by Size
    idx1 = r1(1):r1(2);
    col2 = MyData(idx1, 2);
    begin2 = [1; 1+find(diff(col2))];
    end2 = [begin2(2:end)-1; numel(col2)];
    % Process the Size partitions
    for r2 = [begin2, end2]'
        Size = col2(r2(1));  % Single Size
        % r2 is relative to idx1, so shift by r1(1)-1 to get rows of MyData
        idx2 = r1(1)-1+r2(1) : r1(1)-1+r2(2);
        % Count up all the Color occurrences for Shape and Size
        Color = MyData(idx2, 3);
        colorCount = accumarray(Color, ones(numel(Color),1));
        ObjectTypes(Shape, Size, 1:max(Color)) = colorCount;
    end
end
I would hope this is faster than your current loop, although there are probably clever ways to use accumarray without all the looping guff I've done. Apologies if there are errors in this code. I just hacked it straight into my web browser =)
  1 Comment
Geoff on 27 May 2012
Made a couple of edits in the inner loop to fix a couple of obvious mistakes.



Walter Roberson on 27 May 2012
Are the numbers for the shape, size, color consecutive integers each starting from 1? If they are then the code can be reduced to
ObjectTypes = accumarray(MyData, 1);
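One thing to watch: without a size argument, accumarray sizes its output from the largest subscript in each column, so the result can be smaller than 100x100x50 if the highest-numbered shape, size, or color never occurs. Passing the size as a third argument avoids that. A minimal sketch on made-up data (the small `MyData` and `dims` here are assumptions for illustration):

```matlab
% Hypothetical small data set: columns are shape, size, color indices
MyData = [1 1 1;
          1 1 1;
          2 3 1;
          2 3 2];
dims = [3 4 2];   % assumed number of shapes, sizes, colors

% Each row of MyData is used as a subscript; the value 1 is accumulated there
ObjectTypes = accumarray(MyData, 1, dims);

disp(ObjectTypes(1,1,1))   % number of objects with shape 1, size 1, color 1
disp(size(ObjectTypes))    % matches dims even though shape 3 and size 4 never occur
```

For the data in the question, the equivalent call would presumably be `accumarray(MyData, 1, [100 100 50])`.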
If not, then you can create the consecutive integers by using the three-output version of unique().
[ushape, junk, shapeidx] = unique(MyData(:,1));  % column 1: shape
[usize, junk, sizidx] = unique(MyData(:,2));     % column 2: size
[ucol, junk, colidx] = unique(MyData(:,3));      % column 3: color
ObjectTypes = accumarray( [shapeidx(:), sizidx(:), colidx(:)], 1);
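The first output of unique() keeps the original labels, so each count can be mapped back to the values it belongs to. A small sketch with made-up, non-consecutive labels (the data and the names `u1`, `u2`, `u3`, `idx1`, `idx2`, `idx3`, and `counts` are hypothetical):

```matlab
% Hypothetical data with non-consecutive labels in every column
MyData = [10 7 300;
          10 7 300;
          40 7 100;
          40 9 100];

% The third output of unique maps each value to its position in the
% sorted list of unique values (the first output)
[u1, ~, idx1] = unique(MyData(:,1));   % u1 = [10; 40]
[u2, ~, idx2] = unique(MyData(:,2));   % u2 = [7; 9]
[u3, ~, idx3] = unique(MyData(:,3));   % u3 = [100; 300]

counts = accumarray([idx1(:), idx2(:), idx3(:)], 1);

% counts(i,j,k) is the number of rows equal to [u1(i), u2(j), u3(k)]
disp(counts(1,1,2))   % rows equal to [10 7 300]
```

The resulting array has one slot per label that actually occurs, so its size depends on the data rather than on the nominal number of shapes, sizes, and colors.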
