How can I speed up this indexing code?

bhousden, 28 Oct 2021
Commented: bhousden on 29 Oct 2021
I have a cell array in which each cell contains a string, e.g.:
a={'AAAAAA';'BBBBBB';'CCCCCC';'AAAAAA';'DDDDDD'};
Each cell in a is associated with a row of a numeric array that has 10 columns, e.g.:
b=[0,0,0,1,1,0,0,0,0,0;
0,1,0,0,0,0,0,0,0,0;
1,0,1,0,1,0,2,0,0,0;
3,0,0,0,0,0,0,0,0,1;
0,0,0,0,0,0,2,0,0,1];
Some strings in a are repeated, such as 'AAAAAA' above. I need to find all repeated cells in a and sum the associated rows of b into a single row. This should produce two new arrays (unia and bnew) with equal numbers of rows, where every string in unia is unique.
Easy enough to do with a loop such as:
unia=unique(a);
bnew=zeros(numel(unia),10);
for n=1:numel(unia)
    pos=find(strcmp(a,unia{n}));
    bnew(n,:)=sum(b(pos,:),1);
end
This works fine for small arrays but I have a case where a has 6 million cells and unia has 300,000 cells so I need something much faster. Any ideas?
Thanks!

Accepted Answer

Ive J, 28 Oct 2021
Avoid comparing strings within the loop and instead take advantage of the index vector from unique:
% Use a string array instead of a cell array of character vectors; strings are much more efficient to work with.
a = ["A", "B2", "A", "C", "AA", "B2", "B2"];
b = randi([0 2], numel(a), 3)
b = 7×3
     1     0     1
     0     1     2
     0     1     2
     0     2     1
     2     0     2
     2     2     0
     0     2     1
[anew, ~, idx] = unique(a);
bnew = arrayfun(@(x) sum(b(x == idx, :), 1), 1:numel(anew), 'uni', false);
bnew = vertcat(bnew{:})
bnew = 4×3
     1     1     3
     2     0     2
     2     5     3
     0     2     1
anew
anew = 1×4 string array
"A" "AA" "B2" "C"
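
The arrayfun call above still evaluates x == idx once per unique value, so its cost grows with the product of the number of rows and the number of groups. For the sizes in the question (6 million rows, 300,000 groups), it may be worth trying a form that makes a single pass over idx. The following is a minimal sketch, not part of the original answer: it assumes b fits in memory, and the names S and nRows are introduced here only for illustration. It builds a sparse indicator matrix from idx and multiplies it by b, then cross-checks the result with accumarray applied column by column.

```matlab
% Example data from the answer above (b is fixed here so the result is reproducible).
a = ["A", "B2", "A", "C", "AA", "B2", "B2"];
b = [1 0 1; 0 1 2; 0 1 2; 0 2 1; 2 0 2; 2 2 0; 0 2 1];

% idx(k) is the position in anew of the string a(k).
[anew, ~, idx] = unique(a);

% Indicator matrix S (groups x rows) with S(idx(k), k) = 1,
% so S*b adds up the rows of b that belong to each group.
nRows = numel(a);
S = sparse(idx(:), (1:nRows).', 1, numel(anew), nRows);
bnew = full(S * b);

% Cross-check: accumarray sums one column of b at a time using the same idx.
bcheck = zeros(numel(anew), size(b, 2));
for c = 1:size(b, 2)
    bcheck(:, c) = accumarray(idx(:), b(:, c), [numel(anew) 1]);
end
disp(isequal(bnew, bcheck))
```

Both variants use only the index vector returned by unique, so no string comparison happens after the initial call. Which one is faster for a given data set is something to measure (for example with timeit) rather than assume.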
Also, consider using tall arrays when the data are too large to fit in memory.
1 Comment
bhousden, 29 Oct 2021
Perfect! Thanks for your help.

Release: R2017a
