Finding cell array row indices based on numeric column values

Piddy on 9 Jan 2018
Commented: Piddy on 10 Jan 2018
I have a large cell array, keystrokes, of size approximately 20000x4. Columns 1 and 3 each contain a char array, while columns 2 and 4 each contain a double. For example:
>> keystrokes(378:380,:)
ans =
3×4 cell array
{'l' } {[ 180]} {'e' } {[ 69]}
{'e' } {[300664]} {'|space|'} {[ 125]}
{'|space|'} {[ 62]} {'n' } {[2500]}
I want to find the row indices in keystrokes of occurrences of every unique combination of columns 1 and 3, where the value in column 2 is less than 100000 and the value in column 4 is less than 2000. My current code, shown below, gives me the error "Undefined operator '<' for input arguments of type 'cell'."
% Temporarily convert keystrokes to a table, because unique() apparently does not support combinations of cell array columns.
uniqueDigraphsTable = unique(cell2table(keystrokes(:,[1 3])), 'rows');
uniqueDigraphs = table2cell(uniqueDigraphsTable);
for ii = 1:length(uniqueDigraphs)
% Find rows containing the current unique digraph
occurrenceIndices = find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & strcmp(keystrokes(:,3), ...
uniqueDigraphs{ii,2}) & keystrokes(:,2)<100000 & keystrokes(:,4)<2000);
...
end
Using keystrokes{:,4}<2000 gives me this error: "Error using <. Too many input arguments." Is there a simple (and perhaps prettier) way to find the indices?
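Both error messages come from how the cell is indexed. The sketch below assumes keystrokes is a plain (non-nested) cell holding the three sample rows shown above; the variable names other than keystrokes are hypothetical. Indexing with parentheses returns another cell, and < is not defined for cells. Indexing with braces returns a comma-separated list, so keystrokes{:,4} < 2000 hands several separate arguments to <. Collecting the list with square brackets, or calling cell2mat on the parenthesized column, gives an ordinary numeric array.

```matlab
% Hypothetical sample data: the three rows above as a plain (non-nested) cell.
keystrokes = {'l',       180,    'e',       69;
              'e',       300664, '|space|', 125;
              '|space|', 62,     'n',       2500};

% Parentheses keep the container: keystrokes(:, 4) is a 3x1 cell.
cellCompareFails = false;
try
    tf = keystrokes(:, 4) < 2000;
catch err
    cellCompareFails = true;   % '<' is not defined for cell operands
    disp(err.message)
end

% Braces give a comma-separated list; [] or cell2mat turn it into numbers.
rowVals = [keystrokes{:, 4}];          % 1x3 double (row vector)
colVals = cell2mat(keystrokes(:, 4));  % 3x1 double (column vector)
isShort = colVals < 2000;              % ordinary logical comparison
```

Note that the two conversions differ in shape: the bracket form is a row, the cell2mat form is a column.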
1 Comment
Jan on 9 Jan 2018
Please post input data in a form that can be copied and pasted. Is keystrokes a nested cell:
keystrokes = { ...
{'l' } {[ 180]} {'e' } {[ 69]}; ...
{'e' } {[300664]} {'|space|'} {[ 125]}; ...
{'|space|'} {[ 62]} {'n' } {[2500]}}
or a plain cell:
keystrokes = { ...
'l', 180, 'e', 69; ...
'e', 300664, '|space|', 125; ...
'|space|', 62, 'n', 2500}
Which one is it? Even writing this question required a lot of typing.


Answers (2)

Guillaume on 9 Jan 2018
find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ... split over several lines for readability
strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
[keystrokes{:,2}] < 100000 & ...
[keystrokes{:,4}] < 2000)
or
find(strcmp(keystrokes(:,1), uniqueDigraphs{ii,1}) & ... split over several lines for readability
strcmp(keystrokes(:,3), uniqueDigraphs{ii,2}) & ...
cell2mat(keystrokes(:,2)) < 100000 & ...
cell2mat(keystrokes(:,4)) < 2000)
In essence you have to transform your cell columns into numeric matrices.
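A sketch of the cell2mat variant placed in the loop from the question, with the numeric conversion and the threshold test hoisted out of the loop, since they do not depend on ii. The four-row sample data and the names valueOK and occurrences are hypothetical. size(uniqueDigraphs, 1) is used instead of length so the loop always counts rows (length would return 2 for a single unique pair stored as a 1x2 cell).

```matlab
% Hypothetical sample data in the plain-cell layout.
keystrokes = {'l',       180,    'e',       69;
              'e',       300664, '|space|', 125;
              '|space|', 62,     'n',       2500;
              'l',       95,     'e',       40};

% Unique (column 1, column 3) pairs, via a table as in the question.
uniqueDigraphs = table2cell(unique(cell2table(keystrokes(:, [1 3]))));

% Convert the numeric columns once, outside the loop; both are N-by-1.
col2 = cell2mat(keystrokes(:, 2));
col4 = cell2mat(keystrokes(:, 4));
valueOK = col2 < 100000 & col4 < 2000;

occurrences = cell(size(uniqueDigraphs, 1), 1);
for ii = 1:size(uniqueDigraphs, 1)
    % All three operands are N-by-1, so the result stays N-by-1.
    occurrences{ii} = find(strcmp(keystrokes(:, 1), uniqueDigraphs{ii, 1}) & ...
                           strcmp(keystrokes(:, 3), uniqueDigraphs{ii, 2}) & ...
                           valueOK);
end
```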
1 Comment
Piddy on 10 Jan 2018
Thanks a lot! Your cell2mat solution gives the results I'm looking for. The first solution seems to have some sort of looping problem, though: it produces a very large vector whose first elements are the correct indices, but these are followed by indices that exceed the length of the keystrokes array.
For example, when keystrokes is a 24894x4 cell, part of its output for a specific row of uniqueDigraphs looks like this:
K>> length(occurrenceIndices)
ans =
158473
K>> occurrenceIndices(1:15)
ans =
591
677
1090
2247
2578
2912
3227
25485
25571
25984
27141
27472
27806
28121
50379
The first 7 values are correct, but the rest are too large. Note, though, that 24894 + 591 = 25485, 24894 + 677 = 25571, and so on.
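The pattern in these numbers is consistent with implicit expansion (MATLAB R2016b and later). strcmp(keystrokes(:,1), ...) returns an N-by-1 column, but [keystrokes{:,2}] is a 1-by-N row, so combining them with & silently expands to an N-by-N logical matrix. find then returns linear indices into that matrix: column j repeats the text matches whenever row j passes the numeric test, so the same rows reappear shifted by multiples of N (25485 = 24894 + 591, 50379 = 2*24894 + 591). A minimal sketch with hypothetical four-row data shows the effect; transposing the bracketed row with .' (or using cell2mat, which already returns a column) keeps every operand N-by-1.

```matlab
% Hypothetical sample data (plain cell layout), N = 4 rows.
keystrokes = {'l', 180,    'e', 69;
              'e', 300664, 'n', 125;
              'l', 62,     'e', 2500;
              'l', 95,     'e', 40};
N = size(keystrokes, 1);

isPair  = strcmp(keystrokes(:, 1), 'l') & strcmp(keystrokes(:, 3), 'e'); % N-by-1
rowVals = [keystrokes{:, 4}];                                            % 1-by-N

% Column & row: implicit expansion builds an N-by-N matrix.
expanded = isPair & rowVals < 2000;
badIdx   = find(expanded);     % linear indices, some larger than N

% Transposing the row keeps everything N-by-1, as intended.
fixed   = isPair & rowVals.' < 2000;
goodIdx = find(fixed);
```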



Jan on 9 Jan 2018
Edited: Jan on 9 Jan 2018
A cell array is not convenient for these comparisons, and converting it to a table adds yet another level of indirection. It is easier to keep the strings and the numbers in separate arrays:
% Store strings in one cell string:
Strings = keystrokes(:, [1, 3]);
uStrings = unique(Strings, 'rows');
% Store numbers in a numerical array:
Values = cell2mat(keystrokes(:, [2, 4]));
% Move the check of the values out of the loop for performance:
match = (Values(:, 1) < 100000 & Values(:, 2) < 2000);
for ii = 1:length(uStrings)
occurrenceIndices = find(strcmp(Strings(:,1), uStrings{ii, 1}) & ...
strcmp(Strings(:,2), uStrings{ii, 2}) & ...
match);
...
end
This would be faster if you also use the 2nd and 3rd outputs of unique():
[uStrings, iString, iUniq] = unique(Strings, 'rows');
match = (Values(:, 1) < 100000 & Values(:, 2) < 2000);
for ii = 1:length(uStrings)
occurrenceIndices = find(iUniq == ii & match);
...
end
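The grouping idea above relies on unique(Strings, 'rows'), which MATLAB does not support for cell arrays. A sketch of the same idea that avoids 'rows' on a cell, under the assumption of a plain (non-nested) cell: get an integer code for each string column separately from the third output of unique, then call unique(..., 'rows') on the numeric N-by-2 code matrix. Instead of a loop, the indices of every group are collected in one accumarray call. The sample data and the names code1, code3, matchRows, and rowsPerGroup are hypothetical.

```matlab
% Hypothetical sample data in the plain-cell layout.
keystrokes = {'l', 180,    'e', 69;
              'e', 300664, 'n', 125;
              'l', 62,     'e', 2500;
              'l', 95,     'e', 40;
              'e', 70,     'n', 300};

Strings = keystrokes(:, [1, 3]);
Values  = cell2mat(keystrokes(:, [2, 4]));
match   = Values(:, 1) < 100000 & Values(:, 2) < 2000;

% Integer code per string column, then unique rows of the numeric codes.
[~, ~, code1] = unique(Strings(:, 1));
[~, ~, code3] = unique(Strings(:, 2));
[~, iFirst, iUniq] = unique([code1, code3], 'rows');
uStrings = Strings(iFirst, :);   % one representative pair per group

% Gather matching row indices per group; sort keeps each list ascending.
matchRows = find(match);
rowsPerGroup = accumarray(iUniq(matchRows), matchRows, ...
                          [size(uStrings, 1), 1], @(x) {sort(x)});
```

rowsPerGroup{k} then holds the row indices for the pair uStrings(k, :).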
2 Comments
Piddy on 10 Jan 2018
Thank you! There is still an issue, though. The following line produces this warning: "The 'rows' input is not supported for cell array inputs."
[uStrings, iString, iUniq] = unique(Strings, 'rows');
Does this tie into your comment asking whether or not keystrokes is a nested cell? I didn't produce the keystrokes variable myself, but I'm fairly sure that it is not nested. I checked using class():
class(keystrokes{1,1})
ans = 'char'
I also think that if it were nested, the example command I showed in my original question would have produced an output like this:
>> keystrokes(378:380,:)
ans =
3×4 cell array
{1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell}
{1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell}
{1×1 cell} {1×1 cell} {1×1 cell} {1×1 cell}
I could of course be mistaken.
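The class() check can be extended to whole columns. A sketch with hypothetical one-row data for the two layouts Jan described: iscellstr is true only when every selected element is a char array, which holds for the plain layout but not for the nested one. If the data had turned out to be nested, cellfun could strip one level of braces from every element.

```matlab
% Hypothetical one-row examples of the two layouts.
plainCell  = {'l', 180, 'e', 69};
nestedCell = {{'l'}, {180}, {'e'}, {69}};

% Plain layout: columns 1 and 3 hold char vectors directly.
isPlain  = iscellstr(plainCell(:, [1 3]));
isNested = ~iscellstr(nestedCell(:, [1 3]));

% Remove one level of nesting from every element.
unnested = cellfun(@(c) c{1}, nestedCell, 'UniformOutput', false);
```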
Guillaume on 10 Jan 2018
Annoyingly, unique (and ismember) do not support the 'rows' option with cell arrays, even when the cell array contains only char arrays. If you have MATLAB R2016b or later, you can convert the cell array of char arrays into a string array, which can be used with unique and the 'rows' option:
unique(string(keystrokes(:, [1 3])), 'rows')
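A sketch combining this string conversion with the loop that uses the 3rd output of unique, assuming MATLAB R2016b or later (string arrays). Converting once lets unique(..., 'rows') return both the unique pairs and a group index for every row, so strcmp is no longer needed inside the loop. The sample data and the name occurrences are hypothetical.

```matlab
% Hypothetical sample data; requires string arrays (R2016b or later).
keystrokes = {'l', 180,    'e', 69;
              'e', 300664, 'n', 125;
              'l', 62,     'e', 2500;
              'l', 95,     'e', 40};

S = string(keystrokes(:, [1 3]));           % N-by-2 string array
[uStrings, ~, iUniq] = unique(S, 'rows');   % unique pairs + group index per row
Values = cell2mat(keystrokes(:, [2 4]));
match  = Values(:, 1) < 100000 & Values(:, 2) < 2000;

occurrences = cell(size(uStrings, 1), 1);
for ii = 1:size(uStrings, 1)
    occurrences{ii} = find(iUniq == ii & match);   % rows of pair ii that pass
end
```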
