most frequent word in cell array

Mahmoud Zeydabadinezhad on 25 Oct 2017
Answered: Sarah Palfreyman on 30 Apr 2018
Hi, I have a cell array "P" of size 2000 by 20. Each cell value is either "Yes" or "No". How can I make a new cell array "vote" of size 2000 by 1 in which each cell contains the most frequent word of the corresponding row of P?

Accepted Answer

Walter Roberson on 25 Oct 2017
tf = ismember(lower(P), 'yes');   % true where the entry is 'yes', ignoring case
votes = sum(tf, 2);               % number of 'yes' entries in each row
  Comments: 4
dpb on 26 Oct 2017
Edited: dpb on 26 Oct 2017
I was just throwing the categorical variable into the mix at the end, on top of your solution, which gets the total by row using string matching. That is what "mixing the two" meant: your cellstr variable and the categorical variables I had suggested throughout (which did the sums via countcats), with the categorical then used to display the name of the winner in English...
I wasn't implying anything at all was wrong, just adding the final step, and that primarily to "show off" categorical to the OP as worth looking at.
Walter Roberson on 26 Oct 2017
Right, but I had overlooked that the question asked about the most common entry, which can be found by testing the 'yes' count against width/2, where width is the number of columns in P.
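A minimal sketch of that width/2 test, assuming every entry is 'Yes' or 'No' in some capitalization. The small P below is made-up example data, and a tie (possible whenever the number of columns is even, as with 20) is arbitrarily resolved to 'No' here:

```matlab
% Made-up example data standing in for the 2000-by-20 cell array P
P = {'Yes' 'No' 'yes' 'YES'; 'no' 'No' 'Yes' 'NO'; 'Yes' 'No' 'No' 'Yes'};

tf = ismember(lower(P), 'yes');        % true where the entry is 'yes'
votes = sum(tf, 2);                    % number of 'yes' entries per row
isYes = votes > size(P, 2)/2;          % strict majority of 'yes' in the row

vote = repmat({'No'}, size(P, 1), 1);  % start with 'No' everywhere (ties stay 'No')
vote(isYes) = {'Yes'};                 % overwrite the rows where 'yes' wins
```

The result is a column cell array with one entry per row of P, which is the shape the question asked for.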


More Answers (2)

dpb on 25 Oct 2017
Edited: dpb on 25 Oct 2017
Good place to use categorical variables instead of the cellstr...
Example:
>> yn={'yes' 'no' 'Yes';'no' 'No', 'NO'}; % minimal dataset including capitalization differences
>> ync=categorical(lower(yn)); % convert to categorical and normalize spelling
>> cnts=countcats(ync,2) % count responses on 2nd dimension; columns follow category order ('no', then 'yes')
cnts =
1 2
3 0
>> vote=cnts(:,2)>cnts(:,1); % see which is greater (Y>N --> True)
>> vote=categorical(vote,[true false],{'Yes','No'}) % convert to categorical to display
vote =
Yes
No
>> yn % original table to compare -- looks like right choice.
yn =
'yes' 'no' 'Yes'
'no' 'No' 'NO'
NB: The above doesn't have the extra logic to check for a tie. If a tie is possible, you will need to test for == as well and add a third category, TIE, as a possible output.
ADDENDUM
If a TIE is possible, look at computing the difference between the counts; the SIGN function will then generate the tri-state variable needed.
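A rough sketch of the SIGN idea, under the same assumption that only 'yes' and 'no' occur (in any capitalization). The data are made up, and the label 'TIE' and the explicit category list are illustrative choices, not part of the example above:

```matlab
% Made-up example data; the second row is a tie
yn = {'yes' 'no' 'Yes' 'yes'; 'no' 'Yes' 'NO' 'yes'; 'no' 'No' 'NO' 'yes'};

% Fix the category order explicitly so column 1 is 'no' and column 2 is 'yes'
ync = categorical(lower(yn), {'no', 'yes'});
cnts = countcats(ync, 2);             % per-row counts, columns: [no yes]

s = sign(cnts(:,2) - cnts(:,1));      % +1: yes wins, 0: tie, -1: no wins
vote = categorical(s, [1 0 -1], {'Yes', 'TIE', 'No'});
```

Passing the category list to categorical also keeps both count columns present when one of the answers never appears in the data.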

Sarah Palfreyman on 30 Apr 2018
Try tokenizing with Text Analytics Toolbox and you can easily get a histogram count.
