Find words common across multiple string cells

Question

Tejas on 26 Oct 2020


Direct link to this question

https://kr.mathworks.com/matlabcentral/answers/626773-find-words-common-across-multiple-string-cells

Commented: Tejas on 27 Oct 2020

I have a cell array where each cell holds a string array of a different length, and each string array is essentially a column of single words. Something like this:

words{1,1} = ["sphere";"geometry";"number";"algebra";"function"];
words{1,2} = ["geometry";"equation";"nonlinear";"partial";"function"];
words{1,3} = ["number";"derivative";"function";"topology";"equation";"theory"];
words{1,4} = ["equation";"integral";"geometry";"function";"singular"];

I want to find words which are repeated at least once in a specified number of cells. That is, if I say words common in at least 4 cells, then I should get back

common_words = "function";

If I want words common in at least 3 cells, I should get back

common_words = ["geometry";"function";"equation"];

I can use intersect in a loop (however inefficient that might be) if the words are required to be common in all the cells. However, how do I go about finding intersections of a specific number of cells? As per my understanding, that would require combinations, and it would increase computation time exponentially with increasing cells. Is there an efficient way to do this or would I have to take combinations?

Comments: 4

Stephen23 on 26 Oct 2020

Is the cell array or are the strings particularly large? Would there be any memory issues if they were concatenated or merged together?

Tejas on 26 Oct 2020

words.mat

There are 40 cells in the array, and the largest string vector is 3238x1. I can also reduce this by removing repeated words within a string vector, but I think the maximum length goes to about 3000. The mean string length across all cells is in fact around 2000, since initial cells have smaller string vectors. If it helps, I've attached the file containing these strings.


Answer 1

Stephen23 on 27 Oct 2020 (Accepted Answer)


Direct link to this answer

https://kr.mathworks.com/matlabcentral/answers/626773-find-words-common-across-multiple-string-cells#answer_525358

Edited: Stephen23 on 27 Oct 2020


My ancient version does not support strings, so I used cell arrays of character vectors, but I would expect this to work for string arrays as well. Approach: take the unique words within each cell, concatenate them, then count occurrences using a histogram function:

words{1,1} = {'sphere';'geometry';'number';'algebra';'function'};
words{1,2} = {'geometry';'equation';'nonlinear';'partial';'function'};
words{1,3} = {'number';'derivative';'function';'topology';'equation';'theory'};
words{1,4} = {'equation';'integral';'geometry';'function';'singular'};
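tmp = cellfun(@unique,words,'uni',0); % unique words within each cell
tmp = vertcat(tmp{:});                % concatenate into one column
[uni,~,idx] = unique(tmp);            % idx maps each word to its entry in uni
cnt = histc(idx,1:max(idx));          % number of cells containing each word
out = uni(cnt>=3)                     % words found in at least 3 cells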
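
For string arrays as in the question, the same idea might look like the following. This is a minimal sketch, not the answerer's exact code: it assumes a release with string support (R2016b or later) and that every cell holds a column vector, and it swaps histc for accumarray, which here sums a 1 for every occurrence of each index.

```matlab
% Toy data from the question, as string column vectors.
words = cell(1,4);
words{1,1} = ["sphere";"geometry";"number";"algebra";"function"];
words{1,2} = ["geometry";"equation";"nonlinear";"partial";"function"];
words{1,3} = ["number";"derivative";"function";"topology";"equation";"theory"];
words{1,4} = ["equation";"integral";"geometry";"function";"singular"];

% Deduplicate within each cell so a word counts at most once per cell.
tmp = cellfun(@unique, words, 'UniformOutput', false);
tmp = vertcat(tmp{:});            % one long string column

% idx(k) is the position of tmp(k) in the sorted unique list uni.
[uni, ~, idx] = unique(tmp);

% Count how many cells contain each unique word
% (accumarray is used here in place of histc).
cnt = accumarray(idx, 1);

out = uni(cnt >= 3);              % "equation"; "function"; "geometry"
```

Because each cell is deduplicated first, cnt is the number of cells containing a word, not the total number of occurrences, so no combinations of cells need to be enumerated.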

Or as a function:

>> fun = @(n) uni(cnt>=n);
>> fun(4)
ans = 
    'function'
>> fun(3)
ans = 
    'equation'
    'function'
    'geometry'
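
The output above is sorted alphabetically, whereas the question lists the words in order of first appearance. If that order matters, one option is the 'stable' flag of unique. The sketch below wraps the approach in a helper; the name commonWords is hypothetical, and using a local function inside a script assumes R2016b or later.

```matlab
% Toy data from the question.
words = {["sphere";"geometry";"number";"algebra";"function"], ...
         ["geometry";"equation";"nonlinear";"partial";"function"], ...
         ["number";"derivative";"function";"topology";"equation";"theory"], ...
         ["equation";"integral";"geometry";"function";"singular"]};

in3 = commonWords(words, 3);   % words found in at least 3 cells
in4 = commonWords(words, 4);   % words found in all 4 cells

function out = commonWords(words, n)
% commonWords (hypothetical helper): words present in at least n cells,
% listed in order of first appearance instead of alphabetically.
    % Deduplicate per cell, keeping the original order; w(:) forces a column.
    tmp = cellfun(@(w) unique(w(:), 'stable'), words, 'UniformOutput', false);
    tmp = vertcat(tmp{:});                   % single string column
    [uni, ~, idx] = unique(tmp, 'stable');   % keep first-seen order
    cnt = accumarray(idx, 1);                % number of cells per word
    out = uni(cnt >= n);
end
```

With the toy data, in3 is ["geometry";"function";"equation"] and in4 is "function", matching the results asked for in the question.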

Comments: 1

Tejas on 27 Oct 2020

This works for me! Thank you.
