Compare strings of different size/length

Question

Michele Rizzato on 4 Dec 2020

Direct link to this question:

https://kr.mathworks.com/matlabcentral/answers/678173-compare-strings-of-different-size-length

Commented: Stephen23 on 4 Dec 2020

I'm struggling to code a procedure that measures the similarity between two strings and returns the index of the best match within a cell array of more than 10,000 elements.

The i-th element of the first cell array is something like:

str1= 'Class music n. 12 160b'

which is the element I want to search for in the other cell array. The corresponding matching element of the second cell array is, e.g.:

str2= 'Classical musical n. 12 160beats'

and so on.

I want a procedure that determines whether this pair is the most similar one compared with all the others (others can be like

str3 = 'Techno music n. 7 120beats' 
str4 = 'Rock disco n. 12 140beats'
str5 = 'Punk metal n. 18 180 beats'

or even more different).

I want to find the index in the cell array where str2 is located, so that I can manipulate it.

I've tried several approaches, but none of them gave consistent results.

Would you be able to assist me in this?

Thank you

M

Comments: 6 (the 4 older comments are not shown)

Michele Rizzato on 4 Dec 2020

True: even though identifying all the abbreviations would be a long process, it would then be easier to calculate the edit distance.

But my problem is that edit distance relies on positional matching, so I'd need a procedure that can identify the matching words even when they appear in random order within the string.

Is that possible?

Michele Rizzato on 4 Dec 2020

I've also tried to use FPAT for a fuzzy approach, but strangely what I get with

fpat(str1,str2)

is the following:

 struct with fields:
      magic: 'FPAT'
        ver: '25-Oct-2004 20:49:37'
       time: '04-Dec-2020 15:16:21'
    runtime: 0.0053
        par: [1×1 struct]
       mode: 'ALL patterns'
       npat: 0

Answer 1

Stephen23 on 4 Dec 2020

Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/678173-compare-strings-of-different-size-length#answer_565168

in1 = 'Class music n. 12 160b'; % string you want to match
in2 = {'Classical musical n. 12 160beats','Techno music n. 7 120beats','Rock disco n. 12 140beats','Punk metal n. 18 180 beats'}; % candidates
rgx = {'([Cc])lass(\s+)','\d+b$'}; % abbreviation patterns
rpl = {   '$1lassical$2','$&eats'}; % their expansions
tm1 = regexprep(in1,rgx,rpl); % expand abbreviations in the query
tm2 = regexprep(in2,rgx,rpl); % expand abbreviations in the candidates
edd = editDistance(tm1,tm2)
edd = 1×4
     2    12    13    16
[~,idx] = min(edd); % index of the closest candidate
in2{idx}
ans = 'Classical musical n. 12 160beats'

Comments: 2

Michele Rizzato on 4 Dec 2020

Edited: Michele Rizzato on 4 Dec 2020

Thank you Stephen, this works great in this particular case, but it's not a solution I can apply to the general case. If next time I want to search for:

'Techno music n. 7 120beats'

i should write a different code

Stephen23 on 4 Dec 2020

"i should write a different code"

No, that is not the idea at all: there should be just one list of all abbreviations and their replacements (this assumes that you have this prior knowledge) which you can apply to all strings. What I showed is just a demonstration using your example data, but you will need to complete it with all abbreviations. You can then use the same code for any string that you want to match.
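As a minimal sketch of the "one list for all strings" idea (this is not Stephen23's code): the same `rgx`/`rpl` pair is built once and applied unchanged to every query. The second query and the variable names `candidates`, `queries`, `normCand` and `matches` are hypothetical, and the beat pattern is rewritten with a capture token. `editDistance` requires Text Analytics Toolbox.

```matlab
% One shared abbreviation list, reused for every query.
% Only the two abbreviations from the example are listed; a real list
% would need every abbreviation that occurs in the data.
rgx = {'([Cc])lass(\s+)', '(\d+b)$'};
rpl = {'$1lassical$2',    '$1eats'};   % $1 re-inserts the captured text

candidates = {'Classical musical n. 12 160beats', 'Techno music n. 7 120beats', ...
              'Rock disco n. 12 140beats', 'Punk metal n. 18 180 beats'};
% The second query is hypothetical, added to show that the code stays the same.
queries = {'Class music n. 12 160b', 'Techno music n. 7 120b'};

normCand = regexprep(candidates, rgx, rpl);   % expand abbreviations once
idx = zeros(1, numel(queries));
for k = 1:numel(queries)
    normQuery = regexprep(queries{k}, rgx, rpl);
    [~, idx(k)] = min(editDistance(normQuery, normCand));   % closest candidate
end
matches = candidates(idx);   % best match for each query
```

Searching for a different string then only means adding it to `queries`; new abbreviations go into `rgx` and `rpl`, not into the matching code.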

If the order of the words can be "random" as you wrote, then first replace the abbreviations, split the words, sort the words alphabetically (or alphanumerically), join the words, and finally measure the edit distance:

in1 = 'Class music n. 12 160b'; % string you want to match
in2 = {'Classical musical n. 12 160beats','Techno music n. 7 120beats','Rock disco n. 12 140beats','Punk metal n. 18 180 beats'};
rgx = {'([Cc])lass(\s+)', '\d+b$'};
rpl = {   '$1lassical$2','$&eats'};
fun = @(s)join(sort(split(s))); % or use NATSORT (must be downloaded)
tm1 = fun(regexprep(in1,rgx,rpl));
tm2 = cellfun(fun,regexprep(in2,rgx,rpl));
edd = editDistance(tm1,tm2)
edd = 1×4
     2    12    13    18
[~,idx] = min(edd);
in2{idx}
ans = 'Classical musical n. 12 160beats'
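One caveat, offered as a sketch under stated assumptions rather than as part of the answer: the two example patterns are tied to position (`\s+` after "Class", `$` after the beat count), so they would not fire if the abbreviated word moved to another place in the string. The variant below uses the word-boundary anchors `\<` and `\>` instead. The shuffled query and the names `normalize`, `candidates` and `best` are hypothetical, and `strsplit`/`strjoin` are swapped in for `split`/`join` so that the helper returns a plain character vector.

```matlab
% Word-boundary variants of the two example patterns (an adaptation,
% not the original list): \< and \> match the start and end of a word,
% so the abbreviation may appear anywhere in the string.
rgx = {'\<([Cc])lass\>', '\<(\d+)b\>'};
rpl = {'$1lassical',     '$1beats'};

% Expand abbreviations, split on whitespace, sort the words, rejoin.
normalize = @(s) strjoin(sort(strsplit(regexprep(s, rgx, rpl))), ' ');

query = 'music 160b n. 12 Class';   % hypothetical shuffled version of in1
candidates = {'Classical musical n. 12 160beats', 'Techno music n. 7 120beats', ...
              'Rock disco n. 12 140beats', 'Punk metal n. 18 180 beats'};

normCand = cellfun(normalize, candidates, 'UniformOutput', false);
[~, idx] = min(editDistance(normalize(query), normCand));   % needs Text Analytics Toolbox
best = candidates{idx};
```

Because both the query and the candidates are sorted word by word before the comparison, the edit distance no longer depends on the original word order.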

Answer 2

Sibi on 4 Dec 2020

Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/678173-compare-strings-of-different-size-length#answer_565248

Edited: Sibi on 4 Dec 2020

cpstr.mlx

try this,

Comments: 4 (the 2 older comments are not shown)

Sibi on 4 Dec 2020

R='T m n. 7 120b';

The code will also work for this one.

Michele Rizzato on 4 Dec 2020

It looks like it works great, but if I start from

R='Techno alpha music n. 7 120beats'

it says that the index exceeds the number of array elements (6).
