# Comparing two sentences of strings

Michael scheinfeild 19 Feb 2015
Answered: Greg Dionne 21 Oct 2016
I want to measure the similarity between two sentences of words. Example: 1) " the monkey wants to eat banana the dog want to run after cat" 2) " munkey to to eat banan the dogy wabt to rub cat"
The text is the result of some OCR, so I need to measure 1. whether the words are in the correct location and 2. the distance between words such as "monkey" and "munkey", or "dog" and "dogy".
Words can repeat, so I want to measure how well the relative locations agree. What is the best method? I was thinking of using ismember or regexp.


### Accepted Answer

Stephen Cobeldick 19 Feb 2015
Edited: Stephen Cobeldick 23 Feb 2015
This is a very complicated topic, because it depends on what you mean by "similar" words, how close they have to be to count as matches, and how you want to deal with conflicts or missing words.
The difference between any two words can be measured using a metric such as the Hamming distance, the Damerau-Levenshtein distance, or some other string metric.
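As a rough illustration of such a word-level metric, here is a minimal sketch of the plain Levenshtein distance (insertions, deletions and substitutions, each costing 1). The function name `levenshtein` is just a placeholder chosen for this example, and unlike the Damerau variant it does not count a transposition as a single edit.

```matlab
function d = levenshtein(a, b)
% LEVENSHTEIN  Edit distance between two character vectors (sketch).
% Counts insertions, deletions and substitutions, each with cost 1.
% Transpositions are NOT treated as a single edit here.
    m = numel(a);
    n = numel(b);
    D = zeros(m + 1, n + 1);
    D(:, 1) = (0:m)';   % turning a(1:i) into '' takes i deletions
    D(1, :) = 0:n;      % turning '' into b(1:j) takes j insertions
    for i = 1:m
        for j = 1:n
            cost = double(a(i) ~= b(j));
            D(i + 1, j + 1) = min([D(i, j + 1) + 1, ...   % deletion
                                   D(i + 1, j) + 1, ...   % insertion
                                   D(i, j) + cost]);      % substitution or match
        end
    end
    d = D(m + 1, n + 1);
end
```

With the words from the question, `levenshtein('monkey', 'munkey')` and `levenshtein('dog', 'dogy')` both return 1. The Hamming distance could not be used for the second pair, because it is only defined for strings of equal length.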
But matching them is much more difficult. Consider that your example sentences have different numbers of words and that some words repeat in the second sentence: how does the computer know to match both occurrences of 'to' (the second and third words of the second sentence) to one 'to' (the fourth word of the first sentence)? How different do the words have to be before they no longer count as a match? How do we classify them if there are multiple different possible matches? Which match decides the "correct location"?
Consider this example:
A = 'bob bib did'
B = 'bib bid dib'
How would you match these? Does 'bib' in B match 'bob' in the same position in A, or does it match 'bib', which has the same spelling? Does 'bid' match 'bib' or 'did', from each of which it differs by only one letter? Does one possible match affect another? How are you going to tell your computer this?
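To make the ambiguity concrete, the following sketch computes the Hamming distance between every word of A and every word of B. It assumes that all words have the same length (true here, since every word has three letters); the variable names are only illustrative.

```matlab
% Split the two example sentences into words.
A = strsplit('bob bib did');
B = strsplit('bib bid dib');

% Hamming distance is only defined for equal-length strings; every
% word here has three letters, so the element-wise comparison is valid.
D = zeros(numel(A), numel(B));
for i = 1:numel(A)
    for j = 1:numel(B)
        D(i, j) = sum(A{i} ~= B{j});   % number of differing letters
    end
end
disp(D)
```

Row i of D belongs to word i of A and column j to word j of B. The result is [1 2 2; 0 1 1; 2 1 1]: the only exact match (the 0) pairs 'bib' in A with 'bib' in B, which sit in different positions, and the column for 'bid' contains a 1 against both 'bib' and 'did'. The distances alone therefore do not pick out one matching.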


### More Answers (2)

Michael scheinfeild 19 Feb 2015
I was thinking of using the dynamic time warping (DTW) algorithm for the total distance: http://en.wikipedia.org/wiki/Dynamic_time_warping
But I think the Hamming distance is a good idea, so I will try it.
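For what it is worth, here is a minimal sketch of how the DTW idea could look at the word level. It makes several assumptions that are not from this thread: the local cost between two words is a plain character-level edit distance (the local helper `editDist`), the recursion is the textbook DTW one with no window or step constraints, and the name `wordDtw` is made up for this example.

```matlab
function [total, path] = wordDtw(s1, s2)
% WORDDTW  Dynamic time warping over the words of two sentences (sketch).
% total : accumulated cost of the best warping path
% path  : rows [i j], meaning word i of s1 is aligned with word j of s2
    w1 = strsplit(strtrim(s1));
    w2 = strsplit(strtrim(s2));
    m = numel(w1);
    n = numel(w2);

    % Local cost: edit distance between every pair of words.
    C = zeros(m, n);
    for i = 1:m
        for j = 1:n
            C(i, j) = editDist(w1{i}, w2{j});
        end
    end

    % Accumulated cost, padded with Inf so both paths start at (1,1).
    G = inf(m + 1, n + 1);
    G(1, 1) = 0;
    for i = 1:m
        for j = 1:n
            G(i + 1, j + 1) = C(i, j) + ...
                min([G(i, j), G(i, j + 1), G(i + 1, j)]);
        end
    end
    total = G(m + 1, n + 1);

    % Backtrack from the last word pair to the first.
    i = m;
    j = n;
    path = [i, j];
    while i > 1 || j > 1
        [~, k] = min([G(i, j), G(i, j + 1), G(i + 1, j)]);
        if k == 1
            i = i - 1;  j = j - 1;   % diagonal step
        elseif k == 2
            i = i - 1;               % advance in s1 only
        else
            j = j - 1;               % advance in s2 only
        end
        path = [i, j; path];
    end
end

function d = editDist(a, b)
% Plain Levenshtein distance (insert, delete, substitute; cost 1 each).
    D = zeros(numel(a) + 1, numel(b) + 1);
    D(:, 1) = (0:numel(a))';
    D(1, :) = 0:numel(b);
    for i = 1:numel(a)
        for j = 1:numel(b)
            D(i + 1, j + 1) = min([D(i, j + 1) + 1, D(i + 1, j) + 1, ...
                                   D(i, j) + double(a(i) ~= b(j))]);
        end
    end
    d = D(end, end);
end
```

For example, `[total, path] = wordDtw('to eat', 'to to eat')` gives a total of 0 with the path [1 1; 1 2; 2 3], so both occurrences of 'to' are mapped to the single 'to'. That also shows a limitation: DTW forces every word to be aligned with something, so repeated words cost nothing and a genuinely missing word is pushed onto one of its neighbours.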

Greg Dionne 21 Oct 2016
Search the internet for "edit distance". MATLAB has a variant that works on real signals, called EDR. It does insertion/deletion but not transposition (e.g. "eht" vs "the").

