Ignore Deletions with Edit Distances (String Editing)

Marcel Dorer on 22 Apr 2016
Closed: MATLAB Answer Bot on 20 Aug 2021
Hi, I'm trying to compare two strings with a function based on Miguel Castro's EditDist.m function. The function works pretty well, but in my case I need to ignore some of the deletions, namely all those at the beginning and the end of the string.
For example, when I compare the two strings 'XXXXMatlabXXXX' and 'YYMatlabYY', the first two 'X' and the last two 'X', which would be deletions, shouldn't count towards the edit distance (which should be 4 in this case). Basically, one of the two strings has a random number of random surrounding characters that should be ignored. Deletions after the first insertion, replacement, or correct value should be counted normally, at least until only a tail of deletions is left.
Help would be really appreciated!
Here is the relevant part of the function I'm using:
for i = 1:n1
    D(i+1,1) = D(i,1) + DelCost;
end
for j = 1:n2
    D(1,j+1) = D(1,j) + InsCost;
end
for i = 1:n1
    for j = 1:n2
        if s1(i) == s2(j)
            Repl = 0;
        else
            Repl = ReplCost;
        end
        D(i+1,j+1) = min([D(i,j)+Repl D(i+1,j)+DelCost D(i,j+1)+InsCost]);
    end
end
d = D(n1+1,n2+1);
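For reference, here is a minimal self-contained sketch of the loops above. The function name editDistTable and the default unit costs are assumptions, not part of EditDist.m. The recurrence pairs the cost terms so that dropping a character of s1 (a step down a column) costs DelCost, as in the first-column initialisation; with equal costs this gives the same table as the loops above.

```matlab
function [d, D] = editDistTable(s1, s2, DelCost, InsCost, ReplCost)
% Hypothetical wrapper around the loops in the question.
% Unit costs are assumed when the cost arguments are omitted.
if nargin < 3, DelCost = 1; end
if nargin < 4, InsCost = 1; end
if nargin < 5, ReplCost = 1; end
n1 = length(s1);
n2 = length(s2);
D = zeros(n1+1, n2+1);
for i = 1:n1
    D(i+1,1) = D(i,1) + DelCost;   % s1(1:i) -> empty string
end
for j = 1:n2
    D(1,j+1) = D(1,j) + InsCost;   % empty string -> s2(1:j)
end
for i = 1:n1
    for j = 1:n2
        if s1(i) == s2(j)
            Repl = 0;
        else
            Repl = ReplCost;
        end
        % D(i+1,j+1) is the cost of converting s1(1:i) into s2(1:j)
        D(i+1,j+1) = min([D(i,j)+Repl, D(i+1,j)+InsCost, D(i,j+1)+DelCost]);
    end
end
d = D(n1+1, n2+1);
end
```

With unit costs, editDistTable('XXXXMatlabXXXX', 'YYMatlabYY') returns 8 (four deletions plus four replacements), which is the value that should come down to 4 once the leading and trailing deletions are ignored.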

Answers (1)

Arnab Sen on 26 Apr 2016
Edited: Arnab Sen on 27 Apr 2016
Hello Marcel,
I am assuming that, of the two strings s1 and s2, s1 is known to be the one wrapped in redundant characters.
Now, let's dig into what D means in the script. D(i+1,j+1) is the cost of converting s1(1:i) into s2(1:j) (and vice versa, when InsCost and DelCost are equal); the extra first row and column hold the costs for the empty prefix. Now assume that every character of s1 after index k is redundant. Then
D(n1+1,n2+1) = D(k+1,n2+1) + (n1-k)*DelCost.
So the task is simple: we need to find the value of k. The following code snippet should do that:
i = n1;
while i > 0 && D(i+1,n2+1) - D(i,n2+1) == DelCost
    i = i - 1;
end
k = i;
So the last (n1-k) characters of s1 are redundant.
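The scan can be tried on the example from the question. The following is a minimal sketch rather than a definitive implementation: the function name trailingDeletions is hypothetical, unit costs are assumed, and the table uses the same one-row, one-column offset as the script in the question.

```matlab
function [k, lastCol] = trailingDeletions(s1, s2)
% Hypothetical helper. k is the prefix length of s1 at which the scan
% stops, so the last length(s1)-k characters are treated as redundant.
% Unit costs are assumed for deletion, insertion and replacement.
DelCost = 1; InsCost = 1; ReplCost = 1;
n1 = length(s1);
n2 = length(s2);
D = zeros(n1+1, n2+1);
D(:,1) = (0:n1)' * DelCost;     % s1(1:i) -> empty string
D(1,:) = (0:n2) * InsCost;      % empty string -> s2(1:j)
for i = 1:n1
    for j = 1:n2
        Repl = ReplCost * (s1(i) ~= s2(j));
        D(i+1,j+1) = min([D(i,j)+Repl, D(i+1,j)+InsCost, D(i,j+1)+DelCost]);
    end
end
lastCol = D(:, n2+1);           % lastCol(i+1) = cost of s1(1:i) -> s2
k = n1;
% Walk up the last column while each step costs exactly one deletion
while k > 0 && lastCol(k+1) - lastCol(k) == DelCost
    k = k - 1;
end
end
```

For 'XXXXMatlabXXXX' and 'YYMatlabYY' the last four entries of lastCol are 6, 6, 7, 8, so the scan stops at k = 12 and marks the last two 'X' as redundant.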
Now we need to find the redundant characters at the front of s1. For this we can create another table (say X), where
X(i,j) = the cost of converting s1(i:n1) into s2(j:n2), and adopt a similar approach.
A simpler approach is to reverse both strings, s1 (giving s1') and s2 (giving s2'), call the edit distance function again, and perform the same workflow. The redundant characters at the end of s1' are the redundant characters at the front of the original string s1.
Finally, subtract DelCost*(total number of redundant characters in s1) from the original output.
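Putting the steps together, a minimal sketch of the whole workflow could look like this. The names editDistIgnoreEnds and scanTail are hypothetical, insertion and replacement are assumed to cost 1, and s1 is assumed to be the wrapped string.

```matlab
function d = editDistIgnoreEnds(s1, s2)
% Hypothetical function sketching the workflow described above.
% s1 is assumed to be the string wrapped in redundant characters.
DelCost = 1;
[dFull, tailCount] = scanTail(s1, s2, DelCost);
% Reversing both strings turns the leading run of s1 into a trailing run
[~, headCount] = scanTail(fliplr(s1), fliplr(s2), DelCost);
d = dFull - DelCost * (tailCount + headCount);
end

function [d, nRedundant] = scanTail(s1, s2, DelCost)
% Builds the edit-distance table (insertion and replacement cost 1) and
% counts the trailing characters of s1 that the last-column scan marks
% as redundant.
n1 = length(s1);
n2 = length(s2);
D = zeros(n1+1, n2+1);
D(:,1) = (0:n1)' * DelCost;
D(1,:) = 0:n2;
for i = 1:n1
    for j = 1:n2
        Repl = double(s1(i) ~= s2(j));
        D(i+1,j+1) = min([D(i,j)+Repl, D(i+1,j)+1, D(i,j+1)+DelCost]);
    end
end
d = D(n1+1, n2+1);
k = n1;
while k > 0 && D(k+1,n2+1) - D(k,n2+1) == DelCost
    k = k - 1;
end
nRedundant = n1 - k;
end
```

editDistIgnoreEnds('XXXXMatlabXXXX', 'YYMatlabYY') gives 8 - 1*(2 + 2) = 4. Note that the two scans run independently, so when s2 is much shorter than s1 (for instance 'XXXX' against 'Y') the leading and trailing runs can overlap and be subtracted twice; the result is worth sanity-checking in such cases.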
Comments: 2
Marcel Dorer on 26 Apr 2016
Thanks a lot for the answer, it was pretty helpful and I understand the principle. There is only 1 thing I fail to understand:
{
i--;
}
I'm no MATLAB expert, and I have to admit that I've never seen an expression like that. If I try to use that part in MATLAB, a bracket error occurs. I'd really appreciate it if you could explain this a little more!
Arnab Sen on 26 Apr 2016
Edited: Arnab Sen on 26 Apr 2016
Hi,
You are correct. MATLAB does not recognize i--, which is common in languages like C, C++, and Java. The curly braces are not MATLAB loop syntax either; the body of a while loop is closed with end. Please read the expression as
i = i - 1;
I have edited the original answer accordingly. Thanks for pointing this out.
Please accept the answer if this helps.
