MATLAB Examples

Fuzzy Search

Fuzzy Search aka Approximate String Matching, Text Searching with Errors.

Contents

Syntax

d = fzsearch(r,p)

d = fzsearch(r,p,n)

d = fzsearch (r,p,n,case)

[ d,A] = fzsearch(r,p)

Description

Function fzsearch(r,p,n,case) finds the best or predetermined approximate matching between substrings of a string r (reference) and a string p (pattern). The Levenshtein distance is used as a measure of matching. Levenshtein distance is the minimum number of single-character substitutions, deletions and insertions required to convert string A to string B.

d = fzsearch(r) computes NaN.

d = fzsearch(r,p) computes the best matching between substrings of the reference and a pattern. d is a vector, where d(1) is a distance and d(2), d(3)... are indexes of the ends of the best matching substrings.

[ d,A] = fzsearch(r,p) computes as above and a cell array A of the best matching substrings.

d = fzsearch(r,p,n) computes the match between substrings of r and p in interval from the best match (say, m) to m + n. d is a (n + 1) cell array. Each cell contains a vector which first number is a distance m + k, k = 0 .. n and others are indexes of the ends of substrings with the same distance from p.

d = fzsearch (r,p,n,case) is the same of the previous but the case is ignored for case > 0.

d = fzsearch('','') computes d = 0.

d = fzsearch('', p) computes d(1) = numel(p) and d(2) = 0.

d = fzsearch(r,'') computes d(1) = 1 and d(i) = i-1, i = 2...numel(r)+1.

References

  1. https://en.wikipedia.org/wiki/Approximate_string_matching
  2. http://algolist.manual.ru/search/lcs/
  3. http://algolist.manual.ru/search/fsearch/k_razl.php

Examples

  • 1. Distance of the best matching, indexes of the ends of the best matcing substrings and a set of the best matching substrings
reference = '2171351273745126271432'
pattern = '2345'
[d,A]=fzsearch(reference,pattern);
fprintf('A distance of the best matching: %2.0f\n',d(1))
disp('Indexes of the ends of substrings:')
disp(d(2:end))
disp('A set of substrings:')
disp(A)
reference =
    '2171351273745126271432'
pattern =
    '2345'
A distance of the best matching:  2
Indexes of the ends of substrings:
     6    13
A set of substrings:
    '135'
    '35'
    '273745'
    '73745'
    '3745'
    '745'
    '45'
  • 2. Indexes of the ends of the best matching substrings and indexes of the ends of substrings which distance from the pattern more than the best one by 1 and 2
reference = 'ddbababceabadecdddcedeabc'
pattern = 'abcde'
n = 2;
[d,A] = fzsearch(reference,pattern,n);
d1= d{1};
for i = 1:d1(1)+n
d1 = d{i};
fprintf('A distance of the matching: %2.0f\n',d1(1))
disp('Indexes of the ends of substrings:')
disp(d1(2:end))
end
reference =
    'ddbababceabadecdddcedeabc'
pattern =
    'abcde'
Warning: Calculation of the set of substrings is possible only for 2 input
arguments 
A distance of the matching:  1
Indexes of the ends of substrings:
     9    14
A distance of the matching:  2
Indexes of the ends of substrings:
     8    10    13    15    25
A distance of the matching:  3
Indexes of the ends of substrings:
     5     6     7    11    12    16    17    20    22    24

3. Comparison of the fuzzy search for insensitive and sensitive cases

reference = 'AdaCDabcAAbbDabdEdbD'
pattern = 'abcDe'
disp('Case insensitive:')
d = fzsearch(reference,pattern,0,1);
disp(d{1})
disp('Case sensitive:')
disp(fzsearch(reference,pattern))
reference =
    'AdaCDabcAAbbDabdEdbD'
pattern =
    'abcDe'
Case insensitive:
     1    17
Case sensitive:
     2     8     9    10