MATLAB Examples

# Fuzzy Search

Fuzzy Search aka Approximate String Matching, Text Searching with Errors.

## Syntax

d = fzsearch(r,p)

d = fzsearch(r,p,n)

d = fzsearch (r,p,n,case)

[ d,A] = fzsearch(r,p)

## Description

Function fzsearch(r,p,n,case) finds the best or predetermined approximate matching between substrings of a string r (reference) and a string p (pattern). The Levenshtein distance is used as a measure of matching. Levenshtein distance is the minimum number of single-character substitutions, deletions and insertions required to convert string A to string B.

d = fzsearch(r) computes NaN.

d = fzsearch(r,p) computes the best matching between substrings of the reference and a pattern. d is a vector, where d(1) is a distance and d(2), d(3)... are indexes of the ends of the best matching substrings.

[ d,A] = fzsearch(r,p) computes as above and a cell array A of the best matching substrings.

d = fzsearch(r,p,n) computes the match between substrings of r and p in interval from the best match (say, m) to m + n. d is a (n + 1) cell array. Each cell contains a vector which first number is a distance m + k, k = 0 .. n and others are indexes of the ends of substrings with the same distance from p.

d = fzsearch (r,p,n,case) is the same of the previous but the case is ignored for case > 0.

d = fzsearch('','') computes d = 0.

d = fzsearch('', p) computes d(1) = numel(p) and d(2) = 0.

d = fzsearch(r,'') computes d(1) = 1 and d(i) = i-1, i = 2...numel(r)+1.

## References

1. https://en.wikipedia.org/wiki/Approximate_string_matching
2. http://algolist.manual.ru/search/lcs/
3. http://algolist.manual.ru/search/fsearch/k_razl.php

## Examples

• 1. Distance of the best matching, indexes of the ends of the best matcing substrings and a set of the best matching substrings
```reference = '2171351273745126271432' pattern = '2345' [d,A]=fzsearch(reference,pattern); fprintf('A distance of the best matching: %2.0f\n',d(1)) disp('Indexes of the ends of substrings:') disp(d(2:end)) disp('A set of substrings:') disp(A) ```
```reference = '2171351273745126271432' pattern = '2345' A distance of the best matching: 2 Indexes of the ends of substrings: 6 13 A set of substrings: '135' '35' '273745' '73745' '3745' '745' '45' ```
• 2. Indexes of the ends of the best matching substrings and indexes of the ends of substrings which distance from the pattern more than the best one by 1 and 2
```reference = 'ddbababceabadecdddcedeabc' pattern = 'abcde' n = 2; [d,A] = fzsearch(reference,pattern,n); d1= d{1}; for i = 1:d1(1)+n d1 = d{i}; fprintf('A distance of the matching: %2.0f\n',d1(1)) disp('Indexes of the ends of substrings:') disp(d1(2:end)) end ```
```reference = 'ddbababceabadecdddcedeabc' pattern = 'abcde' Warning: Calculation of the set of substrings is possible only for 2 input arguments A distance of the matching: 1 Indexes of the ends of substrings: 9 14 A distance of the matching: 2 Indexes of the ends of substrings: 8 10 13 15 25 A distance of the matching: 3 Indexes of the ends of substrings: 5 6 7 11 12 16 17 20 22 24 ```

3. Comparison of the fuzzy search for insensitive and sensitive cases

```reference = 'AdaCDabcAAbbDabdEdbD' pattern = 'abcDe' disp('Case insensitive:') d = fzsearch(reference,pattern,0,1); disp(d{1}) disp('Case sensitive:') disp(fzsearch(reference,pattern)) ```
```reference = 'AdaCDabcAAbbDabdEdbD' pattern = 'abcDe' Case insensitive: 1 17 Case sensitive: 2 8 9 10 ```