similarityDistance

Compute distance profile between query and time series subsequences by evaluating z-normalized Euclidean distances

Since R2024b

Syntax

D = similarityDistance(X,Y)

[D,I] = similarityDistance(X,Y)

[___] = similarityDistance(___,EndPoints=outputLengths)

Description

D = similarityDistance(X,Y) returns the vector of z-normalized Euclidean distances between the query sequence Y and every subsequence of the time series X that has the same length as Y.

[D,I] = similarityDistance(X,Y) also returns the vector I of starting indices of the subsequences of X, ordered from the best match to the query Y to the worst.

[___] = similarityDistance(___,EndPoints=outputLengths) specifies how to handle the length of the output vectors when X ends with a partial subsequence, that is, at the last positions in X, where fewer samples remain than the length of Y. The choices are "discard", which discards the elements corresponding to the partial subsequences, and "fill", which extends the output vectors to the length of X by padding D with NaN values.

Use this syntax with any of the input and output arguments in the previous syntaxes.

Examples

Compute and Evaluate Similarity Distance Profile

Load the data, which consists of T1 and T2. T1 is a timetable containing armature current measurements on a degrading DC motor. T2 is a timetable that contains data collected from a known faulty motor.

load matrix_profile_data T1 T2

Set x to the MotorCurrent variable in T1. Plot x in a subplot.

x = T1.MotorCurrent;

subplot(211)
plot(x)
ylabel('Motor Current, mA')
hold on

Figure contains an axes object. The axes object with ylabel Motor Current, mA contains an object of type line.

T2 contains anomalous data in a segment that begins at location 3000 and has a length of 100. Extract this data as the target segment y.

len = 100;
loc = 3000;
I = loc:loc+len-1;
y = T2.MotorCurrent(I);

Compute the similarity distance of the target anomaly segment y to the subsequences within the motor data in x.

[d,i] = similarityDistance(x,y);

Using the first three indices in i, plot the top three closest matching subsequences. These matches indicate potentially similar anomalies to the anomaly in y.

for k = 1:3
  Id = i(k):i(k)+len-1;
  plot(Id,x(Id),'--');
  hold on
end
legend({'Time Series', 'Match 1', 'Match 2', 'Match 3'})
hold off

Figure contains an axes object. The axes object with ylabel Motor Current, mA contains 4 objects of type line. These objects represent Time Series, Match 1, Match 2, Match 3.

For comparison, plot the target anomaly sequence.

subplot(212)
plot(y);
hold on
ylabel('Motor Current, mA')

Figure contains 2 axes objects. Axes object 1 with ylabel Motor Current, mA contains 4 objects of type line. These objects represent Time Series, Match 1, Match 2, Match 3. Axes object 2 with ylabel Motor Current, mA contains an object of type line.

Plot the data in the three matching subsequences with the target anomaly.

for k = 1:3
  Id = i(k):i(k)+len-1;
  plot(x(Id),'--');
  hold on
end
legend({'Target Anomaly', 'Match 1', 'Match 2', 'Match 3'})
hold off

The matched subsequences appear similar to the target anomaly.

Input Arguments

`X` — Time series to evaluate
numeric vector

Time series X to evaluate, specified as a numeric vector of length N. X must not have any missing data.

`Y` — Query sequence
numeric vector

Query sequence Y, specified as a numeric vector of length M, where M is less than or equal to the length N of the time series X. Y must not have any missing data.

`outputLengths` — Options for controlling output length
`"discard"` (default) | `"fill"`

Option for controlling output length when X ends with a partial subsequence, specified as one of the following options:

"discard" — Truncate the length of the output vectors D and I to N-M+1, where N is the length of X and M is the length of Y.
"fill" — Extend the length of D and I to N by padding D with M-1 NaNs. The software sets the last M-1 elements of the vector I to the sequence N-M+2:N.
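The following sketch illustrates how the two options affect the output lengths. The synthetic series, the query location, and the variable names are assumptions chosen only for illustration, and the expectations in the comments restate the option descriptions above rather than verified output.

```matlab
% Illustrative data: N = 500 samples, query of length M = 50 taken from x.
rng(0)
x = randn(500,1);
y = x(101:150);

% Default behavior ("discard"): outputs are expected to have N-M+1 = 451 elements.
[dDiscard,iDiscard] = similarityDistance(x,y);

% "fill": outputs are expected to have N = 500 elements, with the last
% M-1 = 49 elements of dFill set to NaN.
[dFill,iFill] = similarityDistance(x,y,EndPoints="fill");

disp(numel(dDiscard))
disp(numel(dFill))
disp(nnz(isnan(dFill)))
```

Using "fill" can be convenient when you want to plot D on the same sample axis as X without adjusting the index range.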

Output Arguments

`D` — Distance vector
numeric vector

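Distance vector containing the z-normalized Euclidean distances between the query sequence Y and the subsequences X(k:k+M-1), where k varies from 1 to N-M+1, returned as a numeric vector. By default, D has length N-M+1. If you specify EndPoints="fill", D has length N and its last M-1 elements are NaN.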
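As a point of reference, the z-normalized Euclidean distance is commonly defined in the time series literature as shown below. This is a sketch of the standard definition, not a statement of the internal implementation. The symbols μ and σ denote the mean and standard deviation of each sequence, and the use of the population standard deviation (normalization by M) is an assumption that this page does not confirm.

```latex
D(k) = \sqrt{\sum_{j=1}^{M}
  \left(
    \frac{X(k+j-1) - \mu_{X,k}}{\sigma_{X,k}}
    - \frac{Y(j) - \mu_{Y}}{\sigma_{Y}}
  \right)^{2}},
\qquad k = 1, \dots, N-M+1
```

Here μ_{X,k} and σ_{X,k} are computed over the subsequence X(k:k+M-1), so each subsequence is compared with Y by shape rather than by offset or amplitude.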

`I` — Vector of starting indices for best matching subsequences
positive integer vector

Vector of starting indices for subsequences of X that best match Y, returned as an integer vector with the same size as D.

I is ordered so that D(I) is sorted in ascending order of distance, that is, from the best match (smallest distance) to the worst match (largest distance). With the default "discard" option, the best match therefore starts at index I(1) of X and has distance D(I(1)), and the worst match starts at index I(N-M+1) and has distance D(I(N-M+1)).
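The relationship between D and I can be sketched with a brute-force computation. This is a hypothetical reference for illustration only, not the toolbox implementation, which is based on the faster algorithm in [1]. The synthetic data, the helper zn, and the choice of population standard deviation are assumptions.

```matlab
% Hypothetical brute-force reference, not the toolbox implementation.
rng(0)                               % reproducible synthetic data
x = randn(200,1);                    % stand-in time series, N = 200
y = x(50:69);                        % query copied from x, M = 20
N = numel(x);
M = numel(y);

% z-normalize a vector (population standard deviation is an assumption)
zn = @(v) (v - mean(v)) / std(v,1);
yz = zn(y);

% Distance from the query to each full-length subsequence of x
D = zeros(N-M+1,1);
for k = 1:N-M+1
    D(k) = norm(zn(x(k:k+M-1)) - yz);
end

% Order the starting indices from best match to worst match
[~,I] = sort(D);
```

Because y is copied from x starting at sample 50, the first element of I in this sketch is 50 and the corresponding distance D(I(1)) is 0.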

References

[1] Abdullah Mueen, Sheng Zhong, Yan Zhu, Michael Yeh, Kaveh Kamgar, Krishnamurthy Viswanathan, Chetan Kumar Gupta, and Eamonn Keogh, The Fastest Similarity Search Algorithm for Time Series Subsequences under Euclidean Distance, 2022. https://www.cs.unm.edu/%7Emueen/FastestSimilaritySearch.html

Version History

Introduced in R2024b
