Residual Analysis with Autocorrelation

Open Live Script

This example shows how to use autocorrelation with a confidence interval to analyze the residuals of a least-squares fit to noisy data. The residuals are the differences between the fitted model and the data. In a signal-plus-white noise model, if you have a good fit for the signal, the residuals should be white noise.

Create a noisy data set consisting of a 1st-order polynomial (straight line) in additive white Gaussian noise. The additive noise is a sequence of uncorrelated random variables following a N(0,1) distribution. This means that all the random variables have mean zero and unit variance. Set the random number generator to the default settings for reproducible results.

x = -3:0.01:3;
rng default
y = 2*x+randn(size(x));
plot(x,y)

Figure contains an axes object. The axes object contains an object of type line.

Use polyfit to find the least-squares line for the noisy data. Plot the original data along with the least-squares fit.

coeffs = polyfit(x,y,1);
yfit = coeffs(2)+coeffs(1)*x;

plot(x,y)
hold on
plot(x,yfit,'linewidth',2)

Figure contains an axes object. The axes object contains 2 objects of type line.

Find the residuals. Obtain the autocorrelation sequence of the residuals to lag 50.

residuals = y - yfit;
[xc,lags] = xcorr(residuals,50,'coeff');

When you inspect the autocorrelation sequence, you want to determine whether or not there is evidence of autocorrelation. In other words, you want to determine whether the sample autocorrelation sequence looks like the autocorrelation sequence of white noise. If the autocorrelation sequence of the residuals looks like the autocorrelation of a white noise process, you are confident that none of the signal has escaped your fit and ended up in the residuals. In this example, use a 99%-confidence interval. To construct the confidence interval, you need to know the distribution of the sample autocorrelation values. You also need to find the critical values on the appropriate distribution between which lie 0.99 of the probability. Because the distribution in this case is Gaussian, you can use complementary inverse error function, erfcinv. The relationship between this function and the inverse of the Gaussian cumulative distribution function is described on the reference page for erfcinv.

Find the critical value for the 99%-confidence interval. Use the critical value to construct the lower and upper confidence bounds.

conf99 = sqrt(2)*erfcinv(2*.01/2);
lconf = -conf99/sqrt(length(x));
upconf = conf99/sqrt(length(x));

Plot the autocorrelation sequence along with the 99%-confidence intervals.

figure

stem(lags,xc,'filled')
ylim([lconf-0.03 1.05])
hold on
plot(lags,lconf*ones(size(lags)),'r','linewidth',2)
plot(lags,upconf*ones(size(lags)),'r','linewidth',2)
title('Sample Autocorrelation with 99% Confidence Intervals')

Figure contains an axes object. The axes object with title Sample Autocorrelation with 99% Confidence Intervals contains 3 objects of type stem, line.

Except at zero lag, the sample autocorrelation values lie within the 99%-confidence bounds for the autocorrelation of a white noise sequence. From this, you can conclude that the residuals are white noise. More specifically, you cannot reject that the residuals are a realization of a white noise process.

Create a signal consisting of a sine wave plus noise. The data are sampled at 1 kHz. The frequency of the sine wave is 100 Hz. Set the random number generator to the default settings for reproducible results.

Fs = 1000;
t = 0:1/Fs:1-1/Fs;
rng default
x = cos(2*pi*100*t)+randn(size(t));

Use the discrete Fourier transform (DFT) to obtain the least-squares fit to the sine wave at 100 Hz. The least-squares estimate of the amplitude is 2 / N times the DFT coefficient corresponding to 100 Hz, where N is the length of the signal. The real part is the amplitude of a cosine at 100 Hz and the imaginary part is the amplitude of a sine at 100 Hz. The least-squares fit is the sum of the cosine and sine with the correct amplitude. In this example, DFT bin 101 corresponds to 100 Hz.

xdft = fft(x);
ampest = 2/length(x)*xdft(101);
xfit = real(ampest)*cos(2*pi*100*t)+imag(ampest)*sin(2*pi*100*t);

figure

plot(t,x)
hold on
plot(t,xfit,'linewidth',2)
axis([0 0.30 -4 4])
xlabel('Seconds')
ylabel('Amplitude')

Figure contains an axes object. The axes object with xlabel Seconds, ylabel Amplitude contains 2 objects of type line.

Find the residuals and determine the sample autocorrelation sequence to lag 50.

residuals = x-xfit;
[xc,lags] = xcorr(residuals,50,'coeff');

Plot the autocorrelation sequence with the 99%-confidence intervals.

figure

stem(lags,xc,'filled')
ylim([lconf-0.03 1.05])
hold on
plot(lags,lconf*ones(size(lags)),'r','linewidth',2)
plot(lags,upconf*ones(size(lags)),'r','linewidth',2)
title('Sample Autocorrelation with 99% Confidence Intervals')

Figure contains an axes object. The axes object with title Sample Autocorrelation with 99% Confidence Intervals contains 3 objects of type stem, line.

Again, you see that except at zero lag, the sample autocorrelation values lie within the 99%-confidence bounds for the autocorrelation of a white noise sequence. From this, you can conclude that the residuals are white noise. More specifically, you cannot reject that the residuals are a realization of a white noise process.

Finally, add another sine wave with a frequency of 200 Hz and an amplitude of 3/4. Fit only the sine wave at 100 Hz and find the sample autocorrelation of the residuals.

x = x+3/4*sin(2*pi*200*t);
xdft = fft(x);
ampest = 2/length(x)*xdft(101);
xfit = real(ampest)*cos(2*pi*100*t)+imag(ampest)*sin(2*pi*100*t);
residuals = x-xfit;
[xc,lags] = xcorr(residuals,50,'coeff');

Plot the sample autocorrelation along with the 99%-confidence intervals.

figure

stem(lags,xc,'filled')
ylim([lconf-0.12 1.05])
hold on
plot(lags,lconf*ones(size(lags)),'r','linewidth',2)
plot(lags,upconf*ones(size(lags)),'r','linewidth',2)
title('Sample Autocorrelation with 99% Confidence Intervals')

Figure contains an axes object. The axes object with title Sample Autocorrelation with 99% Confidence Intervals contains 3 objects of type stem, line.

In this case, the autocorrelation values clearly exceed the 99%-confidence bounds for a white noise autocorrelation at many lags. Here you can reject the hypothesis that the residuals are a white noise sequence. The implication is that the model has not accounted for all the signal and therefore the residuals consist of signal plus noise.