
Lagged/Cross Correlations with missing values

Tyler Smith on 9 Jun 2017
Commented: Simon de Szoeke on 8 Mar 2020
I need to compute the cross-correlation (up to a 7-day lag) for each column in my matrix. However, many of the cells contain missing data. The data are satellite data, and a missing day means it was too cloudy. Therefore the NaN cells cannot be turned into 0s; they have to be accounted for in the cross-correlation, because the gaps are relevant information. However, xcorr has no 'pairwise' option like the one in corrcoef that omits rows with missing values. So, more specifically: how do I calculate the cross-correlation (xcorr) with a 7-day lag while omitting rows where a NaN is present? Here is a small segment of the data and the code I am using so far.
Code: [Cor, lag] = xcorr(A, B, 7, 'coeff');
A = [.2, .33, .4, .34, .56, NaN, .7, .9, .1, NaN]
B = [NaN, .1, .2, .3, NaN, NaN, .4, .5, .55, .34]
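One lag-preserving way to get what the question asks for, sketched under stated assumptions: for each lag m, pair A(n+m) with B(n) on the original time axis (the convention the xcorr documentation uses for R_xy(m)), drop only the pairs in which either value is NaN, and compute a Pearson correlation from the pairs that remain. This is a pairwise-complete lagged Pearson correlation, not a drop-in replacement for xcorr(...,'coeff'): the 'coeff' option normalizes without removing the mean, so the numbers will differ. The threshold `minPairs` and the variable names are hypothetical choices for illustration; for a matrix, the same loop would be run once per column.

```matlab
% Pairwise-complete lagged Pearson correlation (illustrative sketch).
% Convention (as in xcorr): lag m pairs A(n+m) with B(n).
A = [.2, .33, .4, .34, .56, NaN, .7, .9, .1, NaN];
B = [NaN, .1, .2, .3, NaN, NaN, .4, .5, .55, .34];

maxLag   = 7;   % up to 7 days, as in the question
minPairs = 3;   % hypothetical: lags with fewer valid pairs are left as NaN

N      = numel(A);
lags   = -maxLag:maxLag;
r      = NaN(size(lags));    % correlation at each lag
nPairs = zeros(size(lags));  % number of valid pairs used at each lag

for k = 1:numel(lags)
    m = lags(k);
    % indices n for which both A(n+m) and B(n) exist
    n = max(1, 1 - m):min(N, N - m);
    a = A(n + m);
    b = B(n);
    ok = isfinite(a) & isfinite(b);   % drop a pair only if either value is NaN
    nPairs(k) = nnz(ok);
    if nPairs(k) >= minPairs
        a = a(ok) - mean(a(ok));      % remove the mean of the pairs actually used
        b = b(ok) - mean(b(ok));
        r(k) = sum(a .* b) / sqrt(sum(a .^ 2) * sum(b .^ 2));
    end
end

disp([lags; nPairs; r]);   % rows: lag, pairs used, correlation
```

Because gaps are skipped per lag rather than removed from the series, every value in r stays attached to its true lag; the price is that different lags are estimated from different numbers of pairs, which is why nPairs is kept alongside r.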

Accepted Answer

dpb on 9 Jun 2017
isOK=isfinite(A) & isfinite(B); % true where both A and B are finite (neither is NaN)
[r,lag]=xcorr(A(isOK),B(isOK),'coeff');
  4 Comments
Simon de Szoeke on 19 Dec 2018
This doesn't assign the products to the right lags. Removing the missing values with x(isOK) changes the lags of the data passed to xcorr. Here's a demonstration for the autocovariance:
t = 1:200;
x = sin(2*pi*t/20);
[a1,lag] = xcov(x,30); % autocovariance of the complete series, lags -30..30
isOK = ~mod(t,2); % simulate every other sample (odd t) being missing
a2 = xcov(x(isOK),30);
plot(lag,a1, lag,a2); xlabel('lag'); ylabel('xcov'); legend('a1','a2')
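In this example, ignoring the missing values this way doubles the apparent frequency of the data passed to xcov: consecutive samples of x(isOK) are two time steps apart, so each lag of a2 actually spans two time steps.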
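For comparison, a sketch of a lag-preserving treatment of the same experiment: instead of compressing the series with x(isOK), the missing samples are marked as NaN in place, and at each lag only the pairs in which both samples exist are used. It uses base MATLAB only (no xcov) and reports a Pearson correlation rather than a covariance so the results are easy to check; that normalization, and the minimum of 3 pairs, are illustrative choices. With every odd-numbered sample missing, odd lags have no valid pairs at all, so they come out NaN instead of being silently assigned to the wrong lag.

```matlab
% Same setup as the demonstration above, but the gaps stay in place as NaN.
t = 1:200;
x = sin(2*pi*t/20);
isOK = ~mod(t,2);          % keep even t only
xg = x;
xg(~isOK) = NaN;           % missing samples marked; time axis unchanged

maxLag = 30;
lag = -maxLag:maxLag;
N = numel(xg);
r = NaN(size(lag));

for k = 1:numel(lag)
    m = lag(k);
    n = max(1, 1 - m):min(N, N - m);   % pairs xg(n+m) with xg(n)
    a = xg(n + m);
    b = xg(n);
    ok = isfinite(a) & isfinite(b);    % use only pairs where both samples exist
    if nnz(ok) >= 3                    % arbitrary minimum number of pairs
        a = a(ok) - mean(a(ok));
        b = b(ok) - mean(b(ok));
        r(k) = sum(a .* b) / sqrt(sum(a .^ 2) * sum(b .^ 2));
    end
end
```

Plotting r against lag should show the 20-sample period of the original sine at the even lags (for example, about -1 at lag 10 and +1 at lag 20), with gaps at the odd lags.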
JOSE OCHOA-DE-LA-TORRE on 9 Mar 2019
Totally correct; I agree 100% with this warning.
Using isOK this way messes up the delays (the spacing of the samples); the only lag that can still come out right is lag = 0.


More Answers (1)

Reza Sameni on 8 Mar 2020
An alternative method that changes neither the data length nor the time-series lags is to replace the NaNs with random values, so that they do not influence the true cross-correlations:
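nanA = find(isnan(A)); % indices of missing samples in A
nanB = find(isnan(B)); % indices of missing samples in B
A(nanA) = randn(1, length(nanA)); % fill gaps with standard normal noise
B(nanB) = randn(1, length(nanB));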
  1 Comment
Simon de Szoeke on 8 Mar 2020
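This doesn't change the lags, but introducing random (theoretically uncorrelated) data does influence the covariances: the filled-in values are uncorrelated with the data, so they tend to pull the correlation toward zero.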
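A small sketch of the effect described in this comment, under illustrative assumptions (a noiseless sinusoid, deterministic gap patterns, and a fixed random seed): two identical series are given gaps at different times, then correlated once after filling the gaps with randn as in the answer above, and once after keeping only the times at which both series were observed. The filled version no longer reports the perfect zero-lag correlation, because the noise is uncorrelated with the signal and, being standard normal, also has a variance unrelated to the data's scale.

```matlab
rng(0);                          % fixed seed so the run is repeatable
t = 1:1000;
A = sin(2*pi*t/50);
B = A;                           % identical series: the true correlation is 1

A(mod(t, 5) == 0) = NaN;         % 20% of A missing
B(mod(t, 5) == 2) = NaN;         % 20% of B missing, at different times

% Answer above: replace the NaNs with standard normal noise
Af = A;  Bf = B;
nanA = find(isnan(Af));
nanB = find(isnan(Bf));
Af(nanA) = randn(1, numel(nanA));
Bf(nanB) = randn(1, numel(nanB));
Cf = corrcoef(Af, Bf);

% Alternative: keep only the times at which both series were observed
ok = isfinite(A) & isfinite(B);
Cp = corrcoef(A(ok), B(ok));

fprintf('random fill: %.3f   pairwise-complete: %.3f\n', Cf(1,2), Cp(1,2));
```

The pairwise-complete estimate recovers 1, while the random-fill estimate falls well below it even though the lags are untouched.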

