MATLAB Answers

Cross-correlation using 'xcorr' in the presence of NaN or missing values

Sagar, August 6, 2015
Commented: Marco Sandoval Belmar on May 22, 2021
Hi, I am trying to calculate the cross-correlation of two time series at different lags, but my data have a lot of NaN values. When I calculate the cross-correlation as below, every value in corln is NaN.
[corln, lags] = xcorr(ave_precp_india(:), aod_all(:, 1), 15);
I want to specify something like 'rows','pairwise' when calculating the correlation so that NaNs are ignored. How can I specify the 'rows','pairwise' option in xcorr?
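A quick way to see why every value comes back NaN: in a direct, time-domain sum, a NaN only spoils the lags whose overlap contains it, but when the correlation is computed with FFTs the NaN enters every frequency bin and therefore every lag. The all-NaN output above is consistent with xcorr working through FFTs. The sketch below uses made-up data and only base MATLAB (no xcorr), with the lag convention documented for xcorr, r(m) = sum over t of x(t+m)*y(t).

```matlab
% Toy data (hypothetical): a single NaN at sample 3 of x
x = [1 2 NaN 4 5 6 7 8]';
y = [2 1 3 5 4 6 8 7]';
n = numel(x);

% Direct sum at lag m (xcorr convention: x(t+m) paired with y(t)).
% Only lags whose overlap includes sample 3 of x become NaN.
m = 6;                                  % overlap is x(7:8) with y(1:2): no NaN
r_direct = sum(x(1+m:n) .* y(1:n-m))    % 7*2 + 8*1 = 22

% FFT-based computation, zero-padded to 2n so the circular result is linear.
% The NaN contaminates every frequency bin, so every lag comes back NaN.
c = ifft(fft(x, 2*n) .* conj(fft(y, 2*n)));
all_nan = all(isnan(c))                 % true
```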


Adam Danz, April 15, 2020 (edited April 15, 2020)
There isn't a simple solution to this problem. If you have a single NaN value within the window, the correlation for that window will be NaN.
Depending on how many missing values are in the data and how far they are spread apart, you may be able to work around the problem.
If there are relatively few missing values and they are spread apart, you could fill in the NaN values by interpolation or with MATLAB's fillmissing() function, but you must do so in a responsible and meaningful way. Merely getting rid of the NaN values does not mean your solution is a good one. After filling the missing values, plot the data and make sure the filled values make sense and are reasonable.
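A minimal sketch of filling only short gaps, assuming linear interpolation suits the data and that runs of up to maxGap samples (a hypothetical threshold, set to 2 here) are short enough to fill. fillmissing(x,'linear') would interpolate every gap; the run-length check is what leaves the long gaps as NaN. The data are made up.

```matlab
% Hypothetical series: a short gap (samples 3-4) and a long gap (samples 8-11)
x = [1 2 NaN NaN 5 6 7 NaN NaN NaN NaN 12 13]';
maxGap = 2;                         % assumption: fill runs of at most 2 samples

t = (1:numel(x))';
good = ~isnan(x);
% Linear interpolation across every gap; interp1 returns NaN outside the valid range
xi = interp1(t(good), x(good), t, 'linear');

% Locate runs of consecutive NaNs
d = diff([0; isnan(x); 0]);
runStart = find(d == 1);
runEnd = find(d == -1) - 1;

% Copy interpolated values only into runs no longer than maxGap
xf = x;
for k = 1:numel(runStart)
    idx = runStart(k):runEnd(k);
    if numel(idx) <= maxGap
        xf(idx) = xi(idx);
    end
end
```

Plotting the original and filled series together, for example plot(t, x, 'o', t, xf, '-'), is a quick way to check that the filled values look reasonable.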
If the NaN values are clustered together, interpolation and fillmissing() won't be reasonable solutions. You may have to analyze the data in chunks, but even that has problems because the number of data points within the window becomes smaller at the beginning and end of each chunk of data.
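To analyze the data in chunks, a first step is to find the contiguous runs where both series are present. This sketch (made-up data, hypothetical variable names) does that and also illustrates the shrinking-window problem: within a chunk of length Lc, lag m leaves only Lc - |m| pairs.

```matlab
% Hypothetical pair of series; NaN marks missing samples
x = [1 2 3 4 NaN NaN NaN 8 9 10 11 12]';
y = [2 3 4 5 6 7 8 9 NaN 11 12 13]';
maxLag = 2;

% Samples where both series are present
ok = ~isnan(x) & ~isnan(y);

% Start index, end index and length of each contiguous run of usable samples
d = diff([0; ok; 0]);
chunkStart = find(d == 1);
chunkEnd = find(d == -1) - 1;
chunkLen = chunkEnd - chunkStart + 1;

% Within a chunk of length Lc, lag m leaves only Lc - |m| pairs,
% so short chunks lose a large fraction of their data at larger lags.
pairsAtMaxLag = max(chunkLen - maxLag, 0);
```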
Adam Danz, April 15, 2020
Thanks for sharing what you've found. It will likely be useful to future visitors here.

Marco Sandoval Belmar, May 22, 2021 (edited May 22, 2021)
I agree with the answer above: there isn't a straightforward way to deal with this. However, I have code that calculates the ordinary correlation with the 'rows','complete' option of corr and then shifts the time series manually for each lag. I have noticed that this produces some artificial "wiggles" if you plot correlation against lag, which I assume is caused by the NaNs, as explained in the answer above. Something like:
function [R, L, pvalue] = nanxcorr_ms(s1, s2, Lag)
% function [R, L, pvalue] = nanxcorr_ms(s1, s2, Lag)
% Cross-correlation of a pair of time series containing gaps (NaN).
% At each lag, pairs with a NaN in either series are dropped
% ('rows','complete' in corr, Statistics and Machine Learning Toolbox).
% Input:
%   s1, s2  time series [vectors of equal length]
%   Lag     maximum lag to correlate (e.g. 20)
% Output:
%   R       correlation coefficient at each lag
%   L       lags, -Lag:Lag (same sign convention as xcorr)
%   pvalue  p-value of each correlation
% sam 04/16/2013
% Marco Sandoval Belmar 4/1/2018
s1 = s1(:); s2 = s2(:);   % force column vectors so corr treats each as one variable
[r, p] = corr(s1, s2, 'rows', 'complete');   % correlation at lag == 0
L = 0; R = r; pvalue = p;
% Performs the correlation for the different lags
for i1 = 1:Lag
    % negative lag: s1(t) paired with s2(t + i1)
    s11 = s1(1:end-i1);
    s21 = s2(i1+1:end);
    [c, pp] = corr(s11, s21, 'rows', 'complete');
    R = [c; R];
    pvalue = [pp; pvalue];
    L = [-i1; L];
    % positive lag: s1(t + i1) paired with s2(t)
    s11 = s1(i1+1:end);
    s21 = s2(1:end-i1);
    [c, pp] = corr(s11, s21, 'rows', 'complete');
    R = [R; c];
    pvalue = [pvalue; pp];
    L = [L; i1];
end
end
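If the Statistics and Machine Learning Toolbox (which provides corr) is not available, the same per-lag deletion of NaN pairs can be written in base MATLAB by computing Pearson's r by hand. This is a sketch on made-up data, not Marco's code: y is constructed as x advanced by two samples, so under the xcorr lag convention the correlation should peak at lag 2. With only two series, 'rows','complete' and 'rows','pairwise' drop the same samples, so this is also the 'pairwise' behaviour asked about in the question. The number N of usable pairs is recorded at each lag; lags with fewer pairs, or with a different subset of samples, give noisier estimates, which may be one contributor to the wiggles mentioned above.

```matlab
% Hypothetical test signal: y(t) = x(t+2), with extra NaNs
x = [0.3 1.2 -0.5 2.0 0.7 NaN -1.1 0.4 1.5 -0.8 0.9 0.1 -0.6 1.8]';
y = [x(3:end); NaN; NaN];
y(5) = NaN;
maxLag = 4;
n = numel(x);

lags = (-maxLag:maxLag)';
R = NaN(size(lags));
N = zeros(size(lags));               % number of usable pairs at each lag
for k = 1:numel(lags)
    m = lags(k);
    % xcorr convention: at lag m, x(t+m) is paired with y(t)
    if m >= 0
        a = x(1+m:n);  b = y(1:n-m);
    else
        a = x(1:n+m);  b = y(1-m:n);
    end
    keep = ~isnan(a) & ~isnan(b);    % drop pairs with a NaN in either series
    a = a(keep);  b = b(keep);
    N(k) = numel(a);
    if N(k) > 2
        a = a - mean(a);  b = b - mean(b);
        R(k) = sum(a .* b) / sqrt(sum(a .^ 2) * sum(b .^ 2));   % Pearson r
    end
end
[~, iBest] = max(R);                 % max ignores NaN entries
bestLag = lags(iBest)                % 2 for this toy data
```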

