Correlation between two differently formatted datasets

조회 수: 14 (최근 30일)
Melissa
Melissa 2015년 3월 7일
답변: Chad Greene 2015년 3월 9일
Hello,
I want to calculate the R^2 correlation between two different datasets.
The first one, A, is 192x288 (lat,lon) and I can visualize the values on a 2D colormap
The second one, B, is 555x2 (lat,lon) This data was from an excel file, in column format. The data is randomly spread throughout the globe, and do not lie on the same grid cells of A. The data is far too sparse to be able to interpolate.
I am having trouble figuring out how I can possibly find a correlation between these two different data formats. Is there a way to convert B into a map that I can visualize with a colormap like A? Also, how would the resolution affect this calculation?
Any help would be highly appreciated
Thank you,
Melissa

채택된 답변

Chad Greene
Chad Greene 2015년 3월 9일
Melissa,
Without knowing anything about your project, my gut feeling is that it does not seem prudent to grid your B dataset because you'll end up interpolating over long, long distances between data points. I suppose you could use triscatteredinterp or gridfit, but you'd probably want to then mask out any grid boxes that are far away from the B data points.
You can, however, get a correlation between these data sets. I'm going to make up a gridded dataset A and a point dataset B:
% Some gridded dataset A:
[lonA,latA] = meshgrid(-180:2:180,90:-1:-90);
A = peaks(181)+.1*latA;
% Some measurements B at specific points:
latB = 180*(rand(30,1)-.5);
lonB = 360*(rand(30,1)-.5);
B = .1*latB+rand(size(latB));
% Plot the points atop the gridded dataset:
pcolor(lonA,latA,A)
hold on
plot(lonB,latB,'rp','markersize',15)
shading interp
xlabel('longitude')
ylabel('latitude')
Then get A values at points B by interpolating the A dataset:
A_interp = interp2(lonA,latA,A,lonB,latB);
You can then use corrcoef to get a correlation coefficient, which for this fake data is 0.89:
R = corrcoef([A_interp B])
R =
1.0000 0.8936
0.8936 1.0000
But note that correlation coefficient depends a bit on data means and scaling. Below I'm going to use polyplot to plot the linear regression:
plot(A_interp,B,'b*')
hold on
polyplot(A_interp,B,'k-')
axis tight; box off
xlabel('dataset A')
ylabel('dataset B')

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 Data Distribution Plots에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by