Why is corrcoef returning a P-value of zero?

Question

dandan, 22 April 2015

https://kr.mathworks.com/matlabcentral/answers/213067-why-is-corrcoef-returning-a-p-value-of-zero

Commented: the cyclist, 22 April 2015

I have two arrays, which both contain a significant number of zeros (they're observed and predicted data for sediment transport, which is often zero).

When I run corrcoef on them as-is, it returns an r-value of 0.82 and a p-value of zero... not just 0.0000 (as in, some very small number), but an actual '0' value. This seems wacky.

When I run corrcoef but exclude the elements where both arrays are zero, I get r = 0.79 and p = 10^-14.

Does anyone know what the difference is (I assume it has something to do with the way corrcoef treats zeros), and why the first approach yields a 0 p-value? Thanks so much!!

2 Comments

Jos (10584), 22 April 2015

Can you describe the inputs to corrcoef more clearly? Are these vectors, arrays? Can you give an example?

dandan, 22 April 2015

They are both one-dimensional arrays of class double. I don't think I can really provide an example, because I suspect the low P-value is at least partly due to the large size of the arrays: both are 1x6400, and only about 50 to 100 elements of each (depending on the array) are nonzero.


Answer 1

the cyclist, 22 April 2015

https://kr.mathworks.com/matlabcentral/answers/213067-why-is-corrcoef-returning-a-p-value-of-zero#answer_176307

Edited: the cyclist, 22 April 2015

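If the P-value gets small enough, it is reported as exactly 0, because it cannot be stored as, say, P = 10^(-560). The smallest positive double-precision number is about 4.9 x 10^(-324) (realmin, the smallest normalized double, is about 2.2 x 10^(-308)), and anything smaller underflows to 0. It seems likely that you do not need many pairs of zeros to push the P-value down that far.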
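A quick way to see that underflow, sketched in Python rather than MATLAB because Python floats are the same IEEE 754 doubles (the variable names here are just for illustration):

```python
import math
import sys

smallest_normal = sys.float_info.min          # same value as MATLAB's realmin
smallest_subnormal = math.ldexp(1.0, -1074)   # 2^-1074, the smallest positive double

print(smallest_normal)         # 2.2250738585072014e-308
print(smallest_subnormal)      # 5e-324
print(smallest_subnormal / 2)  # 0.0: the exact halfway case rounds to even, i.e. to zero
print(float("1e-560"))         # 0.0: a P-value this small cannot be stored at all
```

Any P-value computed in double precision that falls below 2^-1074 therefore comes back as an exact 0, not as a very small positive number.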

2 Comments

dandan, 22 April 2015

P being zero because it's too small to report makes sense, thanks so much! It sounds like you might have a better intuitive understanding than I do of how the pairs of zeros affect P, though. Can you please explain?

I've noticed that the correlation coefficient and P-value of two all-zero arrays are both reported as NaN, and I'm trying to understand how MATLAB factors pairs or half-pairs of zeros into both R and P. Thanks!

the cyclist, 22 April 2015

If you ignore your zeros for a minute, you already have fairly highly correlated sequences and a fairly small P-value (indicating that sequences this highly correlated are very unlikely to arise by chance from truly uncorrelated variables).
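For reference, the standard test of zero correlation turns r and the number of pairs n into a t statistic with n - 2 degrees of freedom; as I understand the corrcoef documentation, that is how its P is computed. Plugging this into the tail of the t density gives a back-of-envelope scaling, where C_n is a factor that varies only polynomially with n:

```latex
t = r\sqrt{\frac{n-2}{1-r^{2}}}, \qquad
P = 2\,\Pr\!\left(T_{n-2} \ge |t|\right) \approx C_n \left(1 - r^{2}\right)^{(n-1)/2}
```

With the numbers from the question (r = 0.82, n = 6400), (1 - 0.82^2)^(6399/2) = 0.3276^3199.5, which is roughly 10^(-1550). That is far below the smallest positive double (about 4.9 x 10^(-324)), so the computed P can only come out as 0.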

Now think about what happens as you append a pair of zeros, then another pair, then another, and so on. Those zero pairs agree with each other perfectly, so r tends to creep up (here from 0.79 to 0.82). More importantly, every pair increases the sample size n, and for a given r the P-value shrinks extremely fast as n grows, so the correlation looks less and less likely to be due to chance. With thousands of such pairs, P eventually falls below the smallest value that can be represented in MATLAB's double precision, so it is reported as exactly 0.
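To make that concrete, here is a Python sketch on made-up, already well-correlated toy data (not the poster's data). It appends k identical pairs and computes r plus the two-sided P of the usual t test with n - 2 degrees of freedom, written as the regularized incomplete beta function I_x((n-2)/2, 1/2) with x = 1 - r^2. I am assuming corrcoef uses this same test; the continued fraction is the textbook modified-Lentz evaluation, and the names `pearson_r`, `log_p_value` and `with_pairs` are hypothetical helpers. Working in logs keeps the size of P visible even after P itself underflows.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def _betacf(a, b, x, max_iter=500, eps=1e-14, tiny=1e-300):
    """Continued fraction for the regularized incomplete beta (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),           # even step
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):  # odd step
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:  # last factor (odd step) is ~1: converged
            break
    return h


def log_p_value(r, n):
    """Natural log of the two-sided p-value for H0: no correlation (|r| < 1).

    t = r*sqrt((n-2)/(1-r^2)) with n-2 degrees of freedom gives
    p = I_x(a, b) with x = 1 - r^2, a = (n-2)/2, b = 1/2.
    """
    a, b, x = (n - 2) / 2.0, 0.5, 1.0 - r * r
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return log_front + math.log(_betacf(a, b, x)) - math.log(a)
    # Otherwise use the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    return math.log1p(-math.exp(log_front) * _betacf(b, a, 1.0 - x) / b)


# Made-up toy data, NOT the poster's arrays.
X_BASE = [1, 2, 3, 4, 5, 6, 7, 8]
Y_BASE = [2, 1, 4, 3, 6, 5, 8, 7]


def with_pairs(k, fill=0):
    """Append k identical (fill, fill) pairs; return r, log10(p), p as a double."""
    x = X_BASE + [fill] * k
    y = Y_BASE + [fill] * k
    r = pearson_r(x, y)
    log_p = log_p_value(r, len(x))
    return r, log_p / math.log(10), math.exp(log_p)


for k in (0, 10, 100, 1000, 6000):
    r, log10_p, p = with_pairs(k)
    print(f"k={k:5d}  r={r:.4f}  log10(p)={log10_p:9.1f}  p={p!r}")
```

In this toy run r rises with every batch of zero pairs while log10(p) keeps falling, and from k = 1000 onward p is stored as 0.0 even though its logarithm is still an ordinary finite number. Calling `with_pairs(k, fill=31)` shows the same qualitative behavior, since (31, 31) also lies along the trend of this particular toy data.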

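The NaN you see for two all-zero arrays is a separate matter: an all-zero array has zero variance, so the correlation coefficient works out to 0/0, which is undefined (NaN), and the P-value is then NaN as well. That is not really related to your issue.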
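A small Python sketch of that zero-variance case. Python raises ZeroDivisionError on 0.0/0.0, so this hypothetical `pearson_r` returns `math.nan` explicitly to mimic the IEEE 754 result (0/0 = NaN) that MATLAB produces:

```python
import math


def pearson_r(x, y):
    """Pearson r, returning NaN when either input has zero variance (0/0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    denom = math.sqrt(sxx * syy)
    if denom == 0.0:
        return math.nan  # undefined coefficient, as with IEEE 0/0
    return sxy / denom


print(pearson_r([0, 0, 0, 0], [0, 0, 0, 0]))  # nan: both inputs are constant
print(pearson_r([0, 0, 0, 0], [1, 2, 3, 4]))  # nan: one constant input is enough
print(pearson_r([0, 0, 1, 2], [0, 0, 1, 3]))  # finite once both inputs vary
```

So zeros only make r undefined when an entire array is constant; pairs of zeros mixed in with nonzero data are treated like any other data points.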

(There is nothing special about the zeros. You could also have added pairs of 31s and gotten a high correlation.)
