How to make covariance matrix positive semi-definite (PSD)

Question

Danny 2015년 1월 4일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/168898-how-to-make-covariance-matrix-positive-semi-definite-psd

댓글: John D'Errico 2015년 1월 4일

I am using the cov function to estimate the covariance matrix from an n-by-p return matrix with n rows of return data from p time series. Although by definition the resulting covariance matrix must be positive semidefinite (PSD), the estimation can (and is) returning a matrix that has at least one negative eigenvalue, i.e. it is not positive semi-definite.

There are many discussions out there about how to transform a non-PSD covariance matrix to a PSD matrix, but I am wondering if there is an efficient way to identify the columns (individual time series) that are causing the calculation to return a non-PSD matrix, eliminate the columns, and then have the cov function return a PSD matrix without needing any artificial transformations?

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

John D'Errico 2015년 1월 4일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/168898-how-to-make-covariance-matrix-positive-semi-definite-psd#answer_163957

편집: John D'Errico 2015년 1월 4일

MATLAB Online에서 열기

No, there is not a way. At least there is no constructive, unambiguous, intelligent way.

For example, consider the covariance matrix that arises from

X = repmat(rand(10,1),1,2);
C = cov(X);

Which column causes it to be not positive definite? Column 1 or column 2? What about column 2 makes it more a factor in that zero eigenvalue? I could as easily argue for column 1.

Or, how about this one:

X = rand(10,2);
X = [X,-mean(X,2)];
C = cov(X);

Here, I can delete any of the three columns and end up with a positive definite result, and each column is as "important" in contributing to the zero eigenvalue.

If you wish, I can keep going. How about this one?

X = randn(10,11);
C = cov(X);

Again, each column is as equally random as any other. And since they were randomly generated, we can write any column as a linear combination of the remaining columns. With probability essentially 1, there will be no zero coefficients employed in that linear combination. So which column is the offender? And if you say the last column, then I'll just randomly permute the columns and get a different answer. So effectively, your answer would be to just choose a random column.

Just use a good tool that will yield a positive definite matrix, and do so efficiently. nearestSPD is such a tool.

댓글 수: 4
이전 댓글 2개 표시이전 댓글 2개 숨기기

John D'Errico 2015년 1월 4일

편집: John D'Errico 2015년 1월 4일

MATLAB Online에서 열기

Well, MVNRND should generally fail if the matrix is not positive definite. Why?

The common test used is if chol fails on a matrix, then it is not SPD. And since the transformation used to produce random variates in MVNRND employs the output of chol, you would not get any useful output from MVNRND from that non-repaired matrix. So this is not a comparison you could have made anyway.

In the case of a matrix that is non-spd due to those errors in the least significant bits, the repaired matrix will be different in only those least significant bits. So the change made will be essentially insignificant, EXCEPT that MVNRND will work after the perturbation made by nearestSPD.

For example, I'll create a covariance matrix that is numerically rank deficient.

X = randn(100,110);
C = cov(X);
e = eig(C)
e =
  -4.3335e-16
  -3.8466e-16
  -2.7843e-16
  -1.5575e-16
  -7.8879e-17
   5.9581e-18
   1.1062e-16
   1.8245e-16
   2.5164e-16
   4.3026e-16
   4.9536e-16
    0.0043996
     0.012847
     0.013444
      0.01509
     0.026126
     0.029338
     0.032768
     0.040712
     0.043929
     0.056971
     0.059183
     0.065438
     0.075928
     0.084532
     0.088806
... (many non-interesting eigenvalues removed) ...
       3.1631
        3.269
       3.3023
       3.4668
       3.5408
       3.5673
       4.0556
Chat = nearestSPD(C);
norm(C- Chat)
ans =
   4.7483e-15

So the difference between C and Chat is tiny.

L = chol(C);
Error using chol
Matrix must be positive definite.

Whereas, Chat offers no problem with chol, although sometimes it too may show some tiny negative eigenvalues. nearestSPD is written to pass the chol test, and to do so with a minimal perturbation to the original matrix. In fact, in this case, eig still produces one tiny negative eigenvalue, but chol does not care.

min(eig(Chat))
ans =
  -5.3566e-17
L = chol(Chat);

No problem.

Danny 2015년 1월 4일

Thank you very much for both replies.

John D'Errico 2015년 1월 4일

Hmm. As I think about this, I could perhaps write a custom version of COV, that would also return a valid cholesky factor of the covariance matrix, without any need to perturb the covariance matrix as a singularity repair. It is quite simple to do as it turns out. Of course, the problem is the only people who want that cholesky factor are those who would then use a tool like MVNRND. And MVNRND uses CHOL.

댓글을 달려면 로그인하십시오.

How to make covariance matrix positive semi-definite (PSD)

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 4
이전 댓글 2개 표시이전 댓글 2개 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

Community Treasure Hunt

How to make covariance matrix positive semi-definite (PSD)

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 4 이전 댓글 2개 표시이전 댓글 2개 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글 수: 4
이전 댓글 2개 표시이전 댓글 2개 숨기기