The best approach to avoid the Kullback–Leibler divergence equal to infinite
조회 수: 86 (최근 30일)
이전 댓글 표시
Given two discrete probability distributions P and Q, containing zero values in some bins, what is the best approach to avoid the Kullback–Leibler divergence equal to infinite (and therefore getting some finite value, between zero and one)? Is there any function in Matlab that could calculate the Kullback–Leibler divergence correctly, i.e. by solving this issue?
Here below an example of calculation of the Kullback–Leibler divergence between P and Q, which gives an infinite value. I am tempted to manually remove "NaNs" and "Infs" from "log2( P./Q )", but I am afraid this is not correct. In addition, I am not sure that smoothing the PDFs could solve the issue...
% Input
A =[ 0.444643925792938 0.258402203856749
0.224416517055655 0.309641873278237
0.0730101735487732 0.148209366391185
0.0825852782764812 0.0848484848484849
0.0867743865948534 0.0727272727272727
0.0550568521843208 0.0440771349862259
0.00718132854578097 0.0121212121212121
0.00418910831837223 0.0336088154269972
0.00478755236385398 0.0269972451790634
0.00359066427289048 0.00110192837465565
0.00538599640933573 0.00220385674931129
0.000598444045481747 0
0.00299222022740874 0.00165289256198347
0 0
0.00119688809096349 0.000550964187327824
0 0.000550964187327824
0.00119688809096349 0.000550964187327824
0 0.000550964187327824
0 0.000550964187327824
0.000598444045481747 0
0.000598444045481747 0
0 0
0 0.000550964187327824
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0.000550964187327824
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0.00119688809096349 0.000550964187327824];
P = A(:,1); % sum(P) = 0.999999999
Q = A(:,2); % sum(Q) = 1
% Calculation of the Kullback–Leibler divergence
M = numel(P);
P = reshape(P,[M,1]);
Q = reshape(Q,[M,1]);
KLD = nansum( P .* log2( P./Q ) )
log2( P./Q )
댓글 수: 0
채택된 답변
Matt J
2023년 6월 27일
% Input
A =[ 0.444643925792938 0.258402203856749
0.224416517055655 0.309641873278237
0.0730101735487732 0.148209366391185
0.0825852782764812 0.0848484848484849
0.0867743865948534 0.0727272727272727
0.0550568521843208 0.0440771349862259
0.00718132854578097 0.0121212121212121
0.00418910831837223 0.0336088154269972
0.00478755236385398 0.0269972451790634
0.00359066427289048 0.00110192837465565
0.00538599640933573 0.00220385674931129
0.000598444045481747 0
0.00299222022740874 0.00165289256198347
0 0
0.00119688809096349 0.000550964187327824
0 0.000550964187327824
0.00119688809096349 0.000550964187327824
0 0.000550964187327824
0 0.000550964187327824
0.000598444045481747 0
0.000598444045481747 0
0 0
0 0.000550964187327824
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0.000550964187327824
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0 0
0.00119688809096349 0.000550964187327824];
P = A(:,1); % sum(P) = 0.999999999
Q = A(:,2); % sum(Q) = 1
% Calculation of the Kullback–Leibler divergence
M = numel(P);
P = reshape(P,[M,1]);
Q = reshape(Q,[M,1]);
tf=P~=0 & Q~=0;
KLD = nansum( P(tf) .* log2( P(tf)./Q(tf) ) )
댓글 수: 2
추가 답변 (2개)
the cyclist
2023년 6월 27일
Your question is not really a MATLAB question, but a math/stats questions. (At least, it seems you already understand how to remove the NaNs and infinities, but are concerned about the theoretical validity of that.)
I don't know if it will help you or not, but the best explanation I found online about this is in this mathoverflow answer. I don't think I could explain the issue any more clearly myself, so I'll just point you there.
참고 항목
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!