The best approach to avoid the Kullback–Leibler divergence equal to infinite

Question

Sim 2023년 6월 27일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1988808-the-best-approach-to-avoid-the-kullback-leibler-divergence-equal-to-infinite

편집: Sim 2023년 6월 28일

Given two discrete probability distributions P and Q, containing zero values in some bins, what is the best approach to avoid the Kullback–Leibler divergence equal to infinite (and therefore getting some finite value, between zero and one)? Is there any function in Matlab that could calculate the Kullback–Leibler divergence correctly, i.e. by solving this issue?

Here below an example of calculation of the Kullback–Leibler divergence between P and Q, which gives an infinite value. I am tempted to manually remove "NaNs" and "Infs" from "log2( P./Q )", but I am afraid this is not correct. In addition, I am not sure that smoothing the PDFs could solve the issue...

% Input
A =[     0.444643925792938         0.258402203856749
         0.224416517055655         0.309641873278237
        0.0730101735487732         0.148209366391185
        0.0825852782764812        0.0848484848484849
        0.0867743865948534        0.0727272727272727
        0.0550568521843208        0.0440771349862259
       0.00718132854578097        0.0121212121212121
       0.00418910831837223        0.0336088154269972
       0.00478755236385398        0.0269972451790634
       0.00359066427289048       0.00110192837465565
       0.00538599640933573       0.00220385674931129
      0.000598444045481747                         0
       0.00299222022740874       0.00165289256198347
                         0                         0
       0.00119688809096349      0.000550964187327824
                         0      0.000550964187327824
       0.00119688809096349      0.000550964187327824
                         0      0.000550964187327824
                         0      0.000550964187327824
      0.000598444045481747                         0
      0.000598444045481747                         0
                         0                         0
                         0      0.000550964187327824
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0      0.000550964187327824
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
       0.00119688809096349      0.000550964187327824];
P   = A(:,1); % sum(P) = 0.999999999
Q   = A(:,2); % sum(Q) = 1
% Calculation of the Kullback–Leibler divergence
M   = numel(P);                   
P   = reshape(P,[M,1]);           
Q   = reshape(Q,[M,1]);
KLD = nansum( P .* log2( P./Q ) )
KLD = Inf
log2( P./Q )
ans = 50×1
    0.7830
   -0.4644
   -1.0215
   -0.0390
    0.2548
    0.3209
   -0.7552
   -3.0041
   -2.4955
    1.7042

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

Matt J 2023년 6월 27일

1
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1988808-the-best-approach-to-avoid-the-kullback-leibler-divergence-equal-to-infinite#answer_1263613

MATLAB Online에서 열기

% Input
A =[     0.444643925792938         0.258402203856749
         0.224416517055655         0.309641873278237
        0.0730101735487732         0.148209366391185
        0.0825852782764812        0.0848484848484849
        0.0867743865948534        0.0727272727272727
        0.0550568521843208        0.0440771349862259
       0.00718132854578097        0.0121212121212121
       0.00418910831837223        0.0336088154269972
       0.00478755236385398        0.0269972451790634
       0.00359066427289048       0.00110192837465565
       0.00538599640933573       0.00220385674931129
      0.000598444045481747                         0
       0.00299222022740874       0.00165289256198347
                         0                         0
       0.00119688809096349      0.000550964187327824
                         0      0.000550964187327824
       0.00119688809096349      0.000550964187327824
                         0      0.000550964187327824
                         0      0.000550964187327824
      0.000598444045481747                         0
      0.000598444045481747                         0
                         0                         0
                         0      0.000550964187327824
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0      0.000550964187327824
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
       0.00119688809096349      0.000550964187327824];
P   = A(:,1); % sum(P) = 0.999999999
Q   = A(:,2); % sum(Q) = 1
% Calculation of the Kullback–Leibler divergence
M   = numel(P);                   
P   = reshape(P,[M,1]);           
Q   = reshape(Q,[M,1]);
tf=P~=0 & Q~=0;
KLD = nansum( P(tf) .* log2( P(tf)./Q(tf) ) )
KLD = 0.1956

댓글 수: 2
없음 표시없음 숨기기

Sim 2023년 6월 28일

편집: Sim 2023년 6월 28일

Hi and thanks a lot both @Matt J and @the cyclist!

I would accept both answers (is it possible?) for the following reasons:

That one from @Matt J, since it is a practical solution to remove undesirable elements that would lead Kullback–Leibler divergence towards an infinite value.
That one from @the cyclist because it - rigthly - shows that this question is actually out of scope of the Matlab forum, and it indicates a link where to deeepen this topic.

Matt J 2023년 6월 28일

@Sim You cannot Accept multiple answers, so just pick one.

댓글을 달려면 로그인하십시오.

Answer 2

the cyclist 2023년 6월 27일

2
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1988808-the-best-approach-to-avoid-the-kullback-leibler-divergence-equal-to-infinite#answer_1263638

Your question is not really a MATLAB question, but a math/stats questions. (At least, it seems you already understand how to remove the NaNs and infinities, but are concerned about the theoretical validity of that.)

I don't know if it will help you or not, but the best explanation I found online about this is in this mathoverflow answer. I don't think I could explain the issue any more clearly myself, so I'll just point you there.

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

Sim 2023년 6월 28일

편집: Sim 2023년 6월 28일

MATLAB Online에서 열기

Hi and thanks a lot both @the cyclist and @Matt J!

I would accept both answers (is it possible?) for the following reasons:

That one from @the cyclist because it - rigthly - shows that this question is actually out of scope of the Matlab forum, and it indicates a link where to deeepen this topic.
That one from @Matt J, since it is a practical solution to remove undesirable elements that would lead Kullback–Leibler divergence towards an infinite value.

@the cyclist, thanks about that link! There, I found an insightful answer (that one with 24 votes, today) which explains about two approaches for this case:

"Smooth the distributions in some way, for instance with a Bayesian prior, or (similarly) taking the convex combination of the observation with some valid (nonzero) distribution. (The standard Bayesian approach is to use a Dirichlet prior..."
"Employ heuristics throwing out all the values that do not make sense, as suggested above. While I acknowledge that (2) is a convention, it doesn't really fit with the nature of these distributions, as I will explain momentarily."

As far as I understood, the "heuristics throwing out all the values that do not make sense", is what showed by @Matt J in his answer (and which was also one of my concerns..).

Instead, about the "Smooth the distributions in some way, for instance with a Bayesian prior,", I would not have any clue about how to implement it, unless that approach refers to ksdensity (to produce a kernel-smooth density estimate)

[f,xi] = ksdensity(x)

or to smoothing a histogram

y = randn(1,5001);
[heights,centers] = hist(y); % <-- histogram
w = centers(2)-centers(1);
t = linspace(centers(1)-w/2,centers(end)+w/2,n+1);
dt = diff(t);
Fvals = cumsum([0,heights.*dt]);
F = spline(t, [0, Fvals, 0]);
DF = fnder(F);  % <-- the derivative DF of the spline F is the smoothed version of the histogram

...and from the smoothed curves, I would then get the new values (that will be now greater than zero!), and re-calculate the Kullback–Leibler divergence.

댓글을 달려면 로그인하십시오.

Answer 3

Sim 2023년 6월 28일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1988808-the-best-approach-to-avoid-the-kullback-leibler-divergence-equal-to-infinite#answer_1264298

편집: Sim 2023년 6월 28일

MATLAB Online에서 열기

So far, I found this theoretical argument, which partially solve my issue, but still, the infinite values remain when

% if P_i > 0 and Q_i = 0, then P_i*log(P_i/0) = Inf

The following is what I implemented by considering the conventions described at pag.19 of "Elements of Information Theory, 2nd Edition Thomas M. Cover, Joy A. Thomas":

clear all; close all; clc;
% Input
A =[     0.444643925792938         0.258402203856749
         0.224416517055655         0.309641873278237
        0.0730101735487732         0.148209366391185
        0.0825852782764812        0.0848484848484849
        0.0867743865948534        0.0727272727272727
        0.0550568521843208        0.0440771349862259
       0.00718132854578097        0.0121212121212121
       0.00418910831837223        0.0336088154269972
       0.00478755236385398        0.0269972451790634
       0.00359066427289048       0.00110192837465565
       0.00538599640933573       0.00220385674931129
      0.000598444045481747                         0
       0.00299222022740874       0.00165289256198347
                         0                         0
       0.00119688809096349      0.000550964187327824
                         0      0.000550964187327824
       0.00119688809096349      0.000550964187327824
                         0      0.000550964187327824
                         0      0.000550964187327824
      0.000598444045481747                         0
      0.000598444045481747                         0
                         0                         0
                         0      0.000550964187327824
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0      0.000550964187327824
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
                         0                         0
       0.00119688809096349      0.000550964187327824];
P   = A(:,1);
Q   = A(:,2);
% some processing
M   = numel(P);                   
P   = reshape(P,[M,1]);           
Q   = reshape(Q,[M,1]);
% At Pag.19 of "Elements of Information Theory, 2nd Edition Thomas M. Cover, Joy A. Thomas"
% (Section: 2.3 Relative Entropy and Mutual Information)
% there are three conventions to be used in the calculation of the Kullback–Leibler divergence
idx1 = find(P==0 & Q>0);  % index for convention 0*log(0/q) = 0
idx2 = find(P==0 & Q==0); % index for convention 0*log(0/0) = 0
idx3 = find(P>0 & Q==0);  % index for convention p*log(p/0) = Inf
% Calculation of the Kullback–Leibler divergence, 
% by applying the three conventions described in "Elements of Information Theory..."
tmp = P .* log2( P./Q );
tmp(idx1) = 0;            % convention 0*log(0/q) = 0
tmp(idx2) = 0;            % convention 0*log(0/0) = 0
tmp(idx3) = Inf;          % convention p*log(p/0) = Inf
KLD = sum(tmp)
KLD = Inf

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

The best approach to avoid the Kullback–Leibler divergence equal to infinite

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 2
없음 표시없음 숨기기

추가 답변 (2개)

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

The best approach to avoid the Kullback–Leibler divergence equal to infinite

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

채택된 답변

댓글 수: 2 없음 표시없음 숨기기

추가 답변 (2개)

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글 수: 2
없음 표시없음 숨기기

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기