Compare probability density functions of 2 vectors

조회 수: 114 (최근 30일)
T.C.
T.C. 2024년 11월 8일 13:57
편집: Bruno Luong 2024년 11월 10일 13:20
I have two vectors v and w of different legths and with positive elements. Each element of both vectors can be considered to be drawn according to some probability density function. I would need to check that the pdf of both is (more or less) the same, how could I do this in Matlab? I was thinking to plot the cdf of both vectors and see how much they differed, but maybe there's a better way to do that.
  댓글 수: 2
Jeff Miller
Jeff Miller 2024년 11월 9일 18:06
Did you already check whether the means/medians are the same (e.g., ttest2 or ranksum) and also check the variances (e.g., vartest2, ansaribradley)? If you can reject equality of either of those then of course you can conclude the vectors come from populations with different pdfs.
You can also use kstest2 to check the full pdfs, but keep in mind that this test has very low power unless the pdfs are very different or the vectors are very long. (That means the data will rarely allow you to conclude the pdf's are different even when they are. Power would be much higher for the tests of means and variances.)
William Rose
William Rose 2024년 11월 10일 0:31
Jeff Miller, good idea.

댓글을 달려면 로그인하십시오.

채택된 답변

Bruno Luong
Bruno Luong 2024년 11월 8일 14:14
Use histcounts to compute pdfs then compare them
  댓글 수: 2
Bruno Luong
Bruno Luong 2024년 11월 9일 19:42
편집: Bruno Luong 2024년 11월 10일 13:20
If seems cdf comparison is better than pdf. There os a better separation. Both are normalized to 1 (maximum difference).
As Jeff Miller has suggested you can also derive mean, median, standard deviation, high order moments, skewness, kurtosis, etc
Ntest = 100;
for i = 1:Ntest
x1=rand(1,500);
x2=rand(1,400);
x3=rand(1,600)*1.2;
[dpdf12(i),dcdf12(i)] = diffpdf(x1,x2);
[dpdf13(i),dcdf13(i)] = diffpdf(x1,x3);
end
i = 1:Ntest;
subplot(2,1,1)
plot(i, dpdf12, 'b', i, dpdf13, 'r')
ylabel('dpdf')
legend('x1-x2','x1-x3')
subplot(2,1,2)
plot(i, dcdf12, 'b', i, dcdf13, 'r')
ylabel('dcdf')
legend('x1-x2','x1-x3')
function [dpdf,dcdf] = diffpdf(x1,x2)
x = [x1,x2];
[a,b] = bounds(x);
edges = linspace(a,b,33);
pdf1 = histcounts(x1,edges,'Normalization','pdf');
pdf2 = histcounts(x2,edges,'Normalization','pdf');
dx = mean(diff(edges));
dpdf = norm(pdf1-pdf2,1)*dx/2; % divide by 2 so as max(dpdf) is 1
cdf1 = cumsum(pdf1)*dx; % histcounts(x1,edges,'Normalization','cdf');
cdf2 = cumsum(pdf2)*dx; % histcounts(x2,edges,'Normalization','cdf');
dcdf = norm(cdf1-cdf2, Inf);
end
Bruno Luong
Bruno Luong 2024년 11월 10일 9:25
Another exmple to show the dindicator when comppare RAND and RANDN
Ntest = 100;
for i = 1:Ntest
x1=rand(1,500);
x2=rand(1,400);
x3=randn(1,600);
[dpdf12(i),dcdf12(i)] = diffpdf(x1,x2);
[dpdf13(i),dcdf13(i)] = diffpdf(x1,x3);
end
i = 1:Ntest;
subplot(2,1,1)
plot(i, dpdf12, 'b', i, dpdf13, 'r')
ylabel('dpdf')
legend('x1-x2','x1-x3')
subplot(2,1,2)
plot(i, dcdf12, 'b', i, dcdf13, 'r')
ylabel('dcdf')
legend('x1-x2','x1-x3')
function [dpdf,dcdf] = diffpdf(x1,x2)
x = [x1,x2];
[a,b] = bounds(x);
edges = linspace(a,b,33);
pdf1 = histcounts(x1,edges,'Normalization','pdf');
pdf2 = histcounts(x2,edges,'Normalization','pdf');
dx = mean(diff(edges));
dpdf = norm(pdf1-pdf2,1)*dx/2; % divide by 2 so as max(dpdf) is 1
cdf1 = cumsum(pdf1)*dx; % histcounts(x1,edges,'Normalization','cdf');
cdf2 = cumsum(pdf2)*dx; % histcounts(x2,edges,'Normalization','cdf');
dcdf = norm(cdf1-cdf2, Inf);
end

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

William Rose
William Rose 2024년 11월 8일 14:48
편집: William Rose 2024년 11월 9일 23:00
[Edit: fix error in my code below.]
x1=rand(1,50);
x2=rand(1,40);
[h,p]=kstest2(x1,x2);
if h
fprintf('prob(x1,x2) from same distribution=%.3f.',p);
fprintf('Accept the null hypothesis that they are are from same distribution.');
else
fprintf('prob(x1,x2) from same distribution=%.3f.',p);
fprintf('Reject the null hypothesis that they are are from same distribution.');
end
prob(x1,x2) from same distribution=0.503.
Accept the null hypothesis that they are are from same distribution.
OK
  댓글 수: 10
Bruno Luong
Bruno Luong 2024년 11월 9일 14:40
@Paul "By "p seem to vary quite a bit from run to run." do you mean the following?"
Yes your question seems arrive at the samme time than I edit my post.
Paul
Paul 2024년 11월 10일 1:46
Hi William,
After the edit, I still think the logic in the code is not correct. According to kstest2, if h = 1, the null hypothesis should be rejected. Consider what happens if we change the distribution for x2
x1=rand(1,50);
%x2=rand(1,40);
x2=randn(1,40);
[h,p]=kstest2(x1,x2);
if h
fprintf('prob(x1,x2) from same distribution=%.3f.',p);
fprintf('Accept the null hypothesis that they are are from same distribution.');
else
fprintf('prob(x1,x2) from same distribution=%.3f.',p);
fprintf('Reject the null hypothesis that they are are from same distribution.');
end
prob(x1,x2) from same distribution=0.000.
Accept the null hypothesis that they are are from same distribution.

댓글을 달려면 로그인하십시오.

제품


릴리스

R2024a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by