Is there a function to create the P-P plot in Matlab, to compare two cumulative distribution functions against each other?

Question

Sim on 6 Sep 2024

Direct link to this question:

https://kr.mathworks.com/matlabcentral/answers/2150659-is-there-a-function-to-create-the-p-p-plot-in-matlab-to-compare-two-cumulative-distribution-functio

Edited: Rahul on 7 Sep 2024

From Wikipedia: "In statistics, a P–P plot (probability–probability plot or percent–percent plot or P value plot) is a probability plot for assessing how closely two data sets agree, or for assessing how closely a dataset fits a particular model. It works by plotting the two cumulative distribution functions against each other; if they are similar, the data will appear to be nearly a straight line. This behavior is similar to that of the more widely used Q–Q plot, with which it is often confused."
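For two samples x_1, ..., x_n and y_1, ..., y_m, the description above can be written with empirical CDFs. This is the standard ECDF definition; taking the evaluation grid T to be the pooled sample values is an assumption that matches what the answers below do. The key point is that both coordinates of each plotted point are evaluated at the same t.

```latex
\hat F_n(t) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}\{x_i \le t\}, \qquad
\hat G_m(t) = \frac{1}{m}\sum_{j=1}^{m} \mathbf{1}\{y_j \le t\},

\text{P--P plot} = \bigl\{ (\hat F_n(t),\, \hat G_m(t)) : t \in T \bigr\}, \qquad
T = \{x_1,\dots,x_n,\; y_1,\dots,y_m\}.
```

If the two samples come from the same distribution, \hat F_n(t) \approx \hat G_m(t) for every t, so the points lie close to the line y = x.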

3 Comments (1 older comment not shown)

Sim on 6 Sep 2024

Edited: Sim on 6 Sep 2024

Hi @Torsten :-)

Well, just empirical data, for example these:

rng default; % for reproducibility
a = 0;
b = 100;
nb = 50;
% Create two log-normal distributed random datasets, "x" and "y"
% (but we can use any randomly distributed data)
x = (b-a).*round(lognrnd(1,1,1000,1)) + a;
y = (b-a).*round(lognrnd(0.88,1.1,1000,1)) + a;
% histograms of "x" and "y" (one figure each, so the second does not replace the first)
figure; hx = histogram(x,'NumBins',nb);
figure; hy = histogram(y,'NumBins',nb);

Sim on 6 Sep 2024

Maybe the following code can make the P–P plot... but since the two resulting ECDFs have different sizes, I do not know how to deal with it. In the following example I compared F with a truncated G, but this is not correct...

rng default; % for reproducibility
a = 0;
b = 100;
nb = 50;
% Create two log-normal distributed random datasets, "x" and "y"
% (but we can use any randomly distributed data)
x = (b-a).*round(lognrnd(1,1,1000,1)) + a;
y = (b-a).*round(lognrnd(0.88,1.1,1000,1)) + a;
[F,~] = ecdf(x);
[G,~] = ecdf(y);
figure
hold on
plot(F,G(1:size(F,1)),'o','MarkerSize',10,'MarkerFaceColor','b')
plot(0:1,0:1,'-','color','k')
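One way around the size mismatch is to evaluate both empirical CDFs at the same points. The following is a minimal, toolbox-free sketch, assuming MATLAB R2016b or later for implicit expansion. `ecdfAt` is a hypothetical anonymous helper, not a built-in. It computes the exact step-function ECDFs on the pooled sample values, so no interpolation is needed. The small datasets are made up for illustration.

```matlab
% Hypothetical helper (not a MATLAB built-in): fraction of data values <= each t,
% i.e. the empirical CDF evaluated at the points in t, returned as a column vector
ecdfAt = @(data, t) mean(data(:) <= t(:).', 1).';

% Small made-up datasets of different sizes
x = [1 2 2 3];
y = [2 3 4];

% Common evaluation grid: all distinct values from both samples (sorted column)
t = unique([x(:); y(:)]);

F = ecdfAt(x, t);   % ECDF of x on the grid
G = ecdfAt(y, t);   % ECDF of y on the same grid, so numel(G) == numel(F)
```

With the thread's data, the same two calls give vectors that can go straight into `plot(F, G, 'o')`. The helper builds a numel(data)-by-numel(t) logical matrix, so memory grows with both the sample size and the grid size.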


Answer 1 (Accepted Answer)

Torsten on 6 Sep 2024

Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/2150659-is-there-a-function-to-create-the-p-p-plot-in-matlab-to-compare-two-cumulative-distribution-functio#answer_1512304

Moved: Torsten on 6 Sep 2024

rng default; % for reproducibility
a = 0;
b = 100;
nb = 50;
% Create two log-normal distributed random datasets, "x" and "y"
% (but we can use any randomly distributed data)
x = (b-a).*round(lognrnd(1,1,1000,1)) + a;
y = (b-a).*round(lognrnd(0.88,1.1,1000,1)) + a;
% Empirical CDF of x; remove duplicate abscissae so interp1 gets distinct sample points
[F,t1] = ecdf(x);
[t1,ia] = unique(t1,'Stable');
F = F(ia);
% Same for y
[G,t2] = ecdf(y);
[t2,ia] = unique(t2,'Stable');
G = G(ia);
% Evaluate both ECDFs on the pooled set of points (NaN outside each dataset's range)
teval = unique(sort([t1;t2]));
Feval = interp1(t1,F,teval);
Geval = interp1(t2,G,teval);
% P-P plot with the 45-degree reference line
hold on
plot(Feval,Geval,'o')
plot(0:1,0:1,'-','color','k')
hold off
grid on
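One detail of the interpolation approach: `interp1` returns `NaN` for query points outside the range of its sample points, so grid points below the smallest or above the largest value of one dataset are silently skipped by `plot`. Below is a minimal sketch of one way to keep them. It assumes the usual ECDF convention that the CDF is 0 below the sample minimum and 1 above the maximum. The variable names mirror the answer, and the numbers are made up.

```matlab
% Toy ECDF samples standing in for t1/F from the answer
t1 = [1; 2; 3];
F  = [0.25; 0.75; 1];

% Common grid extending beyond the range of t1 on both sides
teval = [0; 1; 2.5; 3; 4];

Feval = interp1(t1, F, teval);   % not a number at 0 and 4 (outside [1, 3])

% Fill the out-of-range points with the CDF's limiting values
Feval(teval < min(t1)) = 0;
Feval(teval > max(t1)) = 1;
```

Whether to fill or drop these points is a judgment call. Dropping them, as the answer does, only removes points that would lie on the edges F = 0 or F = 1 of the plot.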

3 Comments (1 older comment not shown)

Sim on 6 Sep 2024

Moved: Torsten on 6 Sep 2024

Thanks a lot @Torsten :-) I tried both your solution and @Rahul's, and only yours works... If you put it in an "Answer" section, I can accept it... Unless @Rahul changes something to make his solution work for my example :-)

% inputs
rng default;
a = 0;
b = 100;
nb = 50;
x = (b-a).*round(lognrnd(1,1,1000,1)) + a;
y = (b-a).*round(lognrnd(0.88,1.1,1000,1)) + a;
% Torsten solution
[F,t1] = ecdf(x);
[t1,ia] = unique(t1,'Stable');
F = F(ia);
[G,t2] = ecdf(y);
[t2,ia] = unique(t2,'Stable');
G = G(ia);
teval = unique(sort([t1;t2]));
Feval = interp1(t1,F,teval);
Geval = interp1(t2,G,teval);
figure;
hold on;
plot(Feval,Geval,'o');
plot(0:1,0:1,'-','color','k');
hold off;
% Rahul solution
[f1, x1] = ecdf(x);
[f2, x2] = ecdf(y);
[x1_unique, ia1, ~] = unique(x1);
f1_unique = f1(ia1);
[x2_unique, ia2, ~] = unique(x2);
f2_unique = f2(ia2);
f1_interp = interp1(x1_unique, f1_unique, x2_unique, 'linear', 'extrap');
f2_interp = interp1(x2_unique, f2_unique, x1_unique, 'linear', 'extrap');
figure;
plot(f1_interp, f2_interp, 'o');
Error using plot
Specify the coordinates as vectors or matrices of the same size, or as a vector and a matrix that share the same length in at least one dimension.
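The error comes from the query points. `interp1` returns one value per query point, so `f1_interp` has as many elements as `x2_unique`, and `f2_interp` has as many as `x1_unique`. The two lengths match only when both datasets happen to have the same number of distinct values. That is the case for the two continuous `randn(100, 1)` samples, which in practice have 100 distinct values each, but not for the rounded log-normal data. A small illustration with made-up numbers:

```matlab
% Made-up sorted, distinct sample points and ECDF values
x1_unique = [1; 2; 3; 4];   f1_unique = [0.25; 0.5; 0.75; 1];
x2_unique = [1.5; 3.5];     f2_unique = [0.5; 1];

% Each output has one element per query point
f1_interp = interp1(x1_unique, f1_unique, x2_unique, 'linear', 'extrap');  % 2 elements
f2_interp = interp1(x2_unique, f2_unique, x1_unique, 'linear', 'extrap');  % 4 elements

% Querying both on the same points gives equal lengths
tq = union(x1_unique, x2_unique);          % 1, 1.5, 2, 3, 3.5, 4
f1_common = interp1(x1_unique, f1_unique, tq);
f2_common = interp1(x2_unique, f2_unique, tq);
```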

Torsten on 6 Sep 2024

Edited: Torsten on 6 Sep 2024

% Modified Rahul solution
% inputs
rng default;
a = 0;
b = 100;
nb = 50;
x = (b-a).*round(lognrnd(1,1,1000,1)) + a;
y = (b-a).*round(lognrnd(0.88,1.1,1000,1)) + a;
[f1, x1] = ecdf(x);
[f2, x2] = ecdf(y);
[x1_unique, ia1, ~] = unique(x1);
f1_unique = f1(ia1);
[x2_unique, ia2, ~] = unique(x2);
f2_unique = f2(ia2);
f1_interp = interp1(x1_unique, f1_unique, union(x1_unique,x2_unique));
f2_interp = interp1(x2_unique, f2_unique, union(x1_unique,x2_unique));
hold on
plot(f1_interp, f2_interp, 'o');
plot(0:1,0:1,'-','color','k')
hold off
grid on

Sim on 6 Sep 2024

Thanks a lot @Torsten!!


Answer 2

Rahul on 6 Sep 2024

Direct link to this answer:

https://kr.mathworks.com/matlabcentral/answers/2150659-is-there-a-function-to-create-the-p-p-plot-in-matlab-to-compare-two-cumulative-distribution-functio#answer_1512129

Edited: Rahul on 7 Sep 2024

Hi Sim,

I understand that you’re trying to generate a P–P (probability–probability) plot of two datasets. A P–P plot is made by plotting the fraction failing (CDF) of one distribution against the fraction failing (CDF) of another distribution.

To generate this plot, we plot the CDF of one distribution against the CDF of the other, with both evaluated at the same points. If the distributions are very similar, the points lie close to the 45-degree diagonal. Any deviation from this diagonal indicates that one distribution is leading or lagging the other.
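Here is a minimal numeric check of that statement. It uses exact step-function ECDFs computed by a hypothetical anonymous helper `ecdfAt`, which is not part of MATLAB and assumes R2016b or later for implicit expansion. A sample compared with itself gives points exactly on the diagonal. Shifting the same sample to the right by 0.5 gives points on or below the diagonal, because the shifted sample's CDF lags behind. The data are made up.

```matlab
% Hypothetical helper: empirical CDF of data evaluated at the points in t (column output)
ecdfAt = @(data, t) mean(data(:) <= t(:).', 1).';

x = [1 2 3 4];
y = x + 0.5;                      % same shape, shifted right by 0.5

t = unique([x(:); y(:)]);         % common grid: 1, 1.5, 2, ..., 4.5

Fx    = ecdfAt(x, t);             % horizontal coordinate of each P-P point
Fy    = ecdfAt(y, t);             % vertical coordinate for the shifted sample
Fself = ecdfAt(x, t);             % x compared with itself

onDiagonal  = all(Fself == Fx);   % identical samples: every point on y = x
belowOrOn   = all(Fy <= Fx);      % shifted sample: no point above the diagonal
strictBelow = any(Fy < Fx);       % and at least one point strictly below it
```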

Below is the reference code for your understanding:

1. Define Your Data

Assume two datasets, ‘data1’ and ‘data2’, drawn from a standard normal distribution (generated here with randn), which you want to compare using a P–P plot.

data1 = randn(100, 1); % Example data set 1  
data2 = randn(100, 1); % Example data set 2 

2. Compute the Cumulative Distribution Functions (CDFs)

Calculate the empirical CDFs of both datasets with the ‘ecdf’ function, then interpolate each CDF at the sample points of the other dataset.

% Compute CDFs for data1
[f1, x1] = ecdf(data1);
% Compute CDFs for data2
[f2, x2] = ecdf(data2);
% Ensure x1 and x2 are unique and sorted
[x1_unique, ia1, ~] = unique(x1);
f1_unique = f1(ia1);
[x2_unique, ia2, ~] = unique(x2);
f2_unique = f2(ia2);
% Interpolate CDFs
% Note: f1_interp has numel(x2_unique) elements and f2_interp has numel(x1_unique) elements,
% so they have equal length only if both datasets have the same number of distinct values
f1_interp = interp1(x1_unique, f1_unique, x2_unique, 'linear', 'extrap');
f2_interp = interp1(x2_unique, f2_unique, x1_unique, 'linear', 'extrap');

3. Create the P–P Plot

After aligning CDF values from both datasets, you can plot them against each other.

figure;
plot(f1_interp, f2_interp, 'o');
xlabel('CDF of data1');
ylabel('CDF of data2');
title('P–P Plot');
hold on;
% Endpoints of the 45-degree reference line (named to avoid shadowing the built-in xline/yline functions)
refX = [min(f1_interp), max(f1_interp)];
refY = refX;
% Plot the 45-degree line
plot(refX, refY, 'r--', 'LineWidth', 2);
axis equal;
grid on;

Normalization: Make sure your datasets are appropriately scaled or normalized if they are not in the same range.
Handling NaNs or Infinities: Ensure your data does not contain NaNs or infinities, which can affect interpolation and plotting.
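For the second point, here is a minimal sketch of one way to drop non-finite values before calling ‘ecdf’. This is not the only option, and the data vector is made up.

```matlab
data1 = [1; NaN; 3; Inf; -Inf; 2];   % made-up data with non-finite entries
data1 = data1(isfinite(data1));      % keep only finite values, preserving order
```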

For more information on using the ‘cdf’ function, refer to the documentation linked below:

https://www.mathworks.com/help/stats/prob.normaldistribution.cdf.html

3 Comments (1 older comment not shown)

Rahul on 6 Sep 2024

Edited: Rahul on 6 Sep 2024

Hi @Sim,

Sorry, I missed that part. I've now edited my answer to include f1_interp and f2_interp in the second section; please take a look.

Sim on 6 Sep 2024

Thanks @Rahul :-) However, your solution still does not work for my initial example :-(
