I wanted to apply the chi-squared function with the return of the p-value, but MATLAB's chi2cdf function only returns zero.

Question

PEDRO ALEXANDRE Fernandes, November 2, 2023


https://kr.mathworks.com/matlabcentral/answers/2042031-i-wanted-to-apply-the-chi-squared-function-with-the-return-of-the-p-value-but-matlab-s-chi2cdf-func

Edited: dpb, November 3, 2023

% Example data matrix (2000 rows and 9 columns)
data_matrix = randi([0, 10], 2000, 9); % Replace this with your actual data
% Example empirical frequency (a vector with 9 elements)
empirical_frequency = [10, 20, 30, 40, 50, 60, 70, 80, 90]; % Replace this with your actual empirical frequency
% Initialize vectors to store results
chi_squared_results = zeros(2000, 1);
p_values = zeros(2000, 1);
for i = 1:2000
    % Select the data for row i
    row_i = data_matrix(i, :);
    
    % Calculate the chi-squared statistic manually
    chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency);
    
    % Determine the degrees of freedom (df)
    df = length(row_i) - 1;
    
    % Calculate the p-value using the chi-squared distribution
    p = 1 - chi2cdf(chi_squared, df);
    
    % Store the results in vectors
    chi_squared_results(i) = chi_squared;
    p_values(i) = p;
end
unique(p_values)
ans = 0

The problem is that every p-value computed with chi2cdf comes out as 0.

Comments: 1

PEDRO ALEXANDRE Fernandes, November 2, 2023

Yes, that is the problem.


Answer 1

dpb, November 2, 2023


https://kr.mathworks.com/matlabcentral/answers/2042031-i-wanted-to-apply-the-chi-squared-function-with-the-return-of-the-p-value-but-matlab-s-chi2cdf-func#answer_1345501

Edited: dpb, November 2, 2023


% Example data matrix (2000 rows and 9 columns)
data_matrix = randi([0, 10], 2000, 9); % Replace this with your actual data
% Example empirical frequency (a vector with 9 elements)
empirical_frequency = [10, 20, 30, 40, 50, 60, 70, 80, 90]; % Replace this with your actual empirical frequency
% Initialize vectors to store results
chi_squared_results = zeros(2000, 1);
p_values = zeros(2000, 1);
for i = 1:2000
    % Select the data for row i
    row_i = data_matrix(i, :);

    % Calculate the chi-squared statistic manually
    chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency);

    % Determine the degrees of freedom (df)
    df = length(row_i) - 1;

    % Calculate the p-value using the chi-squared distribution
    p = 1 - chi2cdf(chi_squared, df);

    % Store the results in vectors
    chi_squared_results(i) = chi_squared;
    p_values(i) = p;
end
histogram(chi_squared_results)
%unique(p_values)
[min(chi_squared_results) max(chi_squared_results)]
ans = 1×2
  321.3518  421.4471
chi2cdf(ans, df)
ans = 1×2
     1     1

What would you expect when comparing a random vector of values from 0 to 10 against an expected cumulative distribution frequency of 10:10:90?

As the above indicates, the minimum chi-square statistic calculated was about 321. That is so far outside the range of a realistic test statistic that the cdf differs from unity by less than a double can resolve, so it is returned as identically 1 and 1 - chi2cdf(...) comes out as exactly 0. Try something more like

row_i=randi([0, 100], 1, 9)  % test vector between 0-100 instead of 0-10
row_i = 1×9
    97    27     9    47    15    34    43    54    38
chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency)
chi_squared = 859.9504
p = 1 - chi2cdf(chi_squared, df)
p = 0

That's still way out of reason; by chance, this vector has essentially the full cdf value (97) in its first element, where only 10 is expected. Not exactly surprising that it ends up with an identically zero estimate.
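The identically zero result above comes from forming 1 - chi2cdf(...) after the cdf has already rounded to 1. A minimal sketch of one way to see the tail probability anyway, assuming the 'upper' option of chi2cdf (Statistics and Machine Learning Toolbox) is available; the gammainc line is a base-MATLAB cross-check, and the variable names p_subtract, p_upper and p_gamma are only illustrative:

```matlab
% Upper-tail p-value for the very large statistic computed above
chi_squared = 859.9504;   % statistic from the unsorted random vector
df = 8;                   % 9 categories minus 1

p_subtract = 1 - chi2cdf(chi_squared, df);            % cdf rounds to 1, so this is 0
p_upper    = chi2cdf(chi_squared, df, 'upper');       % evaluates the upper tail directly
p_gamma    = gammainc(chi_squared/2, df/2, 'upper');  % same tail via the regularized incomplete gamma function

fprintf('1 - cdf : %g\nupper   : %g\ngammainc: %g\n', p_subtract, p_upper, p_gamma);
```

The number that comes back is tiny but nonzero, so it changes nothing about the conclusion: the observations are nowhere near the expected values.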

Now, keep the same vector but sort it to get what could be an approximation to a cdf...

row_i=sort(row_i)
row_i = 1×9
     9    15    27    34    38    43    47    54    97
chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency)
chi_squared = 26.7983
p = 1 - chi2cdf(chi_squared, df)
p = 7.6598e-04

Now, the above random vector starts out not too bad in comparison to expected, but has several quite low values in the 50:80 range that make it not fit all that well. At least it's computable, though.

figure
plot(empirical_frequency,sort(row_i))
xlabel('Expected'), ylabel('Observed')

Comments: 1

dpb, November 2, 2023

Edited: dpb, November 3, 2023


I didn't want to lose the random vector from the last run above, since it illustrates the point so well, so I didn't actually rerun the code to plot observed versus expected. Here it is with the values typed in by hand...

row_i=[97 27 9 47 15 34 43 54 38];
empirical_frequency=[10:10:90];
subplot(3,1,1)
hold on
plot([0 empirical_frequency 100],[0 empirical_frequency 100],'k-')
plot(empirical_frequency,row_i,'b*-')
xlim([0 100]), ylim([0 100]), box on
legend('reference','random','location','north')
subplot(3,1,2)
hold on
plot([0 empirical_frequency 100],[0 empirical_frequency 100],'k-')
plot(empirical_frequency,sort(row_i),'r*-')
xlim([0 100]), ylim([0 100]), box on
legend('reference','sorted','location','northwest')
subplot(3,1,3)
hold on
row_i=[9 15 27 34 53 57 78 78 97 ];
chi_squared = sum((row_i - empirical_frequency).^2 ./ empirical_frequency)
chi_squared = 4.3887
df=numel(row_i)-1;
p = 1 - chi2cdf(chi_squared, df)
p = 0.8205
plot([0 empirical_frequency 100],[0 empirical_frequency 100],'k-')
plot(empirical_frequency,row_i,'g*-')
xlim([0 100]), ylim([0 100]), box on
legend('reference','adjusted','location','northwest')

Now if in the end we take a set of data that actually do roughly follow the path of the empirical cdf, then, by golly, we get a chi-square statistic (4.3887, p = 0.8205) indicating that the set of observations can't really be ruled out as having come from the parent distribution. The "corrections" made to the sorted random vector were to raise the 5th through 8th values to something roughly in line with expected, so the deviations from the empirical values weren't nearly so large...
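For reference, the three vectors discussed in this thread can be run through the same calculation in one go. This is only a sketch, assuming MATLAB R2016b or later (for implicit expansion) and the 'upper' option of chi2cdf from the Statistics and Machine Learning Toolbox; the names observed and expected are illustrative stand-ins for row_i and empirical_frequency:

```matlab
% Chi-squared statistic and p-value for several rows at once, no loop
expected = 10:10:90;                       % expected values used in the thread
observed = [ 9 15 27 34 53 57 78 78 97;    % "adjusted" vector
             9 15 27 34 38 43 47 54 97;    % sorted random vector
            97 27  9 47 15 34 43 54 38];   % unsorted random vector

% Implicit expansion applies the 1-by-9 expected row to every row of observed
chi_squared = sum((observed - expected).^2 ./ expected, 2);
df = size(observed, 2) - 1;

% Upper tail evaluated directly instead of 1 - chi2cdf(...)
p_values = chi2cdf(chi_squared, df, 'upper');

disp([chi_squared p_values])
```

The first two rows should reproduce the statistics and p-values shown earlier, and the third gives a very small but nonzero p-value rather than an exact 0.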
