
How to calculate R^2 using 1 - (SSR/SST) for a normal distribution fit?

Macy on 15 Feb 2023
Commented: Torsten on 21 Feb 2023
Hello, I used the fitlm function to find R^2 (see below) to see how well the normal distribution fits the actual data. The result is 0.9172.
How can I manually calculate R^2?
I am using R^2 = 1 - (SSR/SST), in other words 1 - sum((predicted - actual).^2) / sum((actual - mean(actual)).^2), but I am having a hard time getting the correct answer.
Table = readtable("practice3.xlsx");
actual_values = Table.values;
actual_values = sort(actual_values);
normalfit = fitdist(actual_values,'Normal'); % fit the normal distribution to the data
cdfplot(actual_values); % Plot the empirical CDF
x = 0:2310;
hold on
plot(x, cdf(normalfit, x), 'Color', 'r') % plot the normal distribution
hold off
grid on
nonExceedanceProb = sum(actual_values'<=actual_values,2)/numel(actual_values);
Table.nonExceedanceProb=nonExceedanceProb;
mdl=fitlm(cdf(normalfit, actual_values),Table.nonExceedanceProb);
mdl.Rsquared.Ordinary % R^2
ans = 0.9172
mdl.SSR
ans = 0.7567
mdl.SST
ans = 0.8250
% How can I manually calculate R^2 (or SSR and SST)?
% SSR = sum(((predicted data - actual data).^2))
% SST = sum((actual data - mean(actual data)).^2)
% Rsquared = 1 - SSR/SST
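
One possible source of the mismatch is naming. In a fitlm model, mdl.SSR is the regression sum of squares and mdl.SSE is the residual (error) sum of squares, so R^2 = 1 - SSE/SST = SSR/SST. With the values printed above, 0.7567/0.8250 = 0.9172. The following is a minimal sketch of the manual calculation, not a definitive implementation. The synthetic actual_values are an assumption that stands in for practice3.xlsx, which is not available here.

```matlab
% Hypothetical stand-in for Table.values (practice3.xlsx is not available here)
rng(0);
actual_values = sort(200 + 50*randn(100,1));

normalfit = fitdist(actual_values,'Normal');
x = cdf(normalfit, actual_values);                                   % predictor passed to fitlm
y = sum(actual_values' <= actual_values, 2) / numel(actual_values);  % response (empirical CDF)

mdl = fitlm(x, y);               % same kind of model as in the question

% Manual least-squares line y ~ b(1) + b(2)*x
X = [ones(size(x)) x];
b = X \ y;
yhat = X*b;                      % fitted values of the regression line

SSE = sum((y - yhat).^2);        % residual (error) sum of squares
SSR = sum((yhat - mean(y)).^2);  % regression sum of squares (MATLAB's mdl.SSR)
SST = sum((y - mean(y)).^2);     % total sum of squares

R2_from_SSE = 1 - SSE/SST;
R2_from_SSR = SSR/SST;

fprintf('fitlm: %.6f  1-SSE/SST: %.6f  SSR/SST: %.6f\n', ...
    mdl.Rsquared.Ordinary, R2_from_SSE, R2_from_SSR);
```

Note that the "predicted" values here are the fitted values of the regression line, not cdf(normalfit, actual_values) itself. Plugging mdl.SSR into 1 - SSR/SST would give about 0.0828 for the numbers in the question.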

Accepted Answer

Torsten on 15 Feb 2023
Edited: Torsten on 15 Feb 2023
In my opinion, it does not make sense to fit a linear function to the value pairs (cdf(normalfit, actual_values),Table.nonExceedanceProb) as you do above.
In principle, the blue points below should lie on the red line. This would mean that the empirical cdf is perfectly reproduced by the normal distribution.
So if you really want to compare the two distributions, you should consider the distance of the blue points (achieved quality of fit) to the red line (perfect fit).
Table = readtable("practice3.xlsx");
actual_values = Table.values;
actual_values = sort(actual_values);
normalfit = fitdist(actual_values,'Normal'); % fit the normal distribution to the data
nonExceedanceProb = sum(actual_values'<=actual_values,2)/numel(actual_values);
hold on
plot(nonExceedanceProb,cdf(normalfit, actual_values),'o')
plot([0 1],[0 1])
xlabel('P(empirical)')
ylabel('P(normal)')
hold off
grid on
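
The distance of the points to the 1:1 line can be put into numbers without fitting a second line. The sketch below is one possible way to do it, under assumptions: the data are synthetic stand-ins for practice3.xlsx, and the names pEmp, pNorm, R2_identity and maxDist are made up for illustration. It treats the normal CDF values directly as the predictions of the empirical probabilities.

```matlab
% Hypothetical stand-in for Table.values (practice3.xlsx is not available here)
rng(0);
actual_values = sort(200 + 50*randn(100,1));

normalfit = fitdist(actual_values,'Normal');
pEmp  = sum(actual_values' <= actual_values, 2) / numel(actual_values);  % empirical CDF
pNorm = cdf(normalfit, actual_values);                                   % fitted normal CDF

% Residuals are measured against the 1:1 line, i.e. pNorm is the prediction
SSres = sum((pEmp - pNorm).^2);
SStot = sum((pEmp - mean(pEmp)).^2);
R2_identity = 1 - SSres/SStot;

% Largest vertical gap between the two CDFs at the data points
maxDist = max(abs(pEmp - pNorm));

% For comparison: R^2 of a free straight-line fit through the same pairs
mdl = fitlm(pNorm, pEmp);

fprintf('R^2 vs 1:1 line: %.4f  R^2 from fitlm: %.4f  max gap: %.4f\n', ...
    R2_identity, mdl.Rsquared.Ordinary, maxDist);
```

Because the 1:1 line is just one of the lines fitlm could choose, R2_identity can never exceed the fitlm value for the same pairs. A large difference between the two suggests that the fitted line is hiding a systematic offset or slope error.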
12 Comments
Macy on 21 Feb 2023
Yes, he said that Rsquared1 is the "Pearson correlation coefficient" and Rsquared2 is the "coefficient of determination".
Torsten on 21 Feb 2023
corr(yi,fi) is the Pearson correlation coefficient - I don't know why he wanted to square it.
Anyway: congratulations that you finished your assignment successfully.
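
The definitions of Rsquared1 and Rsquared2 are not part of the comments shown here, so the following sketch assumes Rsquared1 = corr(yi,fi)^2 and Rsquared2 = 1 - sum((yi-fi).^2)/sum((yi-mean(yi)).^2). The data and the variable xi are hypothetical. It illustrates that the two quantities differ for arbitrary predictions fi and coincide when fi comes from a least-squares line with an intercept.

```matlab
rng(1);
xi = linspace(0,1,50)';
yi = 2*xi + 0.1*randn(50,1);   % hypothetical observations
fi = 1.5*xi + 0.2;             % hypothetical predictions, NOT a least-squares fit to yi

Rsquared1 = corr(yi,fi)^2;                                % squared Pearson correlation
Rsquared2 = 1 - sum((yi-fi).^2)/sum((yi-mean(yi)).^2);    % coefficient of determination

% Same two quantities when the predictions come from a least-squares line
p = polyfit(xi, yi, 1);
fi_ls = polyval(p, xi);
Rsquared1_ls = corr(yi,fi_ls)^2;
Rsquared2_ls = 1 - sum((yi-fi_ls).^2)/sum((yi-mean(yi)).^2);

fprintf('arbitrary fi:     %.4f vs %.4f\n', Rsquared1, Rsquared2);
fprintf('least-squares fi: %.4f vs %.4f\n', Rsquared1_ls, Rsquared2_ls);
```

The squared correlation ignores any offset or scaling of fi, while the coefficient of determination penalizes it. This is one reason the two numbers can disagree when the predictions are not refitted to the data.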

Release: R2022b
