Non-reproducible solution with non-linear fitting

Question

burges 2019년 10월 16일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/485677-non-reproducible-solution-with-non-linear-fitting

편집: John D'Errico 2019년 10월 16일

myDataset.xlsx

I have been trying to fit a power-law distribution: ns ~ s^-a (where ns is the probability density function; s is the empirical observation; and a is the scaling exponent) to my empirical data (see attached excel sheet) using MATLAB's fit function. However, I am having problems getting a reproducible solution. When I do not specify a startpoint, I get different results for each run -- in spite of the fact that I'm using the same dataset. Then, when I do specify a startpoint, the function gives me the same result as my specified startpoint. Also, the issue remains regardless of whether I specify an algorithm such as the Levenberg-marquardt, or if I used the default algorithm. My codes are as follows:

clc; clear; close all;
%%%%%%%%%%%%%%%%%%% Load data
[data, ~, ~] = xlsread('myDataset.xlsx',1,'A:A'); 
%%%%%%%%%%%%%%%%%% Histogram plot 
data = sort(data);
[y, edges] = histcounts(data, 100000, 'Normalization','pdf');
edges = edges(2:end) - (edges(2)-edges(1))/2;
figure;
scatter(edges, y, 4, 'ro', 'Markerfacecolor', 'r');
hold on;
box on;
edges = edges';
y = y';
%%%%%%%%%%%%%%%%%% Specify input parameters
%(1): cut-off value
xmin = 1*10^5; 
%(2): x-values after cut-off
x    = edges(edges>=xmin);
ind  = find(edges>=xmin,1,'first');
%(3): y-values after cut-off
y([1:ind-1]) = [];
%%%%%%%%%%%%%%%%%% fit power-law model
f = fittype('b*x.^-a'); 
fModel = fit(x, y, f, 'Algorithm','levenberg-marquardt'); 
coeffs = coeffvalues(fModel);
plot(fModel,'b--')
set(gca, 'xscale','log', 'yscale','log')
xlabel('s', 'fontsize',8)
ylabel('ns', 'fontsize',8)

The value of a is said to be around 2.189. Although I may not get this exact value - as it will depend on the dataset, I however expect to at least get values in the range of about 1.5 - 2.

I do appreciate any help, suggestions, or references that will be useful in solving the problem. Many thanks.

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

John D'Errico 2019년 10월 16일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/485677-non-reproducible-solution-with-non-linear-fitting#answer_396621

편집: John D'Errico 2019년 10월 16일

MATLAB Online에서 열기

Do you understand that when you do not specify astart point, that fit uses a RANDOM start point? So why would you expect the same results from a random start?

When you start from the same point, you get the same result. That is not a surprise at all. Why not? Because your data is composed of unholy crap.

You are trying to fit a power model to data y = f(x) = b*x^(-a). Ok, a simple model. Trivial, right? LOOK AT YOUR DATA!!!!!!!!!

plot(x,y,'.')

That looks a little strange. Looking more closely at your data, we see:

numel(y)
ans =
       93279
       
sum(y == 0)
ans =
       93232

So, of 93279 data points, all but 47 of them have y IDENTICALLY equal to zero. 99.95% of your data is completely useless in fitting that model. The rest of it is not of much more value.

Are you serious? Do you rationally expect this to work? Sigh.

Get better data. And use intelligently chosen start points, if you seriously want a decent estimate in any nonlinear regression. That will also alleviate the randomness of your result.

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

Non-reproducible solution with non-linear fitting

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

답변 (1개)

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

Non-reproducible solution with non-linear fitting

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

답변 (1개)

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기