Minimize error between data distribution and expected distribution

PEF on 20 Mar 2013
Hi all,
I have 3 sets of data which are expected to:
1) 1st data-block to approach a Gaussian distribution with mu = 0 and sigma = 1;
2) 2nd data-block to approach a Gaussian distribution with mu = 0 and sigma = .8;
3) 3rd data-block to approach a Gaussian distribution with mu = 0 and sigma = .5;
Each data-block has only a limited number of representations (generally between 2048 and 8192), and because of some filter effects introduced by the specific code I use, they will not exactly match the corresponding expected distribution.
The point is that, whatever it implies in terms of manipulation, I want each data-block to minimize the discrepancy between its actual and expected distribution. Note that I won't increase the number of representations, for reasons I will not explain in detail.
Generally, the first data-block, compared to the standard normal distribution, looks like the following figure:
I was thinking of using lsqcurvefit for this purpose.
What would you suggest?

Answers (1)

Wouter on 20 Mar 2013
Do you know this function:
histfit
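For illustration, a minimal sketch of using histfit to compare one data-block with its expected Gaussian (the variable name data, the sample size and the bin count here are assumptions, not taken from your code):
data = randn(4096,1);                  % stand-in for one data-block
histfit(data, 128)                     % histogram with a fitted normal pdf overlaid
[mu_hat, sigma_hat] = normfit(data)    % fitted parameters, to compare against mu = 0, sigma = 1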
6 Comments
Wouter on 21 Mar 2013
Edited: 21 Mar 2013
You could try to change individual datapoints after your filtering step in order to update your datapoints; this will change the blue bars. For example: find a blue bar that is too high and change one of its datapoints into a value that lies in a blue bar that is too low (compared to the red line). This does, however, change your data and will render step 2) treat_with_piece_of_code useless.
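For illustration, a rough sketch of that bin-rebalancing idea (the data-block variable data, the bin count, the target pdf and the choice of which point to move are all assumptions; as noted, this modifies the data itself):
nbins = 128;
[f_p, m_p] = hist(data, nbins);         % observed counts and bin centres
f_p = f_p / trapz(m_p, f_p);            % normalize to unit area
f_th = normpdf(m_p, 0, 1);              % target density at the bin centres
[~, hiBin] = max(f_p - f_th);           % bin that is too high versus the red line
[~, loBin] = min(f_p - f_th);           % bin that is too low versus the red line
binW = m_p(2) - m_p(1);
idx = find(abs(data - m_p(hiBin)) <= binW/2, 1);   % one datapoint in the over-full bin
data(idx) = m_p(loBin);                 % move it to the centre of the under-full bin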
However, it makes more sense to find a better fit to the histogram, i.e. to change the red line. lsqcurvefit would only be useful if you wanted to update the red line (the fit).
PEF on 21 Mar 2013
I think that you started to get the point :)
The major concern is that I don't want to find the best fit to the data, but the best data fitting the standard normal distribution: for some reason I need my data to fit a Gaussian distribution with mean 0 and sigma 1.
At the moment I'm proceeding this way:
data = randn(4096,1);                                 % example data-block (4096 representations)
[f_p,m_p] = hist(data,128);                           % histogram: counts f_p at bin centres m_p
f_p = f_p/trapz(m_p,f_p);                             % normalize the histogram to unit area
x_th = min(data):.001:max(data);                      % fine grid over the data range
y_th = normpdf(x_th,0,1);                             % target pdf: N(0,1)
f_p_th = interp1(x_th,y_th,m_p,'spline','extrap');    % target pdf sampled at the bin centres
figure(1)
bar(m_p,f_p)                                          % observed (normalized) histogram
hold on
plot(x_th,y_th,'r','LineWidth',2.5)                   % target pdf
grid on
hold off
figure(2)
bar(m_p,f_p_th)                                       % target pdf evaluated at the bin centres
hold on
plot(x_th,y_th,'r','LineWidth',2.5)
grid on
hold off
Now, I would proceed with calculating a scaling factor
sf = f_p_th./f_p;    % per-bin ratio of expected to observed density
and I consequently scale the data in accordance with the scale factor of the corresponding bin; for example:
if data(1) falls within bin(1) --> scale with sf(1) and so on.
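For reference, a minimal sketch of that per-bin scaling (it reuses f_p, f_p_th and m_p from the code above; the edge reconstruction and the names bin and data_scaled are assumptions, and it presumes no bin is empty):
sf = (f_p_th ./ f_p).';                      % per-bin scale factor (expected / observed), as a column
binW = m_p(2) - m_p(1);
edges = [m_p - binW/2, m_p(end) + binW/2];   % reconstruct the bin edges from the centres
[~, bin] = histc(data, edges);               % bin index of every datapoint
bin(bin == numel(edges)) = numel(edges) - 1; % fold the right edge into the last bin
data_scaled = data .* sf(bin);               % scale each datapoint by its bin's factor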
I do think that my question is not counter-intuitive; it's only reversing the standard procedure of fitting a distribution to a given set of data.
