Generate normally distributed sample from data

Andrea C on 8 Dec 2019
Commented: Andrea C on 9 Dec 2019
Hi,
I have an array with many (>800000) rows. From one of its columns, I want to select 51 values to build a new array of 51 normally distributed values. The values range from 0 to 10.
How can I do that?
Thanks,
Andrea

Accepted Answer

Thiago Henrique Gomes Lobato on 8 Dec 2019
Edited: Thiago Henrique Gomes Lobato on 8 Dec 2019
I want to be careful not to start a discussion about how one actually defines a normal distribution, but assuming you don't need the data to meet an exact definition of normality, you can use the Anderson-Darling test. The idea is to randomly sample 51 points from your array and then check whether they look normal or not. To make it more robust, you can simply keep the sample with the highest p-value:
rng(33)                                    % fix the seed for reproducibility
ArraySize = 80000;
A = rand(ArraySize,1);                     % not normal
A(500:1000) = randn(501,1);                % normal
Founded = 0;
MaxIter = 1000;
Maxp = 0;
Ite = 1;
while ~Founded && Ite<=MaxIter
    SampledIndex = randperm(ArraySize,51); % sample 51 distinct indices from your array
    Asampled = A(SampledIndex);
    [h,p] = adtest(Asampled);              % h = 0: normality is not rejected at the 5% level
    % You can theoretically uncomment the next line to stop at the first sample that
    % passes the test; I believe, however, that keeping the maximum p is more robust
    %Founded = ~h; % h = 0 means the null hypothesis (the data are normal) cannot be rejected
    if p>Maxp                              % keep the sample that came closest to normal
        BestAsoFar = Asampled;
        Maxp = p;
    end
    Ite = Ite+1;
end
histogram(BestAsoFar)
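
The same search can be pointed at one column of a larger matrix, which is closer to the setup described in the question. This is only a minimal sketch: the matrix `data`, the column index `colIdx`, and the trial count `nTrials` are hypothetical stand-ins rather than anything taken from the original post, and `adtest` requires the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical stand-in for the real data: 800000 rows, 3 columns, values in [0, 10]
rng(1);                               % fixed seed so the run is repeatable
data = 10 * rand(800000, 3);
colIdx = 2;                           % assumed: the column to sample from
col = data(:, colIdx);

nPick = 51;                           % subset size asked for in the question
nTrials = 200;                        % assumed number of random subsets to try
bestP = -Inf;
bestIdx = [];
for t = 1:nTrials
    idx = randperm(numel(col), nPick);    % 51 distinct row indices
    [~, p] = adtest(col(idx));            % Anderson-Darling p-value for this subset
    if p > bestP                          % keep the subset with the largest p-value
        bestP = p;
        bestIdx = idx;
    end
end
bestSample = col(bestIdx);
fprintf('Largest p-value over %d subsets: %.3f\n', nTrials, bestP);
```

Keeping `bestIdx` as well as `bestSample` makes it possible to trace each selected value back to its row in `data`.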
2 Comments
Walter Roberson on 8 Dec 2019
? This looks like it cherry picks samples to find a subset that is approximately normally distributed??
Andrea C on 9 Dec 2019
Great, it works.
This is exactly what I was looking for!


More Answers (1)

Walter Roberson on 8 Dec 2019
You can only do that if the column already contains normally distributed samples. If that is the case, then you could use randperm() to select the indices to extract.
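
A minimal sketch of that randperm() selection, assuming the column really is normally distributed already. The matrix `data` and its contents are hypothetical and only serve to make the snippet runnable.

```matlab
rng(0);                               % fixed seed so the run is repeatable
data = 5 + randn(800000, 2);          % hypothetical data whose columns are already normal
col = data(:, 1);                     % the column to draw from

idx = randperm(numel(col), 51);       % 51 distinct row indices, chosen uniformly at random
sample = col(idx);                    % 51-by-1 vector of the selected values
```

Because randperm(n, k) returns k unique integers, no row is picked twice.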
However, values in the range 0 to 10 are not normally distributed: normally distributed values have infinite tails in both directions. When you have a fixed finite range such as 0 to 10, then the closest you can get is a Beta distribution.
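
One way to act on the Beta suggestion is to rescale the values to the unit interval, fit a Beta distribution, and draw 51 new values from the fit. This is a sketch under stated assumptions: the data vector `x` is synthetic, the bounds `lo` and `hi` are taken from the 0 to 10 range in the question, and `betafit` and `betarnd` require the Statistics and Machine Learning Toolbox. Note that this generates new values rather than selecting existing rows.

```matlab
rng(0);                                   % fixed seed so the run is repeatable
x = 10 * betarnd(2, 5, 5000, 1);          % hypothetical bounded data in [0, 10]

lo = 0;  hi = 10;                         % assumed known bounds of the data
u = (x - lo) / (hi - lo);                 % rescale to [0, 1]
u = min(max(u, eps), 1 - eps);            % betafit needs values strictly inside (0, 1)

phat = betafit(u);                        % maximum likelihood estimates [a b]
newSample = lo + (hi - lo) * betarnd(phat(1), phat(2), 51, 1);  % 51 draws on [0, 10]
```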

Release: R2019a
