Probability distribution of a multiple variable sum

Asked: Rémy Bretin on 10 May 2019
Answered: Rémy Bretin on 14 May 2019
Hi everyone,
I'm looking for fairly advanced statistics/probability advice, as I'm a beginner in this field.
I would like to know the probability distribution of a variable TAU_total such that TAU_total = TAU1 + TAU2 + … + TAU129.
The variables TAUi are independent of each other.
For each of them, I have a sample of 20,000 values; the histograms in the attachment show the distributions of a few of them.
My question is the following: I would like to determine the probability that TAU_total exceeds a certain value Xmax.
Thank you for your help,
Regards,
Rémy
[Attachment: stat.png, histograms of some of the TAUi samples]
Comments: 4
Walter Roberson on 11 May 2019
The Wikipedia article on the central limit theorem (CLT) discusses extensions beyond the iid case.
I ran a quick simulation the same size as the original question, using randn with a range of standard deviations. The std() of the totals was roughly 10% larger than the sum() of the individual standard deviations divided by sqrt(129). The calculations for the iid case were thus not exactly applicable, but they were pretty close. A hist() of the totals looks like a textbook illustration of a pure normal distribution until I got up to 56 bins in the histogram, at which point you could finally start to see statistical differences compared to a perfect curve.
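A check of this kind might look like the sketch below. It is not the code that was actually run: the range of standard deviations (0.5 to 2.5), the number of simulated totals, and all variable names are assumptions. For independent terms the variances add, so the standard deviation of the total should be close to sqrt(sum(sigma.^2)), which is never smaller than sum(sigma)/sqrt(N); that could explain a gap like the one described.

```matlab
rng(0);                                  % fixed seed so the run is repeatable
N = 129;                                 % number of terms, as in the question
nsim = 20000;                            % assumed number of simulated totals
sigma = 0.5 + 2*rand(1,N);               % assumed spread of standard deviations
X = bsxfun(@times, randn(nsim,N), sigma);% column i ~ Normal(0, sigma(i)^2)
total = sum(X,2);                        % one simulated total per row

stdMC    = std(total);                   % observed spread of the totals
stdExact = sqrt(sum(sigma.^2));          % independent terms: variances add
stdIID   = sum(sigma)/sqrt(N);           % the "sum of std / sqrt(N)" figure

fprintf('MC: %.3f  sqrt(sum var): %.3f  sum(std)/sqrt(N): %.3f\n', ...
        stdMC, stdExact, stdIID);
```

The two reference values coincide only when every sigma(i) is the same, which is the iid case.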
John D'Errico
John D'Errico 2019년 5월 11일
Admittedly, when I first saw this question, I read it as the sum of 12 terms, not 129. 129 terms will cause pretty much anything to look as if it is normally distributed. :)
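N = 129;                                   % number of terms in the sum
% Random beta shape parameters in [0.5, 2.5], one pair per term
alph = rand(1,N)*2 + .5;
bet = rand(1,N)*2 + .5;
% Exact mean and variance of each beta-distributed term
betamean = alph./(alph + bet);
betavar = alph.*bet./((alph + bet).^2.*(alph + bet + 1));
% Independent terms: the means add and the variances add
CLTmean = sum(betamean);
CLTvar = sum(betavar);
CLTstd = sqrt(CLTvar);
% Monte Carlo: simulate nsim realizations of the sum
% (betarnd requires the Statistics and Machine Learning Toolbox)
nsim = 100000;
X = zeros(nsim,N);
for i = 1:N
    X(:,i) = betarnd(alph(i),bet(i),[nsim,1]);
end
MCsum = sum(X,2);
MCmean = mean(MCsum);
MCvar = var(MCsum);
MCstd = std(MCsum);
% Rows: mean, variance, std. Columns: CLT prediction, Monte Carlo estimate
[CLTmean, MCmean;CLTvar,MCvar;CLTstd,MCstd]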
ans =
61.6267682855143 61.6366984653151
7.56366115010423 7.53757246937701
2.75021111009759 2.74546398071018
Comparing the histograms, we see:
histogram(MCsum,100,'Normalization','pdf')   % empirical pdf of the simulated sums
hold on
fplot(@(x) normpdf(x,CLTmean,CLTstd),[min(MCsum),max(MCsum)],'r')   % CLT normal pdf in red
[Figure: untitled.jpg, histogram of MCsum with the CLT normal pdf overlaid in red]
So I don't see any problem using either approach. With only 12 terms in the sum, I'd probably go with the Monte Carlo.
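To connect either approach back to the original question, here is a minimal sketch of estimating P(total > Xmax) both ways. It is self-contained, so it uses stand-in Uniform(0,1) terms rather than the beta terms above or the asker's data, and the threshold Xmax is hypothetical. The normal tail is written with erfc so that only base MATLAB is needed.

```matlab
rng(1);                                  % fixed seed so the run is repeatable
N = 129;                                 % number of terms in the sum
nsim = 50000;                            % assumed number of Monte Carlo draws
S = sum(rand(nsim,N), 2);                % stand-in terms: Uniform(0,1)

mu = N*0.5;                              % sum of the means (each is 1/2)
sd = sqrt(N/12);                         % sqrt of the summed variances (each 1/12)
Xmax = mu + 2*sd;                        % hypothetical threshold

% CLT approach: upper tail of Normal(mu, sd^2)
pCLT = 0.5*erfc((Xmax - mu)/(sd*sqrt(2)));
% Monte Carlo approach: fraction of simulated sums above the threshold
pMC = mean(S > Xmax);

fprintf('P(total > Xmax): CLT %.4f, Monte Carlo %.4f\n', pCLT, pMC);
```

The normal approximation tends to be least reliable far out in the tail, so for a very large Xmax the Monte Carlo estimate (with enough draws) is the safer of the two.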


Accepted Answer

Torsten on 10 May 2019
Edited: Walter Roberson on 10 May 2019
Draw samples from your empirical data and use Monte Carlo simulation to estimate the above probability.
This code should help to take the samples:
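The code referred to is not reproduced in this thread. As a stand-in, here is a minimal sketch of the resampling idea, assuming the 129 samples are stored column-wise in a hypothetical 20000-by-129 matrix TAU. The placeholder data, the threshold Xmax, and every variable name are assumptions, not part of the original answer.

```matlab
rng(2);                                  % fixed seed so the run is repeatable
nObs = 20000;                            % values per variable, as in the question
N = 129;                                 % number of variables
TAU = -log(rand(nObs,N));                % placeholder data (skewed); replace with the real samples

nsim = 20000;                            % assumed number of simulated totals
total = zeros(nsim,1);
for i = 1:N
    idx = randi(nObs, nsim, 1);          % resample column i with replacement
    total = total + TAU(idx,i);          % columns are drawn independently
end

Xmax = 150;                              % hypothetical threshold
pHat = mean(total > Xmax);               % estimate of P(TAU_total > Xmax)
se = sqrt(pHat*(1 - pHat)/nsim);         % Monte Carlo standard error only
fprintf('P(TAU_total > Xmax) ~ %.4f (+/- %.4f)\n', pHat, se);
```

Drawing a separate index vector for each column is what encodes the independence of the TAUi. The standard error reflects only the simulation noise, not the uncertainty from having a finite sample of each variable.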

More Answers (1)

Rémy Bretin on 14 May 2019
Thank you everyone for your support.
I decided to go with the Monte Carlo method, which gave me a Gaussian in the end, as you expected.
