How to use mvnrnd to generate a normal distribution within +- 1 sigma rather than the whole range (+- 3 sigma)

Question

W_Elsa 2018년 9월 11일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/418454-how-to-use-mvnrnd-to-generate-a-normal-distribution-within-1-sigma-rather-than-the-whole-range

댓글: Paul 2020년 6월 1일

Does anyone could help with the problem, how to use mvnrnd to generate a normal distribution within +- 1 sigma rather than the whole range (+- 3 sigma), any such option? Thank you.

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

W_Elsa 2018년 9월 11일

A=2.296; B=-780; mu = [A B]; sigma = [0.249640957669784,-83.3429261894244; -83.3429261894244,27824.3263329175]; rng default Coe= mvnrnd(mu,sigma,S);

John D'Errico 2018년 9월 11일

Read my answer. For a multi-variate normal, it will be simplest to just use rejection sampling.

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

John D'Errico 2018년 9월 11일

1
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/418454-how-to-use-mvnrnd-to-generate-a-normal-distribution-within-1-sigma-rather-than-the-whole-range#answer_336324

편집: John D'Errico 2018년 9월 11일

MATLAB Online에서 열기

It is not normal if you restrict the range. Period. So the result of any such operation will NOT be normally distributed. PERIOD. Must I say it again? If I say so three times, it must be true. Normal distributions extend to infinity in both directions.

Note that +/-3 sigma is NOT the entire range. If you think it is, then you are wrong. Period.

Sigh. Yes, I know that someone will try to explain how one might do so anyway. There are two simple tricks to do this.

1. Rejection. Sample from a normal distribution, then throw away any samples that are outside of the desired range. The result will have the desired properties. So to get n samples out the end, you will need to take more than n samples, originally since you lose some. It will still not be normal, but then you probably know that by now. If the rejection rate is relatively low, then you need not do too much oversampling, and this will be fairly efficient.

2. Use of the inverse normal. This allows you to do sampling with no rejection needed. Somewhat more mental effort is needed, but not a lot.

Lets assume that we want to generate 1000 points, in the range [-1,1].

How large of a set must we oversample to do rejection sampling?

diff(normcdf([-1 1]))
ans =
      0.68269

So we will toss out roughly 31% of the samples as exceeding +/- 1 sigma. ROUGHLY. That means we need to sample ROUGHLY

n = 1000;
n*1/0.68269
ans =
       1464.8

So something around 1500 samples will be needed. I could give a better estimate using a binomial distribution with p=0.68269.

Checking wikipedia, since my brain sometimes gets these things wrong ... The mean of a binomial is n*p, the variance is n*p*(1-p). This should give me an estimate...

Nhat = ceil(n/p)
Nhat =
      1465
Nhat*p + [-3 3]*sqrt(Nhat*p*(1-p))
ans =
      946.7       1053.6

So, if I sample 1465 points, then I would expect to see somewhere between 950 and 1050 points that fall inside the interval [-1,1]. Therefore 1600 pre-rejection points will almost always leave me safe.

mu = 0;
sigma = 1;
X0 = normrnd(mu,sigma,[1,1600]);
X0((X0 < -1) | (X0 > 1)) = []; % rejection step
[min(X0),max(X0)]
ans =
     -0.99422      0.98885
numel(X0)
ans =
        1088
Xrej = X0(1:n); % keep the first 1000 samples

So rejection is trivial to do.

Option 2, using the inverse normal is easy enough too, but it requires you to understand these methods.

mu = 0;
sigma = 1;
n = 1000;
normallimits = [-1 1];
unilimits = normcdf(normallimits);
unisample = rand(1,n)*diff(unilimits) + unilimits(1);
Xtrans = norminv(unisample,mu,sigma);

So in the scheme for option 2, I sampled randomly from a UNIFORM distribution, then pushed them through an inverse normal to get the final set. No rejection was needed at all.

In both cases, the samples follow what would be technically called a truncated normal distribution. Had I taken a significantly larger sample, the result would look fairly smoothly like a truncated normal. With only 1000 points, the histogram won't look like much of anything.

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

John D'Errico 2018년 9월 11일

편집: John D'Errico 2018년 9월 11일

The inverse transformation scheme would not be easy to write in multiple dimensions anyway. I think I could make something work, but not worth it for a case where rejection is so easy. So unless the rejection rate is so significant that it becomes terribly inefficient, a truncated multi-variate normal is way easier to perform.

Paul 2020년 6월 1일

MATLAB Online에서 열기

Option 2 is appealing because it always requires exactly n samples from rand and will therefore be consistent in how it affects the Global Stream regardless of the parameters of the distribution.

If you have the Statistics toolbox, another option for the 1D case would be:

X=random(truncate(makedist('Normal'),-1,1),1000,1);

As far as I know the toolbox doesn't provide a built-in capabilty to truncate the mutli-variable normal density.

But that got me wondering if the toolbox uses rejection or the inverse normal approach (or something else), because I'd like to know if the state of the Global Stream depends only on the numel of the output of the call to random. I couldn't find any discussion in the doc about how random works for truncated distributions. So I tried:

>> rng(1);x1=random(makedist('Normal'),5,1);s1=RandStream.getGlobalStream.State;
>> rng(1);x2=random(truncate(makedist('Normal'),-1000,1000),5,1);s2=RandStream.getGlobalStream.State;
>> rng(1);x3=random(truncate(makedist('Normal'),-inf,inf),5,1);s3=RandStream.getGlobalStream.State;
>> [x1 x2 x3]
ans =
   -0.6490   -0.2095   -0.6490
    1.1812    0.5838    1.1812
   -0.7585   -3.6849   -0.7585
   -1.1096   -0.5177   -1.1096
   -0.8456   -1.0504   -0.8456
  
>> [isequal(s1,s2) isequal(s1,s3)]
ans =
  1×2 logical array
   0   1

The Global Stream has a different state after the call for x2 compared to after the call for x1, which indicates that more than 5 draws were needed from the Global Stream. Does that mean that some sort of rejection method is happening? I'm not sure what that rejection would be for a 1000-sigma truncation, I'm just kind of surprised that rejection was needed to the extent that all of x2 is different than x1 (and x3 for that matter). Also, these results suggest that the state of the Global Stream depends on the truncation limits, which is completely understandable if a rejection method is being used, thought it would be helpful if that were documented somewhere.

댓글을 달려면 로그인하십시오.

How to use mvnrnd to generate a normal distribution within +- 1 sigma rather than the whole range (+- 3 sigma)

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

채택된 답변

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

How to use mvnrnd to generate a normal distribution within +- 1 sigma rather than the whole range (+- 3 sigma)

댓글 수: 3 이전 댓글 1개 표시이전 댓글 1개 숨기기

채택된 답변

댓글 수: 3 이전 댓글 1개 표시이전 댓글 1개 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기