# How to generate correlated samples from a single distribution? Each sample should be correlated with the previous.

Semih Gonen, March 30, 2021

Commented: Jeff Miller, April 2, 2021

---------------------------------------------------------------
EDIT
I realize that my previous description of the problem may not be clear. Therefore, I am adding the information below.
My problem is as follows:
I have a 10x10 stone wall. Each stone piece is 1x1, so there are 100 pieces in total. Starting from the bottom, the stones are numbered 1-100 in ascending order. For two different material properties of the stones, I assigned a normal and a lognormal distribution (I am using different distributions so that the solution is not specific to the normal distribution). These two properties are (or can be) independent of each other. Now I want to assign each stone a value of each material property (drawn from the distributions I defined), and that value should be correlated with the previous (or next) stone's value with a specified correlation coefficient (let's say rho = 0.7). The values assigned to the stones should not be sorted, i.e., low values should not be grouped at the bottom and high values at the top.
I hope my question is clear now. Thank you for your help.

---------------------------------------------------------------
I would like to sample N points from a distribution, but I want each sample generated at the current step to be correlated with the previous sample point. So there should be a specified correlation between the nth and (n+1)th points, the (n+1)th and (n+2)th points, and so on. I don't know how to correlate the subsequent values, as Pearson's correlation coefficient is 1 for two scalar values, i.e., corrcoef(value(n), value(n+1)) = 1.
Any help is much appreciated.
Here is some information for the sake of an example:
```matlab
% I define the mu and sigma parameters for a normal and lognormal
% distribution first
% parameters for the normal distribution
mu_phi = 35; sigma_phi = 0.25*mu_phi;
% parameters for the lognormal distribution (target mean and SD)
mu_ft = 0.25; sigma_ft = 0.45*mu_ft;
Nmu = mu_ft; Nsigma = sigma_ft;
% convert the target mean/SD into the mu/sigma of log(X)
lnsigma_ft = sqrt( log( Nsigma^2 / Nmu^2 + 1) );
lnmu_ft = log( Nmu/exp(0.5*lnsigma_ft^2) );
% generate the distributions
pd_phi = makedist('Normal','mu',mu_phi,'sigma',sigma_phi);
pd_ft = makedist('Lognormal','mu',lnmu_ft,'sigma',lnsigma_ft);
% correlation coefficient
rho = 0.7;
% generating the samples (currently independent draws)
normal_sample = zeros(100,1);
lognormal_sample = zeros(100,1);
for i = 1:100
    normal_sample(i) = random(pd_phi,1);
    lognormal_sample(i) = random(pd_ft,1);
    % ...
    % I need the samples from each distribution to be correlated with the
    % previous sample.
    % (I do not need the samples from different distributions to be
    % correlated)
end
% any help is much appreciated!
```
The second issue is the distribution of the sampled points. I guess a high correlation between subsequent samples would result in a deviation from the specified distribution. However, this concern is secondary.
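The conversion from the target mean and SD (mu_ft, sigma_ft) to the log-space parameters (lnmu_ft, lnsigma_ft) in the code above is the usual lognormal moment matching. A quick numerical sanity check in base MATLAB, not part of the original post; mean_back and sd_back are hypothetical names used only for this check:

```matlab
% Check that the moment-matched lognormal parameters reproduce the target
% mean and standard deviation, using the closed-form lognormal moments.
mu_ft = 0.25;                 % target mean of the lognormal property
sigma_ft = 0.45*mu_ft;        % target standard deviation
lnsigma_ft = sqrt(log(sigma_ft^2/mu_ft^2 + 1));   % sigma of log(X)
lnmu_ft = log(mu_ft/exp(0.5*lnsigma_ft^2));       % mu of log(X)

% Lognormal moments: E[X] = exp(mu + s^2/2), SD[X] = E[X]*sqrt(exp(s^2) - 1)
mean_back = exp(lnmu_ft + 0.5*lnsigma_ft^2);
sd_back = mean_back*sqrt(exp(lnsigma_ft^2) - 1);
fprintf('mean = %.4f, sd = %.4f\n', mean_back, sd_back);
```

Both values should come back as the targets (0.25 and 0.1125), which confirms the question's parameterization before any correlation is introduced.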
##### 1 Comment
Paul, March 31, 2021
The question is still not clear to me. Let's assume there is a single property. Let P1 be the continuous random variable for that property of the first stone and let P2 be the one for the second stone. The correlation coefficient of P1 and P2 follows from their joint density. So, in order to get the desired samples, we first need a definition of the joint density of P1 and P2 that yields their desired means and variances and the desired correlation coefficient between them.


### Accepted Answer

Jeff Miller, March 31, 2021
If you don't like the sorting solution, then I think you must consider the joint and conditional distribution of two successive observations as Paul suggested. For the normal property with mu=0 and sigma=1, the solution would look something like this:
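```matlab
sampleN = 100;
rho1 = 0.7;
property1 = nan(sampleN,1);   % preallocate
property1(1) = randn;         % first score drawn from N(0,1)
for i=2:sampleN
    % AR(1) update: the sqrt keeps each score N(0,1) and gives corr(x(i-1),x(i)) = rho1
    property1(i) = randn * sqrt(1-rho1^2) + rho1*property1(i-1);
end
```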
But keep in mind that, for rho>0, the standard deviation computed from a single series of these property1 scores will tend to be less than the sigma=1 of the original distribution, because neighbouring scores are similar. (As an extreme case, note that rho=1 would make every score equal to the first one, giving a zero standard deviation within the series.)
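A short derivation (assuming X_1 ~ N(0,1) and innovations ε_n that are independent standard normals) shows how the scaling c of the random term controls the marginal variance and the lag correlations:

```latex
\begin{aligned}
X_{n+1} &= \rho X_n + c\,\varepsilon_n, \qquad \varepsilon_n \sim \mathcal{N}(0,1) \text{ independent of } X_n \\
\operatorname{Var}(X_{n+1}) &= \rho^2 \operatorname{Var}(X_n) + c^2, \qquad
\operatorname{Cov}(X_n, X_{n+1}) = \rho \operatorname{Var}(X_n) \\
c = \sqrt{1-\rho^2} &\;\Rightarrow\; \operatorname{Var}(X_n) = 1 \ \text{for all } n, \qquad \operatorname{Corr}(X_n, X_{n+k}) = \rho^{k} \\
c = 1-\rho^2 &\;\Rightarrow\; \operatorname{Var}(X_n) \to \frac{(1-\rho^2)^2}{1-\rho^2} = 1-\rho^2 \quad (= 0.51 \text{ for } \rho = 0.7)
\end{aligned}
```

With the sqrt, every score keeps the N(0,1) marginal; without it, the marginal SD shrinks toward sqrt(1-rho^2) ≈ 0.71 for rho = 0.7.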
For the lognormal property2, you could use the same technique and then exponentiate the scores at the end (after scaling them by lnsigma_ft and shifting them by lnmu_ft), but you would have to adjust the rho2 value to give the desired correlation of the exponentiated scores.
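Putting both properties together for the 100-stone wall from the edited question, here is a sketch under these assumptions: both chains start from their target marginal, rho = 0.7 is the target lag-1 correlation of the actual property values, and rho2 is obtained by inverting the standard closed-form correlation of two exponentiated bivariate normals. This is one way to do the rho2 adjustment described above, not necessarily what the answerer had in mind; z1, z2, phi, ft and s2 are illustrative names.

```matlab
rng(1);                               % arbitrary seed, for reproducibility
sampleN = 100;                        % stones 1..100, numbered bottom to top
rho = 0.7;                            % target lag-1 correlation of the property values

% Normal property phi ~ N(mu_phi, sigma_phi^2)
mu_phi = 35;  sigma_phi = 0.25*mu_phi;

% Lognormal property ft: target mean/SD -> parameters of log(ft)
mu_ft = 0.25;  sigma_ft = 0.45*mu_ft;
lnsigma_ft = sqrt(log(sigma_ft^2/mu_ft^2 + 1));
lnmu_ft = log(mu_ft) - 0.5*lnsigma_ft^2;

% If (Y1, Y2) is bivariate normal with equal SD s and correlation r, then
% corr(exp(Y1), exp(Y2)) = (exp(r*s^2) - 1)/(exp(s^2) - 1).
% Solve for the log-space correlation rho2 that yields rho after exp().
s2 = lnsigma_ft^2;
rho2 = log(1 + rho*(exp(s2) - 1))/s2;

% Two independent standard-normal AR(1) chains, each started from N(0,1)
z1 = zeros(sampleN,1);  z2 = zeros(sampleN,1);
z1(1) = randn;  z2(1) = randn;
for i = 2:sampleN
    z1(i) = rho*z1(i-1) + sqrt(1 - rho^2)*randn;
    z2(i) = rho2*z2(i-1) + sqrt(1 - rho2^2)*randn;
end

phi = mu_phi + sigma_phi*z1;          % normal property of each stone
ft  = exp(lnmu_ft + lnsigma_ft*z2);   % lognormal property of each stone
fprintf('log-space correlation rho2 = %.4f\n', rho2);
```

For these parameters rho2 comes out slightly above 0.7 (about 0.719). The values are not sorted, and within one wall the sample lag-1 correlation will scatter around 0.7 because only 99 neighbouring pairs are available.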
##### 2 Comments (1 older comment not shown)
Jeff Miller, April 2, 2021
Yes, sqrt is correct, i.e., the update should use randn * sqrt(1-rho1^2).


### More Answers (2)

Jeff Miller, March 30, 2021
If you don't care about the order of the data points, one option that seems to fulfill your stated requirements is the following. If this is not what you want, maybe at least the xcorr function will be useful.
```matlab
sample_n = 100;
x = random(pd_phi,sample_n,1);   % sample_n-by-1 column of draws
x = sort(x);  % putting the samples in order ensures they are correlated
errsd = 0.01; % increase this value to reduce the correlation (e.g. 0.2), but this
              % will generally produce deviations from the specified
              % distribution
err = randn(sample_n,1) * errsd;  % column, to match x
x = x + err;
% xcorr does not remove the mean, so demean x first
r = xcorr(x - mean(x),1,'normalized') % r(1) (= r(3), lags -1 and +1) is the correlation between adjacent x values
```
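One caution about measuring the result: xcorr does not remove the mean of its input, so for data with mean 35 the normalized lag-1 value is close to 1 even when consecutive draws are independent. A minimal base-MATLAB comparison on i.i.d. draws (the seed, sample size and the names r_pearson, r_raw, r_centered are arbitrary choices for illustration):

```matlab
rng(2);                                % arbitrary seed
x = 35 + 8.75*randn(100,1);            % i.i.d. N(35, 8.75^2): true lag-1 correlation is 0

% Pearson correlation between x(1:end-1) and x(2:end)
C = corrcoef(x(1:end-1), x(2:end));
r_pearson = C(1,2);

% Uncentered lag-1 product sum normalized by the lag-0 value; this is the
% quantity a normalized cross-correlation of the raw series reports
r_raw = sum(x(1:end-1).*x(2:end))/sum(x.^2);

% Same quantity after removing the mean: the usual lag-1 autocorrelation
xc = x - mean(x);
r_centered = sum(xc(1:end-1).*xc(2:end))/sum(xc.^2);

fprintf('Pearson %.3f | raw %.3f | centered %.3f\n', r_pearson, r_raw, r_centered);
```

The raw value is dominated by the large mean (roughly 0.93 here), while the two centered estimates stay near 0, as they should for independent draws.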
##### 1 Comment
Semih Gonen, March 31, 2021
Sorting creates a problem for me. Sorry not to have mentioned this before.
I edited the question, would you mind taking another look?


Paul, March 31, 2021
Edited: Paul, March 31, 2021
I don't think the question is well posed. The question and the code make it sound like there is a single distribution and the samples are i.i.d., but then it states there is a correlation, which suggests the underlying RVs are not independent (at least I don't think they are). This answer is based on what I think the question really is.
It sounds like there is supposed to be a joint density function of N random variables. After getting a sample for the first RV, find the conditional density of the remaining RVs given the value of the first sample. From that conditional density, get a sample of the next RV, recondition, and so on. I think this approach will work for the normal distribution case as long as a valid joint density is defined at the outset (I'm not familiar with the lognormal and so can't comment on that).

For example, suppose the goal is to generate three samples, each RV has a correlation coefficient of 0.7 with its successor, and the first and third are uncorrelated. Then the joint normal distribution is defined by a 3-element mean, mu13, and a 3x3 covariance, Sigma13 (assuming sigma1 = sigma2 = sigma3 = sigma = 1, for example):
```
Sigma13 =
   1.000000000000000   0.700000000000000                   0
   0.700000000000000   1.000000000000000   0.700000000000000
                   0   0.700000000000000   1.000000000000000
>> eig(Sigma13)   % verify Sigma13 is positive definite
ans =
   0.010050506338833
   0.999999999999999
   1.989949493661167
```
The marginal density of the RV X1 is normal with mean mu13(1) and variance Sigma13(1,1). Generate a sample of X1 using

```matlab
x1 = normrnd(mu13(1),sqrt(Sigma13(1,1)));   % normrnd takes the standard deviation, hence sqrt
```
With this value of x1, compute the mean (mu23) and the covariance (Sigma23) of X2, X3 conditioned on X1 = x1. This Wikipedia page shows how to do that from mu13, Sigma13 and x1.
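For reference, these are the standard conditioning formulas for a partitioned multivariate normal, with block a = X1 and block b = (X2, X3), followed by what they give for Sigma13 (writing the unspecified mean as mu13 = (μ1, μ2, μ3)):

```latex
\begin{aligned}
\boldsymbol{\mu}_{b\mid a} &= \boldsymbol{\mu}_b + \Sigma_{ba}\Sigma_{aa}^{-1}(x_1 - \mu_1), \qquad
\Sigma_{b\mid a} = \Sigma_{bb} - \Sigma_{ba}\Sigma_{aa}^{-1}\Sigma_{ab} \\
\Sigma_{aa} &= 1, \quad \Sigma_{ba} = \begin{pmatrix} 0.7 \\ 0 \end{pmatrix}, \quad
\Sigma_{bb} = \begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix} \\
\text{mu23} &= \begin{pmatrix} \mu_2 + 0.7\,(x_1 - \mu_1) \\ \mu_3 \end{pmatrix}, \qquad
\text{Sigma23} = \begin{pmatrix} 1 & 0.7 \\ 0.7 & 1 \end{pmatrix} - \begin{pmatrix} 0.49 & 0 \\ 0 & 0 \end{pmatrix}
= \begin{pmatrix} 0.51 & 0.7 \\ 0.7 & 1 \end{pmatrix}
\end{aligned}
```

So X2 given X1 = x1 has variance 0.51 = 1 - 0.7^2, and X3 is unaffected by x1 alone because it is uncorrelated with X1.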
The conditional density of X2 given X1 = x1 is normal with mean mu23(1) and variance Sigma23(1,1). Generate a sample of x2 using normrnd, recondition, and you're left with the normal distribution of X3 conditioned on X1 = x1 and X2 = x2. Generate a sample of x3 based on this distribution.
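The sample-and-recondition loop can be written for any N by conditioning each X_k on all values already drawn, which is equivalent to conditioning the remaining block on X1, then on X2, and so on. A sketch assuming zero means for mu13 (the post does not give them), repeated over many sequences so the sample covariance can be compared with Sigma13; M, c, P, w, m and v are illustrative names:

```matlab
rng(3);                                   % arbitrary seed
mu13 = [0; 0; 0];                         % assumed zero means (not given in the post)
Sigma13 = [1 0.7 0; 0.7 1 0.7; 0 0.7 1];  % covariance from the answer
N = numel(mu13);
M = 20000;                                % number of independent sequences
X = zeros(M, N);
for r = 1:M
    x = zeros(N,1);
    for k = 1:N
        if k == 1
            m = mu13(1);  v = Sigma13(1,1);          % marginal of X1
        else
            c = Sigma13(k, 1:k-1);                   % Cov(X_k, values already drawn)
            P = Sigma13(1:k-1, 1:k-1);               % Cov among values already drawn
            w = c / P;                               % c*inv(P) without forming inv(P)
            m = mu13(k) + w*(x(1:k-1) - mu13(1:k-1));  % conditional mean
            v = Sigma13(k,k) - w*c';                   % conditional variance
        end
        x(k) = m + sqrt(v)*randn;                    % same as normrnd(m, sqrt(v))
    end
    X(r,:) = x';
end
disp(cov(X))   % should be close to Sigma13
```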
I don't know if this is really what you want. If it is, it should scale to more than three normal RVs as long as a valid joint density is defined at the outset.
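On scaling to more RVs: the specific "lag-1 only" covariance used above (1 on the diagonal, 0.7 next to it, 0 elsewhere) stops being positive definite once N reaches 4. Its eigenvalues are 1 + 2(0.7)cos(kπ/(N+1)) for k = 1..N, so the smallest one is negative for N ≥ 4, which rules it out for the 100-stone wall. A valid choice for any N is the AR(1) covariance rho^|i-j|, which is the joint distribution produced by the recursion in the accepted answer (with the sqrt). A quick check, with minT and minA as illustrative names:

```matlab
% Compare the smallest eigenvalue of two candidate covariance matrices:
%   T: "lag-1 only" tridiagonal (1 on the diagonal, rho next to it, 0 elsewhere)
%   A: AR(1) covariance with entries rho^|i-j| (symmetric Toeplitz)
rho = 0.7;
Ns = [3 4 100];
minT = zeros(size(Ns));
minA = zeros(size(Ns));
for j = 1:numel(Ns)
    N = Ns(j);
    T = toeplitz([1, rho, zeros(1, N-2)]);   % tridiagonal Toeplitz (equals Sigma13 for N = 3)
    A = toeplitz(rho.^(0:N-1));              % AR(1) covariance
    minT(j) = min(eig(T));
    minA(j) = min(eig(A));
    fprintf('N = %3d: min eig tridiagonal = %8.4f, min eig AR(1) = %.4f\n', ...
        N, minT(j), minA(j));
end
```

For N = 3 the tridiagonal value matches the 0.0100505... shown above; for N = 4 and N = 100 it is negative, while the AR(1) matrix stays positive definite.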
Having said all that, I'm not even sure that the above procedure is different from just calling mvnrnd once:

```matlab
x = mvnrnd(mu13,Sigma13)
```
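On the mvnrnd question: sequential conditioning and a single joint draw produce the same joint distribution. For the 100-stone wall, a base-MATLAB sketch of the joint draw uses the AR(1) covariance sigma_phi^2*rho^|i-j| (an assumption: it is the valid covariance that matches the recursive answer, unlike the tridiagonal one) and a Cholesky factor. With the Statistics and Machine Learning Toolbox, mvnrnd(mu_phi*ones(1,N), Sigma, M) should give draws from the same distribution. Simulating many walls shows the stone-to-stone correlations; Phi, lag1 and lag2 are illustrative names:

```matlab
rng(4);                                    % arbitrary seed
N = 100;  rho = 0.7;                       % 100 stones, target lag-1 correlation
mu_phi = 35;  sigma_phi = 0.25*mu_phi;     % normal property from the question
Sigma = sigma_phi^2 * toeplitz(rho.^(0:N-1));  % Cov(stone i, stone j) = sigma^2*rho^|i-j|
R = chol(Sigma);                           % upper triangular, Sigma = R'*R
M = 5000;                                  % number of simulated walls
Z = randn(M, N);                           % independent standard normals
Phi = mu_phi + Z*R;                        % each row is one wall; rows have covariance Sigma
C = corrcoef(Phi);                         % N-by-N sample correlation across walls
lag1 = mean(diag(C, 1));                   % average corr(stone i, stone i+1)
lag2 = mean(diag(C, 2));                   % average corr(stone i, stone i+2)
fprintf('lag-1: %.3f (target %.2f), lag-2: %.3f (rho^2 = %.2f)\n', lag1, rho, lag2, rho^2);
```

Note that an AR(1) structure also correlates stones two apart (rho^2 = 0.49); a "neighbour only" correlation with nothing beyond lag 1 is not attainable for 100 stones at rho = 0.7.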
##### 1 Comment
Semih Gonen, March 31, 2021
I thought there was a single distribution to sample from, but your solution might work for a joint density function of N random variables if it is possible for each variable to have the same marginal distribution (I may be wrong and confused).
I tried to write the question more clearly; would you mind answering it again?

