Generation of Conditional Random Variables

Question

Conrado Neto 2021년 1월 3일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/707693-generation-of-conditional-random-variables

편집: Paul 2021년 2월 3일

채택된 답변: Bruno Luong

I am updating my question as it seems before it was a bit confusing

I hope this is clear now (in case it is not clear, please ask and I will clarify):

Assume I have 16 random variables X1,..,X16 and 16 constant numbers C1,..,C16.

I want to generate samples of (X1+...+X16), considering that X1,..,X16 are random variables (1 to 8 are normally distributed and 9 to 16 are weibull) and the generated samples must always respect the following constraints:

X1/(X1+ ... +X16)=C1

...

X16/(X1 + ... + X16)=C16

where C1,...C16 are constant values and are known prior to the generation of the RVs.

and C1+...+C16 is always 1 and always positive numbers).

Thanks a lot for any suggestions

댓글 수: 2
없음 표시없음 숨기기

Cris LaPierre 2021년 1월 3일

What do you mean by respecting these weights?

Conrado Neto 2021년 1월 3일

Thanks for reading my problem

I mean that I need to generate more samples for all those random variables that have those proportions between themselves. Different new samples, same proportion between all the generated samples

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

Bruno Luong 2021년 1월 9일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/707693-generation-of-conditional-random-variables#answer_594088

편집: Bruno Luong 2021년 1월 9일

MATLAB Online에서 열기

I don't quite understand what you want but from your request you seem to compute

X1 = C1 * S
X2 = C2 * S
...
X16 = C16 * S

With S being the sum of X_i. Therefore you just need to generate S (since all C are known).

If you know the pdf of X_i and the in principle you would be able to compute the theoretical pdf of S.

For example if X_i are independent and normal distribution with the same standard deviation of sigma, then S is normal distribution with standard deviation of sigma*sqrt(16).

Then all you need is generate randomly S from its pdf, then apply the above scalling then your are done.

If you are not able to compute the theoretical pdf of sum of mixed normal and weibull (admidly not trivial), the sum converges toward the normal distribution with

meanS := mean(S) = sum(mean(X_i))
stdS := std(S) = sqrt(sum(std(X_i)^2))

IMO if you generate S as

S = meanS + stdS*randn()

then

X = C * S

C being the array of 16 known values, it would well fit your need.

댓글 수: 34
이전 댓글 32개 표시이전 댓글 32개 숨기기

Bruno Luong 2021년 1월 10일

편집: Bruno Luong 2021년 1월 10일

MATLAB Online에서 열기

Here is the demo code, adapt to your need. The not well defined partt is the range to discretized the pdf of s (I suppose if you assume all the pdf of X are Gaussian, then you could well estimate the range from the mean and standard deviation of X, similar to the central limit theorem.)

Gausspdf = @(x,mu,sigma) 1./(sigma*sqrt(2*pi))*exp(-(x-mu).^2./(2*sigma^2));
WeiBullpdf = @(x,k,lambda) (k./lambda).*(x./lambda).^(k-1).*exp(-(x/lambda).^k);
% pdf parameters
nG = 8; nW = 8;
mu = 1; sigma = 0.5;
lambda = 1; k = 1.5;
pdftab = [repmat({@(x) Gausspdf(x,mu,sigma)}, [1, nG]) ...
          repmat({@(x) WeiBullpdf(x,k,lambda)}, [1, nW])];
% Generate a test (known) C table
n = length(pdftab);
C = rand(1,n);
C = C/sum(C);
% joint probability function, computed at s*C
spdf_fun = @(s) prod(arrayfun(@(f,x) feval(f{1},x), pdftab, s*C));
% Discretize s, make sure the range has a significant distribution
% contribution by inspecting the plot
% The range depends on the parameters used to build pdftab and C
smin = 1;
smax = 25;
ns = 1000;
s = linspace(smin,smax,ns).';
smid = 0.5*(s(1:end-1)+s(2:end));
% plot see where is the distribution is significant
spdf = arrayfun(spdf_fun,smid);
plot(smid,spdf);
xlabel('s');
ylabel('spdf');
% Generate s from the discretized pdf
scdf = [0; cumsum(spdf)];
scdf = scdf / scdf(end);
[scdf,I] = unique(scdf);
s = interp1(scdf, s(I), rand(), 'pchip')
% Here is X vector
X = s*C 

Paul 2021년 1월 10일

Ok. I think I see what you're doing. I think it would be clearer to normalize spdf rather than scdf, but I guess at the end of the day the same X will result.

So I think the process is going like this: define the joint pdf of the Xi. Then take the "slice" out of the joint PDF that lies along the line M*X = 0 as parameterized by s. That slice, after normalization, defines a condtional pdf, from which samples of S are generated (which is the variable s, in s = interp1 ...), and then finally generate samples of X from S (X = S.*C). As you noted, the answer as posted doesn't normalize the slice but rather normalzes the CDF.

After thinking more about your procedure, I realized that my answer, as posted, might actually be a solution the problem (see my comment above). However, I compared your result to mine for the parameters in my problem (2 normal, 2 Weibull)and different parameters for the RVs), and came up with different distributions for the columns of X. So I'm still confused (and it could have been a mistake on my part in implementing your code). Any idea what's different about my answer compared to yours?

Is spdf_fun (after normalizatoin) supposed to be the same as f_y1giveny2y3y4 in my answer? They look like they have simular shapes but the peak of the former is to the left of the peak of the latter, and I did notice that spdf_fun can take on negative values for s < 0.

When the slice is taken out of joint pdf, does it account for the fact that the joint pdf of the Xi doesn't have support over the entirety of R^n (because the Weibull variables only have support for x > 0)?

Bruno Luong 2021년 1월 11일

편집: Bruno Luong 2021년 1월 11일

MATLAB Online에서 열기

There was a BUG in my script with histogram (I overwrite s), here is a correct one

Gausspdf = @(x,mu,sigma) 1./(sigma*sqrt(2*pi))*exp(-(x-mu).^2./(2*sigma.^2));
WeiBullpdf = @(x,k,lambda) (k./lambda).*(x./lambda).^(k-1).*exp(-(x/lambda).^k);
% pdf parameters
nG = 8;
mu = rand(1,nG); sigma = 1+rand(1,nG);
nW = 8;
k = 1+rand(1,nW); lambda=1+rand(1,nW);
extendfun = @(p,n) p + zeros(1,n);
mu      = extendfun(mu,nG);
sigma   = extendfun(sigma,nG);
k       = extendfun(k,nW);
lambda  = extendfun(lambda,nW);
pdftab = [arrayfun(@(mu,sigma) @(x) Gausspdf(x,mu,sigma), mu, sigma, 'Unif', 0) ...
          arrayfun(@(k,lambda) @(x) WeiBullpdf(x,k,lambda), k, lambda, 'Unif', 0) ...
          ];
% Generate a test (known) C table
n = length(pdftab);
C = rand(1,n);
%C = [1 0.3 0.5 0.7];
C = C/sum(C);
% joint probability function
spdf_fun = @(s) prod(arrayfun(@(f,x) feval(f{1},x), pdftab, s*C));
% Discretize s, make sure the range has a significant distribution
% contribution with the plot
% The range depends on the parameters used to build pdftab
smin = 0;
smax = sqrt(n)*8;
ns = 1000;
s = linspace(smin,smax,ns).';
smid = 0.5*(s(1:end-1)+s(2:end));
% plot see where is the distribution is significant
spdf = arrayfun(spdf_fun,smid);
% Normalize, just for the plot
ds = mean(diff(s));
a = 1/(sum(spdf)*ds);
spdf = a*spdf;
plot(smid,spdf);
hold on
PlotHist = n <= 4; % For large n, it converg too slow
if PlotHist
    % Check with histogram
    nbins = 32;
    m = 1e7; % number of sample
    X = zeros(m,n);
    sz = [1 m];
    for k=1:n
        distk = pdftab{k};
        info = functions(distk).workspace{1};
        funname = func2str(distk);
        if contains(funname, 'Gausspdf')
            X(:,k) = GaussianRNG(info.mu, info.sigma, sz);
        elseif contains(funname, 'WeiBullpdf')
            X(:,k) = WeibullRNG(info.k, info.lambda, sz);
        end
    end
    acceptance_ratio = 1e-3;
    sumX = sum(X,2); % EDIT bug correction
    P = null(C); % orthogonal of span C
    XL2Residue = sum((X*P).^2,2);
    [~,accepted] = mink(XL2Residue, ceil(m*acceptance_ratio));
    histogram(sumX(accepted), 32, 'Normalization', 'pdf'); % EDIT bug correction
end
xlabel('s = \Sigma X_i');
ylabel('pdf(s)');
grid on
% Generate s from the discretized pdf
scdf = [0; cumsum(spdf)];
scdf = scdf / scdf(end);
[scdf,I] = unique(scdf);
s = interp1(scdf, s(I), rand(1000,1), 'pchip');
% Here is X vector
X = s*C
%%
function  x=WeibullRNG(k,lambda,sz)
x = lambda.*( -log(1-rand(sz))).^(1/k);
end
%%
function  x=GaussianRNG(mu,sigma,sz)
x = mu + sigma*randn(sz);
end

Paul 2021년 1월 17일

Bruno,

Does this solution extend to the case where we remove the restrictions that Ci > 0 for the ratios involving the normal RVs?

For example, if X1 and X2 are normal and X3 and X4 are Weibull, does the same solution apply if:

C = [ -0.2 0.3 0.4 0.5]

Bruno Luong 2021년 1월 17일

I think so. In the logic we never use the fact that C >= 0.

The parametrization migh extend to an unbounded interval. Numerically we just need to be careful and truncate at the places where the pdf is considered as neglegible.

댓글을 달려면 로그인하십시오.

Answer 2

Jeff Miller 2021년 1월 3일

1
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/707693-generation-of-conditional-random-variables#answer_590053

편집: Jeff Miller 2021년 1월 3일

It sounds like the only "free" aspect of your new samples is the new total (say, 241 instead of 240). Once you have that new total, your 16 individual random scores are in the pre-determined proportions of that new total.

So, generate new random totals in the same way as the original--i.e., as the sum of 16 separate random scores--and then compute the 16 individual new random variables as the pre-determined proportions of the new total.

Note that the 16 random variables generated in this way will no longer be normals and weibulls, however, since they are fixed proportions of a (total) random variable that is neither normal nor weibull.

By the way, you can speed up the generation of the total of 8 independent normal random variables, since their total is a single normal with the sum of the means and the sum of the variances.

댓글 수: 12
이전 댓글 10개 표시이전 댓글 10개 숨기기

Jeff Miller 2021년 1월 3일

MATLAB Online에서 열기

Well, I don't really understand why you are trying to generate random numbers with these weights, so I don't think I can clarify much.

As I understood your problem, the first number should always be exactly 5/240 of the total, the second should always be exactly 25/240 of the total, and so on. With that constraint, these numbers are highly dependent--for example, the second is always exactly 5 times the first, and so on. In fact, to make a new sample, another way to proceed is to just generate the first one randomly (e.g., from a normal), and then compute each of the others from that. E.g.,

w = (5/240, 25/240, 12/240, and so on);
x(1) = randn;  % with whatever mu and sigma you want
x(2) = w(2)/w(1) * x(1);
x(3) = w(3)/w(1) * x(1);
% etc

In this case the distributions of all x's would be normal, however.

The point is that there really isn't much random here if you regard these weights as fixed. You just can't independently generate separate random values using the original proportions.

Jeff Miller 2021년 1월 3일

편집: Jeff Miller 2021년 1월 4일

MATLAB Online에서 열기

Maybe it is helpful to look in more detail at the 2-variable case to understand the problem a bit better:

% Just generate some independent random numbers & compute sums.
simN = 500000;
expMu = 1;
expvals = exprnd(expMu,simN,1);  % random variable 1 is exponential with mean 1
norMu = 1;
norSigma = 1;
norvals = normrnd(norMu,norSigma,simN,1); % rv 2 is normal(1,1)
sums = expvals + norvals;
% Let's say we are interested in simulating pairs of the RVs
% where the exponential is 0.5 of the total.  We can look at suche
% pairs "retrospectively" by picking out relevant pairs from the
% independent simulations above.
targetExpFrac = 0.5;
tolerance = 0.01;  % because the proportion won't be exactly 1/2
ExpFrac = expvals ./ sums;  % compute actual exponential fraction in each pair.
% Now find the pairs that have the right exponential fraction:
accepted = (ExpFrac >= targetExpFrac - tolerance) & (ExpFrac <= targetExpFrac + tolerance);
% Let's look at the distributions of all scores versus the score
% where the proportions are as desired:
figure;
subplot(3,2,1);
histogram(expvals);
title('All exponentials');
subplot(3,2,2);
histogram(expvals(accepted));
title('Exponentials in samples with desired fraction');
subplot(3,2,3);
histogram(norvals);
title('All normals');
subplot(3,2,4);
histogram(norvals(accepted));
title('Normals in samples with desired fraction');
subplot(3,2,5);
histogram(sums);
title('All sums');
subplot(3,2,6);
histogram(sums(accepted));
title('Sums in samples with desired fraction');
% Now, how can we sample randomly from pairs with the right proportions:
acceptedexpvals = expvals(accepted);
acceptednorvals = norvals(accepted);
% sample one position randomly and take the exp & norm from that position.

As you can see, the distributions of exponential in the samples with the desired fraction is quite different from the distribution of all exponentials. And likewise for the normals and for the sums. So, you are right to worry that the constraint changes the distribution of totals.

Unfortunately, I don't see any short-cuts. In your case, you would have to generate many millions of sets of 16 variables independently, and find that very small fraction of them which happened to have just the proportions you wanted across all rvs.

Jeff Miller 2021년 1월 4일

I am pretty sure that there is an easier way to do this. The key is that once you know the value of one rv, you can compute the values of the other rvs from the known weights. So, you only need to generate one rv randomly.

Let's say you want to generate random values of the first normal rv, from which you can compute the rest. As you have pointed out, the conditional distribution of this first rv is not actually normal because of the weighting constraint. What is its distribution?

To simplify the terminology, I'll pretend the rvs are all discrete so I can talk about probabilities.

As an example, what is pr(rv1=5), given the constraint? Well, rv1=5 under the constraint only when it is also the case that rv2...rv16 all equal their constrained values consistent with rv1=5. Call those values rv2c, rv3c,....

Now all these rvs are independent so the joint probability is the product of their individual probabilities. Thus, pr(rv1=5|constraint) = pr(rv1=5)*pr(rv2=rv2c)*pr(rv3=rv3c)...

In principle you can do the analogous calculation for all of the different values of rv1, and thus get the distribution of rv1 under the constraint. Once you have that, you can randomly sample rv1's from the constrained distribution and compute the corresponding values of the other rvs from each random rv1.

I must admit that I am not confident about how to extend all this to continuous rvs. Maybe it is enough to compute a joint likelihood by multiplying pdfs and applying some normalization at the end, but maybe it is trickier than that.

Conrado Neto 2021년 1월 6일

Indeed, I agree that knowing that one "conditional distribution" of a single rv would solve all my problems.

and I also dont know how to obtain that and I was hoping someone might know

in any case, you are right

if you happen to have any idea, let me know

I did some search on this and the topic started to get really complicated so I wasnt able to fully get it

댓글을 달려면 로그인하십시오.

Answer 3

Paul 2021년 1월 8일

1
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/707693-generation-of-conditional-random-variables#answer_593443

편집: Paul 2021년 1월 13일

MATLAB Online에서 열기

Based on this comment, the problem is as follows:

Let Xi be a set of independent, continuous random variables, i = 1-n.

Let wi be a set of weights, i = 2-n. The goal is find samples of Xi under the constraints Xi/sum(Xi) = wi, i = 2-n.

To solve this problem:

1. transform to a different set of random varibles defined as follows: Y1 = sum(Xi), Yi = Xi/sum(Xi) i = 2-n

2. find the joint pdf of Yi

3. Find the conditional pdf of Y1 given Yi = wi, i = 2-n.

4. sample Y1 from its conditional pdf

5. generate samples of Xi from Y1 and the constraint equations.

For example assume:

X1 ~ N(0,1)

X2 ~ N(1,2)

X3 ~ W(1,1)

X4 ~ W(2,1)

% Step 0
% define the distribution parameters
mu1 = 0; sigma1 = 1;
mu2 = 1; sigma2 = 2;
a3 = 1;  b3 =     1;
a4 = 1;  b4 =     2;
% define the pdfs of the random variables
syms x1 x2 real
syms x3 x4 real
f_x1(x1) = 1/sigma1/sqrt(2*pi)*exp(-(x1-mu1)^2/(2*sigma1^2)); % -inf < x1 < inf
f_x2(x2) = 1/sigma2/sqrt(2*pi)*exp(-(x2-mu2)^2/(2*sigma2^2)); % -inf < x2 < inf
f_x3(x3) = piecewise(x3 <= 0, 0, x3 > 0, (b3/a3)*((x3/a3)^(b3-1))*exp(-((x3/a3)^b3)));       %    -inf < x3 < inf
f_x4(x4) = piecewise(x4 <= 0, 0, x4 > 0, (b4/a4)*((x4/a4)^(b4-1))*exp(-((x4/a4)^b4)));       %    -inf < x4 < inf
% define the joint pdf of Xi
f_x1x2x3x4(x1,x2,x3,x4) = f_x1(x1)*f_x2(x2)*f_x3(x3)*f_x4(x4);
% Step 1
% define the transformation of RVs
syms y1 y2 y3 y4 real
g1 = y1 == x1 + x2 + x3 + x4;
g2 = y2 == x2/(x1 + x2 + x3 + x4);
g3 = y3 == x3/(x1 + x2 + x3 + x4);
g4 = y4 == x4/(x1 + x2 + x3 + x4);
% Step 2
% solve for the inverse equations
h = solve([g1,g2,g3,g4],x1,x2,x3,x4,'Real',true,'ReturnConditions',true);
% find the Jacobian of the inverse solution
J(y1,y2,y3,y4) = jacobian([h.x1, h.x2, h.x3, h.x4],[y1, y2, y3, y4]);
% determinant of J
detJ(y1,y2,y3,y4) = det(J(y1,y2,y3,y4));
% joint density of Yi (edit 12 Jan 2021, math should be abs(detJ).  Did not change results.
f_y1y2y3y4(y1,y2,y3,y4) = f_x1x2x3x4(h.x1,h.x2,h.x3,h.x4)*abs(detJ(y1,y2,y3,y4));
% Step 3
% The conditional density of Y1 given Yi = yi is the joint density divided by the marginal density
% marginal density of Yi, i = 2-4
f_y2y3y4(y2,y3,y4) = int(f_y1y2y3y4(y1,y2,y3,y4),y1,-inf,inf);
% conditional density of Y1
f_y1giveny2y3y4(y1,y2,y3,y4) = f_y1y2y3y4(y1,y2,y3,y4)/f_y2y3y4(y2,y3,y4);
% verify conditional density using Jeff Miller's acceptance approach (https://www.mathworks.com/matlabcentral/answers/707693-generation-of-conditional-random-variables#comment_1242578)
simN = 5000000;
X1 = random('Normal',mu1,sigma1,[simN 1]);
X2 = random('Normal',mu2,sigma2,[simN 1]);
X3 = random('Weibull',a3,b3,[simN 1]);
X4 = random('Weibull',a4,b4,[simN 1]); 
w2 = 0.3;
w3 = 0.5;
w4 = 0.7;
Y1 = X1 + X2 + X3 + X4;
Y2 = X2./(X1 + X2 + X3 + X4);
Y3 = X3./(X1 + X2 + X3 + X4);
Y4 = X4./(X1 + X2 + X3 + X4);
tolerance = 0.1;
accepted = abs(Y2 - w2) < tolerance & abs(Y3 - w3) < tolerance & abs(Y4 - w4) < tolerance;
figure;
subplot(311);
histogram(Y1(accepted),'Normalization','pdf');
hold on
fplot(f_y1giveny2y3y4(y1,w2,w3,w4),[0 5],'r','LineWidth',4)
title('f(y1 | y2,y3,y4), Analytic and Histogram (Acceptance)')
% Step 4
% sample Y1 using inverse sampling (numerical solution)
syms z real
f(z) = vpa(f_y1giveny2y3y4(z,w2,w3,w4));
Cdf_y1(y1) = int(f(z),-inf,y1);
y1vec = 0:.001:5; % Cdf_y1(0) = 0, Cdf_y1(5) = 0.999998034
Cvec = double(Cdf_y1(y1vec));
usamp = rand(simN,1);
Y1samp = interp1(Cvec,y1vec,usamp);
Y2samp = w2;
Y3samp = w3;
Y4samp = w4;
subplot(312)
histogram(Y1samp,'Normalization','pdf');
hold on
fplot(f_y1giveny2y3y4(y1,w2,w3,w4),[0 5],'r','LineWidth',4)
title('f(y1 | y2,y3,y4), Analytic and Histogram (Direct Inverse Sampling)')
% Step 5
% generate samples of Xi from Yi and the constraint equations
h1 = matlabFunction(h.x1);
X1samp = h1(Y1samp,Y2samp,Y3samp,Y4samp);
X2samp = w2*Y1samp;
X3samp = w3*Y1samp;
X4samp = w4*Y1samp;
subplot(313)
histogram(X1samp,'Normalization','pdf');
title('Conditional PDF of X1');

Here are the results. Seemed to work pretty well for this example. Note that Step 5 runs very fast once Steps 1-4 complete.

I don't know how well this approach scales as the number of RVs increases. There may be other gotcha's as well.

댓글 수: 18
이전 댓글 16개 표시이전 댓글 16개 숨기기

Conrado Neto 2021년 1월 8일

Hi Paul

amazing answer, that seems to be exactly what i want to do

I am going to play with your code a bit and then I will have to change my accepted answer to yours because it seems you nailed it

In my case there are 16 RVs in total. I am also unsure if there would be other gotcha's because of that but I dont think so

I have a couple other questions, the first one is a very basic one:

In my case I want the sum of the weights to be 1, so I am not sure why you wrote:

g1 = y1 == x1 + x2 + x3 + x4;

I believe it should be

g1 = y1 == x1/(x1 + x2 + x3 + x4);

like the other RVs. Unless there is a basic logic operation there that I am missing.

In any case, the problem shouldnt have its formulation changed much considering that that is different.

Now the second one question, since you clearly understand my problem.

In the end, I just need to generate the samples of 'sum(x1+x2+x3+x4+..+x16)' respecting: 'x1/(x1+..+x16)+x2/(x1+..+x16)+...+x16/(x1+..+x16)=1'

Does this simplify the problem or is it still the best approach to generate the variables for each RV and then sum them?

thanks a lot you already helped me a lot with your answer

Paul 2021년 1월 10일

편집: Paul 2021년 1월 10일

I'm not yet convinced that the problem is well-posed. What we have are RVs Xi. As I showed above, the joint density of the Xi exists. It's just the product of the individual densities due to the independence of the Xi. Then we have RVs Yi defined as:

Yi = Xi/sum(Xi).

and finally we have an RV defined as:

S = sum(Xi)

What you're looking to do is generate samples of S conditioned on Yi taking on specified values. Those samples would come from the conditional density of S, i.e., the density of S conditioned on the Yi = Ci.

What is that conditional desnsity of S?

It is exists it would be the joint density of S and Yi divided by the marginal density of Yi. So to solve the problem we need to find these densities.

What's the marginal density of Yi? If it exists, it's just the joint density of Yi as derived from the joint density of Xi. At first glance, one might think to use the change of variables approach as done above to establish the joint density of Yi. But there's a catch. I'm nearly certain that a necessary condition for the joint density of Yi to exist is that the mapping from the Xi to the Yi be one to one. In this situation that's not the case, since Yi = 1/n for any x where Xi = x. So I don't think the joint density of Yi is defined.

More problems ensue in trying to find the joint density of S and Yi.

So if (and I'm not sure it's true) these joint densities do not exist, is the problem well posed?

Paul 2021년 1월 10일

A. Before talking about Yi as a constant, I'm viewing Yi as RVs because they are defined as functions of the Xi. In other words, I'm viewing the problem in two steps. First define the RVs Yi, and then apply the conditions.

B. In my problem, I did not include Y1 = X1/(X1 + X2 + X3 + X4) as an RV because I misunderstood the problem. I thought the problelm only had n-1 constraints, when, in fact, it has n constraints.

C. The difference between the problem I solved and the real problem is that the problem I solved has only n-1 constraints; the real problem has n. Again, that was my misunderstanding of what you were asking.

I suspect the (or at least one) reason you're having trouble adapting my solution to your problem is related to the details about the mapping between the Xi and the Yi.

I have one other concern. Suppose the Yi = Xi/sum(Xi) are, in fact, continous random variables (it sure seems like they are; can Yi can take on an uncountable number of values before applying any constraints?). Then the probability that Yi = Ci for i = 1,n is zero, by the definiton of a continuous random variable.

Full disclosure: I think there are some subtleties in this problem that are getting close to or beyond my level of knowledge in this subject.

Paul 2021년 1월 12일

편집: Paul 2021년 2월 3일

They did not sum to 1 because when I wrote the answer I did not realize that was a constraint. However, I've changed the values to where the do sum to 1 and it didn't change the result.

Edit: Actually, they did sum to 1, with w1 implicitly being set to -0.5, because we must have sum(w) = 1.

Paul 2021년 2월 3일

편집: Paul 2021년 2월 3일

MATLAB Online에서 열기

Here is updated code that is simpler and more flexible.

mu = [0 1]; sigma = [1 2];
a = [1 1]; b = [1 2];
% define the constants.
C = [ 0.1; 0.2; 0.3; 0.4];  % column vector
nG = numel(mu);
nW = numel(a);
C = C(:)./sum(C);
%Ccell = num2cell(C);
% number of samples
simN = 5000000;
xvar = sym('x',[1 nG+nW],'real');
yvar = sym('y',[1 nG+nW],'real');
syms pdf_normal(x,muparam,sigmaparam)
pdf_normal(x,muparam,sigmaparam)= 1/sigmaparam/sqrt(2*pi)*exp(-(x-muparam)^2/(2*sigmaparam^2)); % -inf < x1 < inf
syms pdf_weibull(x,aparam,bparam)
pdf_weibull(x,aparam,bparam) = piecewise(x < 0, 0, x >= 0, (bparam/aparam)*((x/aparam)^(bparam-1))*exp(-((x/aparam)^bparam)));
% define the joint pdf of Xi
syms jointpdf_X real
jointpdf_X = 1;
for ii = 1:nG
    jointpdf_X = jointpdf_X*pdf_normal(xvar(ii),mu(ii),sigma(ii));
end
for ii = 1:nW
    jointpdf_X = jointpdf_X*pdf_weibull(xvar(ii+nG),a(ii),b(ii));
end
jointpdf_X = piecewise(sum(xvar)==0,0,jointpdf_X); % need to exclude the line sum(xi) = 0
jointpdf_X(xvar) = jointpdf_X;
% determinant of Jacobian
detJac(yvar) = yvar(1)^(nG+nW-1);
% joint density of Y, evaluated at Xi = Y1*Ci
Xi = num2cell(yvar(1)*C);
jointpdf_Y(yvar(1)) = simplify(jointpdf_X(Xi{:})*abs(detJac));
jointpdf_Y(yvar(1)) = vpa(jointpdf_Y);
% marginal density of Y2-Yn, evaluated at Xi = Y1*Ci, i = 2-n
marginalpdf_Y = vpaintegral(jointpdf_Y(yvar(1)),yvar(1),-inf,inf);
% conditional density of Y1 (only valid when the marginal density ~= 0)
conditionalpdf_Y1(yvar(1)) = jointpdf_Y/marginalpdf_Y;
conditionalpdf_Y1 = vpa(conditionalpdf_Y1);
% Step 4
% sample Y1 using inverse sampling (numerical solution)
syms z real
f(z)            = vpa(conditionalpdf_Y1(z));
Cdf_y1(yvar(1)) = int(f(z),-inf,yvar(1));
% rough estimate of the maximum practical value of S = sum(Xi) (assumes S>0)
maxY1 = 0;
for ii = 1:nG
    maxY1 = maxY1 + (mu(ii)+3*sigma(ii));
end
for ii = 1:nW
    maxY1 = maxY1 + icdf('Weibull',0.999,a(ii),b(ii));
end
y1vec       = 0:.01:maxY1; 
% Cdfvec      = double(Cdf_y1(y1vec));
Cdfvec      = cumtrapz(y1vec,double(conditionalpdf_Y1(y1vec)));
[Cdfvec,ii] = unique(Cdfvec);
y1vec       = y1vec(ii);
if abs(1-Cdfvec(end)) > 1e-5
    error('CDF(end) ~= 1');
end
Y1 = interp1(Cdfvec,y1vec,rand(simN,1));

This code yields the same results as in the original answer. It works for the original question as defined with Ci > 0, and also works for other admissible values of Ci, i.e., sum(Ci) = 1 and the Ci associated with the Weibull variables all have to have the same sign, as long as the determination of maxY1 is properly modified for the case where Y1 = sum(Xi) is negative. But the important term is conditionalpdf_Y1, which I believe comes out correctly for the case where Y1 < 0 (admittedly having not tested this case extensively, but it seemed to work like it should).

It turns out that this solution is very similar to Bruno's solution in this comment. In fact, this term

jointpdf_X(Xi{:})

is exactly the same as this term

spdf_fun = @(s) prod(arrayfun(@(f,x) feval(f{1},x), pdftab, s*C));

in Bruno's code. The difference between the solutions is that this solution mutiplies jointpdf_X by the term

abs(detJac)) % == abs(y1^(nG+nW-1))

This extra term comes from the transformation of the differential volume in X space to the the differential volume in Y space, which is needed to define the joint pdf in Y space, which is needed to have before applying the conditioning to get the conditional pdf. At least in my understanding of these types of problems.

This solution can provide some insight into the symbolic form of the conditional pdf. If only a numerical answer is desired using this solution, it's probably easier to avoid the symbolic stuff and use Bruno's code but modfiy one line of code:

% spdf_fun = @(s) prod(arrayfun(@(f,x) feval(f{1},x), pdftab, s*C));
spdf_fun = @(s) prod(arrayfun(@(f,x) feval(f{1},x), pdftab, s*C)).*abs(s.^(nG+nW-1));

with the caveat that, as Bruno noted, Bruno's code will need minor updates to generalize to cases where one or more of Ci < 0 is allowed, should such generalization be desired.

The conditonal pdf in this solution will be proportional to

y1^(nG+nW-1+sum(b~=1))

which can be a pretty large exponent. For the orginal question it will be in the range [15 23] depending on the b parameters of the Weibull variables. So there may be some computational issues associated with raising numbers to a large power in intermeidate computations depending on the RV parameters if using only the numerical solution.

댓글을 달려면 로그인하십시오.

Generation of Conditional Random Variables

댓글 수: 2
없음 표시없음 숨기기

채택된 답변

댓글 수: 34
이전 댓글 32개 표시이전 댓글 32개 숨기기

추가 답변 (2개)

댓글 수: 12
이전 댓글 10개 표시이전 댓글 10개 숨기기

댓글 수: 18
이전 댓글 16개 표시이전 댓글 16개 숨기기

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

Generation of Conditional Random Variables

댓글 수: 2 없음 표시없음 숨기기

채택된 답변

댓글 수: 34 이전 댓글 32개 표시이전 댓글 32개 숨기기

추가 답변 (2개)

댓글 수: 12 이전 댓글 10개 표시이전 댓글 10개 숨기기

댓글 수: 18 이전 댓글 16개 표시이전 댓글 16개 숨기기

참고 항목

카테고리

태그

댓글 수: 2
없음 표시없음 숨기기

댓글 수: 34
이전 댓글 32개 표시이전 댓글 32개 숨기기

댓글 수: 12
이전 댓글 10개 표시이전 댓글 10개 숨기기

댓글 수: 18
이전 댓글 16개 표시이전 댓글 16개 숨기기