Random triplets not symmetrically distributed
I would like to generate many triplets whose values are drawn from {0.1, 0.2, ... 0.9}. I also want each triplet to sum to 1.0. I use the following code to generate them. However, when I individually histogram the 1st, 2nd, and 3rd element across all triplets, they are nonuniform and asymmetric.
Can anyone explain why the distribution is nonuniform? I can sort of guess that it is due to the constraint that they sum to 1.0.
However, it seems odd that the distribution is skewed. There should be no difference in the upward direction toward 1.0 or the downward direction toward 0.0, as the problem is completely symmetric.
The only possible cause of asymmetry I can think of is that rounding is slightly asymmetric, but only by a smidgen.
nRows=1e7; % Number of triplets
w = rand(nRows,3); % Generate row triplets
w = w./sum(w,2); % Normalize each triplet to sum to 1.0
w = round(w,1); % Round to 1 decimal place
w = w( sum(w,2) == 1.0 , : ); % Retain triplets that sum to 1.0
w = w( all(w,2) , : ); % Discard triplets containing zero
% Individually histogram 1st, 2nd, and 3rd element across all
% triplets
for iw = 1:3
    subplot(3,1,iw)
    histogram( w(:,iw), 0.05:0.1:0.95 )
end % for iw
mean(w)
Accepted Answer
Walter Roberson
12 Apr 2022
Please look in the File Exchange to get randfixedsum() by Roger Stafford: https://www.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum
More Answers (1)
John D'Errico
12 Apr 2022
Edited: John D'Errico
12 Apr 2022
If you wish to sample from the set of triplets that sum to exactly 1, where each element is discrete, then you do NOT want to use randfixedsum. In fact, you don't want to do it by rejection either!
The set of triples that sum to 1 is trivial to generate, so I will just build the entire set by brute force. (Note that I am using integers initially. At the end, I'll divide by 10.) First, though, you should see that the MAXIMUM element must ALWAYS be 0.8, NOT 0.9. There is no way to add THREE numbers from the set (1:9)/10 and have one of them be 0.9 with the sum equal to 1, because the other two contribute at least 0.2 between them. That is why V stops at 8.
V = 1:8;
[V1,V2,V3] = ndgrid(V,V,V);
V123 = [V1(:),V2(:),V3(:)];
V123(sum(V123,2) ~= 10,:) = [];
size(V123,1)
So there are EXACTLY 36 possible ways to form that sum. No more, no less.
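As a cross-check on that count, a stars-and-bars argument gives the same number: positive integer triples that sum to 10 correspond to choosing 2 cut points among the 9 gaps between 10 units, which is nchoosek(9,2). A minimal sketch, assuming the same brute-force construction as above (the names nBrute and nFormula are illustrative only):

```matlab
% Brute-force count, same construction as above
V = 1:8;
[V1,V2,V3] = ndgrid(V,V,V);
V123 = [V1(:),V2(:),V3(:)];
V123(sum(V123,2) ~= 10,:) = [];
nBrute = size(V123,1);

% Stars-and-bars count of positive integer solutions to a + b + c = 10
nFormula = nchoosek(9,2);

% The two counts should agree
isequal(nBrute, nFormula)
```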
V123
At the end, divide by 10.
V123 = V123/10;
Those are the ONLY ways to form the sum you want.
Now if you want to generate random triples, just generate a random integer from 1 through 36. Use that to sample from the rows of V123.
Nsets = 10000;
ind = randi([1,36],[Nsets,1]);
triples = V123(ind,:);
histogram(triples(:,1),100)
histogram(triples(:,2),100)
histogram(triples(:,3),100)
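Run back to back in one script, the three histogram calls above each replace the previous plot. One way to see all three at once is to reuse the subplot layout and bin edges from the question, so that each bar is centered on one of the values 0.1 through 0.9. This is only a sketch; the loop variable k is arbitrary, and the table of triples is rebuilt so the block runs on its own:

```matlab
% Rebuild the table of valid triples
V = 1:8;
[V1,V2,V3] = ndgrid(V,V,V);
V123 = [V1(:),V2(:),V3(:)];
V123(sum(V123,2) ~= 10,:) = [];
V123 = V123/10;

% Sample rows uniformly at random
Nsets = 10000;
ind = randi(size(V123,1),[Nsets,1]);
triples = V123(ind,:);

% One panel per element of the triple, one bin per possible value
for k = 1:3
    subplot(3,1,k)
    histogram(triples(:,k), 0.05:0.1:0.95)
end
```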
Now, I suppose that you MIGHT decide this does not make sense, that we should expect just as many samples with 0.1 as with 0.6, 0.7, or 0.8. But of course that is silly. In fact, we should expect a non-uniform distribution in each channel, exactly as we see. Think about it.
For example, how many ways are there to get a sum that includes one element that is 0.8? EXACTLY 3 ways, that is... {[0.1 0.1 0.8], [0.1 0.8 0.1], [0.8 0.1 0.1]}.
sum(V123 == 0.8,1)
But now, how many ways are there for the sum to have 0.1 in it?
sum(V123 == 0.1,1)
So in each position of the triple, the value 0.1 occurs 8 times as often as the value 0.8: 8 of the 36 rows versus 1 of the 36 rows.
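The same counting can be done for every value at once. For a fixed first element k/10, the other two elements must sum to (10-k)/10, and there are 9-k ways to do that, so the exact probability of seeing k/10 in any one position is (9-k)/36. The sketch below tabulates this with accumarray (the names counts, pExact, and meanExact are illustrative only). It also shows that the mean of each element is 1/3, not 0.5, which is what forces the skew: the symmetry in this problem is between the three positions, not between the directions toward 0 and toward 1.

```matlab
% Rebuild the table of valid triples in integer form
V = 1:8;
[V1,V2,V3] = ndgrid(V,V,V);
V123 = [V1(:),V2(:),V3(:)];
V123(sum(V123,2) ~= 10,:) = [];

% Count how often each value 1..8 appears in the first position
counts = accumarray(V123(:,1), 1, [8 1]);
pExact = counts / size(V123,1);   % probabilities of 0.1, 0.2, ..., 0.8

% Compare against the closed form 9 - k
isequal(counts, (9 - (1:8))')

% Expected value of a single element of the triple
meanExact = sum((1:8)'/10 .* pExact)
```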