Random triplets not asymmetrically distributed

조회 수: 2(최근 30일)
FM
FM 2022년 4월 12일
편집: FM 2022년 4월 12일
I would like to generate many triplets whose values are drawn from {0.1, 0.2, ... 0.9}. I also want each triplet to sum to 1.0. I use the following code to generate them. However, when I individually histogram the 1st, 2nd, and 3rd element across all triplets, they are nonuniform and asymmetric.
Can anyone explain why the distribution is nonuniform? I can sort of guess that it is due to the constraint that they sum to 1.0.
However, it seems odd that the distribution is skewed. There should be no difference in the upward direction toward 1.0 or the downward direction toward 0.0, as the problem is completely symmetric.
The only possible cause for asymmetry is that rounding is slightly asymmetric, but just by smidgeon.
nRows=1e7; % Number of triplets
w = rand(nRows,3); % Generate row triplets
w = w./sum(w,2); % Normalize each triplet to sum to 1.0
w = round(w,1); % Round to 1 decimal place
w = w( sum(w,2) == 1.0 , : ); % Retain triplets that sum to 1.0
w = w( all(w,2) , : ); % Discard triplets containing zero
% Individually histogram 1st, 2nd, and 3rd element across all
% triplets
for iw = 1:3
subplot(3,1,iw)
histogram( w(:,iw), 0.05:0.1:.95 )
end % for iw
mean(w)
ans = 1×3
0.3259 0.3258 0.3483
  댓글 수: 2
FM
FM 2022년 4월 12일
편집: FM 2022년 4월 12일
Thank you, "the cyclist". My experiments show that you are on the right track, though it's not MATLAB's cleverness so much as my forgetting numerical basics. Here are the troubleshooting steps:
nRows=1e7; % Number of triplets
w = rand(nRows,3); % Generate row triplets
w = w./sum(w,2); % Normalize each triplet to sum to 1.0
w = round(w,1); % Round to 1 decimal place
TripletSums = sum(w,2); % Triplet sums
ifSum1 = TripletSums == 1.0; % Triplet rows that sum to 1.0
% Confirm that rows apparently sum to 1.0 actually do
[ min(TripletSums(ifSum1)) max(TripletSums(ifSum1)) ] % [1 1] yay!
% Check if rows not summing to 1.0 actually don't
diff1 = abs( 1.0 - TripletSums(~ifSum1) ); % Differences from 1.0
[ min(diff1) max(diff1) ] % [0 1] oh no!
What this shows is that some of the rows that apparently don't sum to 1.0 actually do, since the difference is zero. At least one source of this problem is the identification of rows using "ifSum1 = TripletSums==1.0", which probably leaves out rows that are slightly different than 1.0 due to numerical noise. To correct this, note that the decimal tails of the numbers in the triplets contain only 1 digit, so any numerical discrepancy between the sum and 1.0 should be way less than 0.01. The new definition of ifSum1 checks for this:
nRows=1e7; % Number of triplets
w = rand(nRows,3); % Generate row triplets
w = w./sum(w,2); % Normalize each triplet to sum to 1.0
w = round(w,1); % Round to 1 decimal place
TripletSums = sum(w,2); % Triplet sums
ifSum1 = abs(TripletSums - 1.0) < 0.01 ; % Triplet rows that sum to 1.0
% Confirm that rows apparently sum to 1.0 actually do
[ min(TripletSums(ifSum1)) max(TripletSums(ifSum1)) ] % [1 1] yay!
% Check if rows not summing to 1.0 actually don't
diff1 = abs( 1.0 - TripletSums(~ifSum1) ); % Differences from 1.0
[ min(diff1) max(diff1) ] % [0.1 0.1] i.e. differs from 1.0
w = w(ifSum1,:); % Retain triplets that sum to 1.0
w = w( all(w,2) , : ); % Discard triplets containing zero
mean(w) % [ 0.3333 0.3334 0.3333 ] sanely uniform (yay!)
% Individually histogram 1st, 2nd, and 3rd element across all
% triplets
for iw = 1:3
subplot(3,1,iw)
histogram( w(:,iw), 0.05:0.1:.95 )
end % for iw
Unfortunately, the distribution is still skewed.

댓글을 달려면 로그인하십시오.

채택된 답변

Walter Roberson
Walter Roberson 2022년 4월 12일
Please look in the File Exchange to get randfixsum() by Roger Stafford https://www.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum
  댓글 수: 4
FM
FM 2022년 4월 12일
편집: FM 2022년 4월 12일
@John D'Errico + @Bruno Luong : Yes, that's what I got from John's answer. I only browsed the paper in the recommended package, but in that short browse, I did not notice anything about the discrete nature. Thank you for confirming this.

댓글을 달려면 로그인하십시오.

추가 답변(1개)

John D'Errico
John D'Errico 2022년 4월 12일
편집: John D'Errico 2022년 4월 12일
If you wish to sample from the set of triplets that sum to exactly 1, where each elemnt is discrete, then you do NOT want to use randfized sum. In fact, you don't want to do it by rejection either!!!!!!!!
The set of triples that sum to 1 is trivially generated. I will just generate the entire set by brute force. Simplest is like this (Note that I am using integers initially. At the end, I'll divide by 10.) Next, you should see that the MAXIMUM element must ALWAYS be 0.8, NOT 0.9. There is no way to add THREE numbers from the set (1:9)/10, and have one of them be 0.9, with the sum as 1. That should be obvious.
V = 1:8;
[V1,V2,V3] = ndgrid(V,V,V);
V123 = [V1(:),V2(:),V3(:)];
V123(sum(V123,2) ~= 10,:) = [];
size(V123,1)
ans = 36
So there are EXACTLY 36 possible ways to form that sum. No more, no less.
V123
V123 = 36×3
8 1 1 7 2 1 6 3 1 5 4 1 4 5 1 3 6 1 2 7 1 1 8 1 7 1 2 6 2 2
At the end, divide by 10.
V123 = V123/10;
Those are the ONLY ways to form the sum you want.
Now if you want to generate random triples, just generate a random integer from 1 through 36. Use that to sample from the rows of V123.
Nsets = 10000;
ind = randi([1,36],[Nsets,1]);
triples = V123(ind,:);
histogram(triples(:,1),100)
histogram(triples(:,2),100)
histogram(triples(:,3),100)
Now, I suppose that you MIGHT decide this does not make sense, that we should expect to have equally as many samples with .1 as we have .6, .7, or .8. But of course that is silly. In fact, we should expect a non-uniform distribution as we see, for each channel. Think about it.
For example, how many ways are there to get a sum that includes one element that is 0.8? EXACTLY 3 ways, that is... {[0.1 0.1 0.8], [0.1 0.8 0.1], [0.8 0.1 0.1]}.
sum(V123 == 0.8,1)
ans = 1×3
1 1 1
But now, how many ways are there for the sum to have 0.1 in it?
sum(V123 == 0.1,1)
ans = 1×3
8 8 8
So there are 8 times as many sums that include the number 0.1 in each member of the triple, than there are ways to find the number 0.8.
  댓글 수: 3
FM
FM 2022년 4월 12일
Actually, the reason why I wish that both answers can be accepted is that both are informative. One points to a paper that sheds light on the geometric nature of the problem (which I have admittedly yet to plumb). Your answer hits on the discrete and finite nature of the options, providing code that illustrates this.
I really appreciate your explanation of the planar and triangular arrangements. It's so obvious after you explained, but I've been away from basic linear algebra for a long time. It was dead simple to dump the code into MATLAB, create the 3D plot, and grab and rotate the plot, which is very revealing.

댓글을 달려면 로그인하십시오.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by