Random triplets not asymmetrically distributed

Question

FM 2022년 4월 12일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1694255-random-triplets-not-asymmetrically-distributed

편집: FM 2022년 4월 12일

I would like to generate many triplets whose values are drawn from {0.1, 0.2, ... 0.9}. I also want each triplet to sum to 1.0. I use the following code to generate them. However, when I individually histogram the 1st, 2nd, and 3rd element across all triplets, they are nonuniform and asymmetric.

Can anyone explain why the distribution is nonuniform? I can sort of guess that it is due to the constraint that they sum to 1.0.

However, it seems odd that the distribution is skewed. There should be no difference in the upward direction toward 1.0 or the downward direction toward 0.0, as the problem is completely symmetric.

The only possible cause for asymmetry is that rounding is slightly asymmetric, but just by smidgeon.

nRows=1e7; % Number of triplets

w = rand(nRows,3); % Generate row triplets

w = w./sum(w,2); % Normalize each triplet to sum to 1.0

w = round(w,1); % Round to 1 decimal place

w = w( sum(w,2) == 1.0 , : ); % Retain triplets that sum to 1.0

w = w( all(w,2) , : ); % Discard triplets containing zero

% Individually histogram 1st, 2nd, and 3rd element across all

% triplets

for iw = 1:3

subplot(3,1,iw)

histogram( w(:,iw), 0.05:0.1:.95 )

end % for iw

mean(w)

ans = 1×3

0.3259 0.3258 0.3483

댓글 수: 2
없음 표시없음 숨기기

the cyclist 2022년 4월 12일

편집: the cyclist 2022년 4월 12일

MATLAB Online에서 열기

I ran your code in the interface here.

What is puzzling to me is not the skewness of the distributions themselves, but rather the fact that the distribution of the third column is different from the 1st and 2nd columns. Note that I appended a calculation of the mean value of each column, and they are not equal. (It's not just sampling error.)

Maybe I'm overlooking something obvious, but I see no point in the code in which the columns of w are not treated identically from a coding perspective. Superficially, I feel that this algorithm should work.

The line that causes the problem is the one with the sum:

w = w( sum(w,2) == 1.0 , : ); % Retain triplets that sum to 1.0

If you shuffle the values within each row after that point, you will get the expected result:

nRows=1e6; % Number of triplets

w = rand(nRows,3); % Generate row triplets

w = w./sum(w,2); % Normalize each triplet to sum to 1.0

w = round(w,1); % Round to 1 decimal place

w = w( sum(w,2) == 1.0 , : ); % Retain triplets that sum to 1.0

for nr = 1:size(w,1)

w(nr,:) = w(nr,randsample(3,3));

end

w = w( all(w,2) , : ); % Discard triplets containing zero

% Individually histogram 1st, 2nd, and 3rd element across all

% triplets

figure

for iw = 1:3

subplot(3,1,iw)

histogram( w(:,iw), -0.05:0.1:1.05 )

end % for iw

mean(w)

ans = 1×3

0.3336 0.3332 0.3332

I speculate that MATLAB is doing something clever when evaluating

sum(w,2) == 1.0

that leads to w(:,3) being the "odd man out".

FM 2022년 4월 12일

편집: FM 2022년 4월 12일

MATLAB Online에서 열기

Thank you, "the cyclist". My experiments show that you are on the right track, though it's not MATLAB's cleverness so much as my forgetting numerical basics. Here are the troubleshooting steps:

nRows=1e7;  % Number of triplets
w = rand(nRows,3); % Generate row triplets
w = w./sum(w,2); % Normalize each triplet to sum to 1.0
w = round(w,1); % Round to 1 decimal place
TripletSums = sum(w,2); % Triplet sums
ifSum1 = TripletSums == 1.0; % Triplet rows that sum to 1.0
% Confirm that rows apparently sum to 1.0 actually do
[ min(TripletSums(ifSum1)) max(TripletSums(ifSum1)) ] % [1 1] yay!
% Check if rows not summing to 1.0 actually don't
diff1 = abs( 1.0 - TripletSums(~ifSum1) ); % Differences from 1.0
[ min(diff1) max(diff1) ] % [0 1] oh no!

What this shows is that some of the rows that apparently don't sum to 1.0 actually do, since the difference is zero. At least one source of this problem is the identification of rows using "ifSum1 = TripletSums==1.0", which probably leaves out rows that are slightly different than 1.0 due to numerical noise. To correct this, note that the decimal tails of the numbers in the triplets contain only 1 digit, so any numerical discrepancy between the sum and 1.0 should be way less than 0.01. The new definition of ifSum1 checks for this:

nRows=1e7;  % Number of triplets
w = rand(nRows,3); % Generate row triplets
w = w./sum(w,2); % Normalize each triplet to sum to 1.0
w = round(w,1); % Round to 1 decimal place
TripletSums = sum(w,2); % Triplet sums
ifSum1 = abs(TripletSums - 1.0) < 0.01 ; % Triplet rows that sum to 1.0
% Confirm that rows apparently sum to 1.0 actually do
[ min(TripletSums(ifSum1)) max(TripletSums(ifSum1)) ] % [1 1] yay!
% Check if rows not summing to 1.0 actually don't
diff1 = abs( 1.0 - TripletSums(~ifSum1) ); % Differences from 1.0
[ min(diff1) max(diff1) ] % [0.1 0.1] i.e. differs from 1.0
w = w(ifSum1,:); % Retain triplets that sum to 1.0
w = w( all(w,2) , : ); % Discard triplets containing zero
mean(w) % [ 0.3333  0.3334  0.3333 ] sanely uniform (yay!)
% Individually histogram 1st, 2nd, and 3rd element across all
% triplets
for iw = 1:3
   subplot(3,1,iw)
   histogram( w(:,iw), 0.05:0.1:.95 )
end % for iw

Unfortunately, the distribution is still skewed.

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

Walter Roberson 2022년 4월 12일

1
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1694255-random-triplets-not-asymmetrically-distributed#answer_940625

Please look in the File Exchange to get randfixsum() by Roger Stafford https://www.mathworks.com/matlabcentral/fileexchange/9700-random-vectors-with-fixed-sum

댓글 수: 4
이전 댓글 2개 표시이전 댓글 2개 숨기기

Bruno Luong 2022년 4월 12일

This answer is not suitable for discrete case, which is entirely different beast if uniformity is required.

FM 2022년 4월 12일

편집: FM 2022년 4월 12일

@John D'Errico + @Bruno Luong : Yes, that's what I got from John's answer. I only browsed the paper in the recommended package, but in that short browse, I did not notice anything about the discrete nature. Thank you for confirming this.

댓글을 달려면 로그인하십시오.

Answer 2

John D'Errico 2022년 4월 12일

3
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/1694255-random-triplets-not-asymmetrically-distributed#answer_940675

편집: John D'Errico 2022년 4월 12일

MATLAB Online에서 열기

If you wish to sample from the set of triplets that sum to exactly 1, where each elemnt is discrete, then you do NOT want to use randfized sum. In fact, you don't want to do it by rejection either!!!!!!!!

The set of triples that sum to 1 is trivially generated. I will just generate the entire set by brute force. Simplest is like this (Note that I am using integers initially. At the end, I'll divide by 10.) Next, you should see that the MAXIMUM element must ALWAYS be 0.8, NOT 0.9. There is no way to add THREE numbers from the set (1:9)/10, and have one of them be 0.9, with the sum as 1. That should be obvious.

V = 1:8;
[V1,V2,V3] = ndgrid(V,V,V);
V123 = [V1(:),V2(:),V3(:)];
V123(sum(V123,2) ~= 10,:) = [];
size(V123,1)
ans = 36

So there are EXACTLY 36 possible ways to form that sum. No more, no less.

V123
V123 = 36×3
   1     1
   2     1
   3     1
   4     1
   5     1
   6     1
   7     1
   8     1
   1     2
   2     2

At the end, divide by 10.

V123 = V123/10;

Those are the ONLY ways to form the sum you want.

Now if you want to generate random triples, just generate a random integer from 1 through 36. Use that to sample from the rows of V123.

Nsets = 10000;

ind = randi([1,36],[Nsets,1]);

triples = V123(ind,:);

histogram(triples(:,1),100)

histogram(triples(:,2),100)

histogram(triples(:,3),100)

Now, I suppose that you MIGHT decide this does not make sense, that we should expect to have equally as many samples with .1 as we have .6, .7, or .8. But of course that is silly. In fact, we should expect a non-uniform distribution as we see, for each channel. Think about it.

For example, how many ways are there to get a sum that includes one element that is 0.8? EXACTLY 3 ways, that is... {[0.1 0.1 0.8], [0.1 0.8 0.1], [0.8 0.1 0.1]}.

sum(V123 == 0.8,1)
ans = 1×3
     1     1     1

But now, how many ways are there for the sum to have 0.1 in it?

sum(V123 == 0.1,1)
ans = 1×3
     8     8     8

So there are 8 times as many sums that include the number 0.1 in each member of the triple, than there are ways to find the number 0.8.

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

FM 2022년 4월 12일

편집: FM 2022년 4월 12일

Thank you, John D'Errico. I wish there was a way to designate two answers. Your explanation makes perfect sense. In fact, I initially started down a similar path by selecting the first scalar number of a single triplet to be an integer between 1 to 8, then choosing the 2nd one such that it would sum with the first one to no more the 9. The 3rd component wasn't even random, due to need to sum to 10.

That generated just one triplet, which is fine (I can repeat the process). But that gave a uniform distribution to the 1st of 3 components, and identical declining distributions to 2nd & 3rd components. Of course it makes sense because of what you described. The first number has all the latitude, and the 2nd number takes on lower number more often because of crowding from the 1st number. The final number has an identical distribution because it merely compliments the 2nd number to sum up 0.9 minus the 1st number.

I thought that I wanted more uniform distributions, both across the 3 components and within any one component, which led to the method in my posted question. Not surprisingly, it didn't yield uniform distributions at all. I wasn't taking your discrete combinatorial approach. Given your model, however, I suspect that there is no way to make the distributions flatter by selectively choosing certain combinations more than others.

Thanks again.

John D'Errico 2022년 4월 12일

MATLAB Online에서 열기

(Um, you can change which answer you have accepted. But i really don't care that much about being accepted. Reputation is worth nothing to me. When I die, I hope my obituary has something better to say about me than quote my reputation on a web site.)

People misunderstand what uniform would mean in a case like this.

The constraint that the numbers sum to 1 implies all sets of points MUST lie in a plane. And since they are all positive, the constraint region becomes a triangle.

V = 1:8;

[V1,V2,V3] = ndgrid(V,V,V);

V123 = [V1(:),V2(:),V3(:)];

V123(sum(V123,2) ~= 10,:) = [];

plot3(V123(:,1),V123(:,2),V123(:,3),'o')

axis equal

box on

grid on

Do you see that all points lie in a planar triangle? How about if I rotate the plot?

plot3(V123(:,1),V123(:,2),V123(:,3),'o')

axis equal

box on

grid on

view(-24,53)

The points all lie in a triangle. They are uniformly scattered in the triangle. But that means if you look at marginals, so looking at one variable only, the histogram you see will NOT be uniform.

FM 2022년 4월 12일

Actually, the reason why I wish that both answers can be accepted is that both are informative. One points to a paper that sheds light on the geometric nature of the problem (which I have admittedly yet to plumb). Your answer hits on the discrete and finite nature of the options, providing code that illustrates this.

I really appreciate your explanation of the planar and triangular arrangements. It's so obvious after you explained, but I've been away from basic linear algebra for a long time. It was dead simple to dump the code into MATLAB, create the 3D plot, and grab and rotate the plot, which is very revealing.

댓글을 달려면 로그인하십시오.

Random triplets not asymmetrically distributed

댓글 수: 2
없음 표시없음 숨기기

채택된 답변

댓글 수: 4
이전 댓글 2개 표시이전 댓글 2개 숨기기

추가 답변 (1개)

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

Random triplets not asymmetrically distributed

댓글 수: 2 없음 표시없음 숨기기

채택된 답변

댓글 수: 4 이전 댓글 2개 표시이전 댓글 2개 숨기기

추가 답변 (1개)

댓글 수: 3 이전 댓글 1개 표시이전 댓글 1개 숨기기

참고 항목

카테고리

태그

제품

릴리스

Community Treasure Hunt

댓글 수: 2
없음 표시없음 숨기기

댓글 수: 4
이전 댓글 2개 표시이전 댓글 2개 숨기기

댓글 수: 3
이전 댓글 1개 표시이전 댓글 1개 숨기기