randperm non uniformly distributed
I want to sample from the integers 1 through 56 without replacement. Neither randperm nor datasample with 'Replace',false appears to give a uniformly distributed set when I iterate many times. Why is the last bin in the histogram double the size of the rest?
perms=zeros(10000,6);
samps=zeros(10000,6);
[rp, cp]=size(perms);
for p=1:rp
permstemp = randperm(56,6);
perms(p,:)=permstemp;
end
[rs, cs]=size(samps);
for s=1:rs
sampstemp = datasample(1:56,6,'Replace',false);
samps(s,:)=sampstemp;
end
histogram(perms(1:end))
histogram(samps(1:end))
Accepted Answer
John D'Errico
August 15, 2019
Sigh. This is NOT a question of non-uniformity. Just a question of not understanding how to recognize non-uniformity, and partially how to understand a histogram.
If you create a histogram with too few bins, SOME bins will have to collect more than one distinct value, so those bins show inflated counts.
It turns out that histogram decided to use bin edges of 1:56 here, so the last bin got used for twice as many samples.
Note the difference between these two calls to histogram:
histogram(perms(1:end))
histogram(perms(1:end),1:56)
histogram(perms(1:end),1:57)
The first two produce the same results. So it appears the default for the bin edges was 1:56. However, when I gave it another bin up to 57, all things appear normal.
So what happens when I have bin edges 1:56? There are integer events at 56, and some at 55. So that last bin held all events that were either 55 OR 56. Whereas bin number 1 only held the events that were strictly a 1. When I gave it one more bin edge to use for the histogram, things were fine.
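To check the counts without relying on how histogram picks its bins, one option is to tally each integer directly. The following is only a sketch: the variable names (counts, countsInt, counts55) are made up for illustration, and it assumes every value from 1 to 56 shows up at least once, which is effectively certain with 60000 draws.

```matlab
% Draw 10000 samples of 6 distinct integers from 1:56, as in the question.
perms = zeros(10000, 6);
for p = 1:size(perms, 1)
    perms(p, :) = randperm(56, 6);
end

% Tally each integer directly; no bin edges are involved at all.
counts = accumarray(perms(:), 1, [56 1]);

% One unit-width bin centered on each integer (assumes data spans 1..56).
countsInt = histcounts(perms(:), 'BinMethod', 'integers');

% Edges 1:56 define only 55 bins; the last bin [55, 56] holds two values.
counts55 = histcounts(perms(:), 1:56);

% The "double height" last bin is just the sum of the counts for 55 and 56.
disp(counts55(end) == counts(55) + counts(56))
```

With one bin per integer, every count should hover around 60000/56, and the apparent spike at the end disappears. histogram accepts the same 'BinMethod','integers' option if you want the plot rather than the numbers.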
So before you claim non-uniformity, think about whether the test you are using that asserts non-uniformity might be flawed.
Comments: 3
Steven Lord
August 15, 2019
John is correct. As stated in the histogram documentation page, "Each bin includes the left edge, but does not include the right edge, except for the last bin which includes both edges."
Before John added that last bin edge at 57, the last bin was [55, 56] and the next-to-last bin was [54, 55). So the last bin counted two distinct values from the data.
After John added that last bin edge at 57, the last bin is [56, 57] and the next-to-last bin is [55, 56). Each of the last two bins now counts only one distinct value from the data.
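The edge rule is easy to see on a tiny data set. This is a minimal illustration using histcounts, which follows the same binning rule as histogram; the variable names are arbitrary.

```matlab
% Five values, each appearing exactly once.
data = 1:5;

% Edges 1:5 give four bins: [1,2), [2,3), [3,4), [4,5].
% The last bin is closed on both sides, so it counts both 4 and 5.
nShort = histcounts(data, 1:5);   % returns [1 1 1 2]

% Edges 1:6 give five bins: [1,2), [2,3), [3,4), [4,5), [5,6].
% Now every bin counts exactly one value.
nFull = histcounts(data, 1:6);    % returns [1 1 1 1 1]

disp(nShort)
disp(nFull)
```

The doubled last bin in the original plot is the same effect, just scaled up to 56 values.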