Surprising behavior in randsample

Question

I was recently surprised by the behavior of randsample.

1. Generating sequences with replacement -- not surprising

When I generate sequences with replacement (after setting the same seed), the first N values generated are the same, regardless of how many values I generate:

seed = 13;
N = 12;
for ni = 1:N
    rng(seed)
    fprintf("randsample (with replace), %2d value(s): ",ni); fprintf('%g ', randsample(N,ni,true)'); fprintf("\n");
end
randsample (with replace),  1 value(s): 10 
randsample (with replace),  2 value(s): 10 3 
randsample (with replace),  3 value(s): 10 3 10 
randsample (with replace),  4 value(s): 10 3 10 12 
randsample (with replace),  5 value(s): 10 3 10 12 12 
randsample (with replace),  6 value(s): 10 3 10 12 12 6 
randsample (with replace),  7 value(s): 10 3 10 12 12 6 8 
randsample (with replace),  8 value(s): 10 3 10 12 12 6 8 10 
randsample (with replace),  9 value(s): 10 3 10 12 12 6 8 10 8 
randsample (with replace), 10 value(s): 10 3 10 12 12 6 8 10 8 9 
randsample (with replace), 11 value(s): 10 3 10 12 12 6 8 10 8 9 1 
randsample (with replace), 12 value(s): 10 3 10 12 12 6 8 10 8 9 1 4 

2. Generating sequences without replacement -- surprising

When I generate sequences without replacement (after setting the same seed), I expected the same behavior. And that is the behavior -- but only if the sequence is long enough. For shorter sequences, the values are not in the same order.

seed = 13;
N = 12;
for ni = 1:N
    rng(seed)
    fprintf("randsample (without replace), %2d value(s): ",ni); fprintf('%g ', randsample(N,ni,false)'); fprintf("\n");
end
randsample (without replace),  1 value(s): 10 
randsample (without replace),  2 value(s): 3 10 
randsample (without replace),  3 value(s): 10 12 3 
randsample (without replace),  4 value(s): 11 2 12 6 
randsample (without replace),  5 value(s): 11 2 12 6 7 
randsample (without replace),  6 value(s): 11 2 12 6 7 9 
randsample (without replace),  7 value(s): 11 2 12 6 7 9 10 
randsample (without replace),  8 value(s): 11 2 12 6 7 9 10 8 
randsample (without replace),  9 value(s): 11 2 12 6 7 9 10 8 1 
randsample (without replace), 10 value(s): 11 2 12 6 7 9 10 8 1 3 
randsample (without replace), 11 value(s): 11 2 12 6 7 9 10 8 1 3 4 
randsample (without replace), 12 value(s): 11 2 12 6 7 9 10 8 1 3 4 5 

Notice how the first three rows don't follow the pattern. This seems odd to me, and perhaps buggy. (The behavior is consistent, and doesn't depend on the particular seed.)

I'm not sure I have a question, other than ... "Does this seem strange to anyone else?"


Answer 1

Paul on 26 Feb 2023

Moved: the cyclist on 26 Feb 2023

I don't see anything in the doc about the ordering of the output. randsample is an .m file, so the source can be read directly. The algorithm for sampling without replacement changes when 4*k > n, which is consistent with the results shown for randsample.
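To make the threshold concrete for the example in the question, here is a minimal sketch that only evaluates the condition 4*k > n quoted above for n = 12. It does not call or reproduce randsample itself, and the variable names are illustrative.

```matlab
% Evaluate the quoted condition 4*k > n for every sample size k = 1..n.
n = 12;
k = 1:n;
conditionHolds = 4*k > n;                 % logical vector, true where 4*k > n
firstSwitch = find(conditionHolds, 1);    % smallest k for which the condition holds
fprintf('For n = %d, 4*k > n first holds at k = %d\n', n, firstSwitch);
% prints: For n = 12, 4*k > n first holds at k = 4
```

So k = 1, 2, 3 fall on one side of the condition and k = 4 through 12 on the other, which matches the three rows that break the pattern in the question.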

datasample behaves the same as randsample with replacement, but very differently without replacement.

With replacement, datasample gives the same output as randsample:

seed = 13;
N = 12;
for ni = 1:N
    rng(seed)
    fprintf("datasample (with replace), %2d value(s): ",ni); fprintf('%g ', datasample(1:N,ni,'Replace',true)'); fprintf("\n");
end
datasample (with replace),  1 value(s): 10 
datasample (with replace),  2 value(s): 10 3 
datasample (with replace),  3 value(s): 10 3 10 
datasample (with replace),  4 value(s): 10 3 10 12 
datasample (with replace),  5 value(s): 10 3 10 12 12 
datasample (with replace),  6 value(s): 10 3 10 12 12 6 
datasample (with replace),  7 value(s): 10 3 10 12 12 6 8 
datasample (with replace),  8 value(s): 10 3 10 12 12 6 8 10 
datasample (with replace),  9 value(s): 10 3 10 12 12 6 8 10 8 
datasample (with replace), 10 value(s): 10 3 10 12 12 6 8 10 8 9 
datasample (with replace), 11 value(s): 10 3 10 12 12 6 8 10 8 9 1 
datasample (with replace), 12 value(s): 10 3 10 12 12 6 8 10 8 9 1 4 

But datasample without replacement is quite different:

seed = 13;
N = 12;
for ni = 1:N
    rng(seed)
    fprintf("datasample (without replace), %2d value(s): ",ni); fprintf('%g ', datasample(1:N,ni,'Replace',false)'); fprintf("\n");
end
datasample (without replace),  1 value(s): 10 
datasample (without replace),  2 value(s): 10 3 
datasample (without replace),  3 value(s): 5 12 9 
datasample (without replace),  4 value(s): 4 11 8 12 
datasample (without replace),  5 value(s): 3 10 6 12 11 
datasample (without replace),  6 value(s): 3 12 5 11 10 9 
datasample (without replace),  7 value(s): 2 10 4 9 8 11 7 
datasample (without replace),  8 value(s): 2 10 4 9 8 12 7 11 
datasample (without replace),  9 value(s): 2 9 4 12 7 11 6 10 8 
datasample (without replace), 10 value(s): 1 8 3 12 6 10 5 9 7 11 
datasample (without replace), 11 value(s): 1 7 12 11 5 9 4 8 6 10 2 
datasample (without replace), 12 value(s): 12 2 1 7 8 6 10 11 4 9 5 3 


the cyclist on 26 Feb 2023

Moved: Image Analyst on 26 Feb 2023

Nice catch on the algorithmic change! That at least explains the behavior, and confirms that it is not a bug (in the software development sense).

I'm not sure I love it, though.
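If nested samples are what you actually need, one possible workaround is to draw a single permutation and take prefixes of it. This is only a sketch of that idea: it relies on randperm rather than randsample, and it was not proposed elsewhere in this thread.

```matlab
% Draw one full permutation, then use its first ni entries as the sample.
% Every shorter sample is then a prefix of every longer one by construction.
seed = 13;
N = 12;
rng(seed)
perm = randperm(N);                       % one random permutation of 1:N
for ni = 1:N
    fprintf("prefix of randperm, %2d value(s): ", ni);
    fprintf('%g ', perm(1:ni));
    fprintf("\n");
end
```

The trade-off is that a full permutation of 1:N is always generated, even when only a few values are needed.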

Walter Roberson on 26 Feb 2023

Moved: the cyclist on 26 Feb 2023

"The algorithm for without replacement changes when 4*k > n"

IIRC, that is the point at which the Fisher-Yates shuffle stops being used.

That is, although Fisher-Yates is pretty efficient, I gather that once you ask for most or all of the available values, it becomes more efficient to use the sort(rand()) approach.
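For readers who have not seen the two techniques named here, this is a rough sketch of each. It is an illustration written for this comment, not the code inside randsample.m, and names such as sampleFY and sampleSort are made up for the example.

```matlab
n = 12;   % population is 1:n
k = 3;    % number of values to draw without replacement
rng(13)

% (a) Partial Fisher-Yates shuffle: after an O(n) setup, only k swaps are needed.
p = 1:n;
for i = 1:k
    j = i + randi(n - i + 1) - 1;   % uniform random index in i..n
    p([i j]) = p([j i]);            % swap the chosen element into position i
end
sampleFY = p(1:k);

% (b) sort(rand()) approach: sorting n uniform random keys yields a
% full random permutation; keep the first k indices.
[~, order] = sort(rand(1, n));
sampleSort = order(1:k);

fprintf('Fisher-Yates sample: %s\n', mat2str(sampleFY));
fprintf('sort(rand) sample:   %s\n', mat2str(sampleSort));
```

Both produce k distinct values from 1:n, but they consume random numbers differently, so two routines built on different methods would not be expected to return the same ordering from the same seed.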
