Random sample without replacement

Qinpeng Wang on 30 Jan 2012
Commented: Michele Baldo on 24 Dec 2020
Hi all,
Does anybody know how to do random sampling without replacement? The randsample function in MATLAB only supports sampling with replacement.
I wrote my own code, and it behaves strangely: sometimes it works, but sometimes it fails with an error (Error using ==> randsample at 94 W must have length equal to N.):
function C=randsample_WithoutReplacement(m,n,A1,A2)
%A1:population
%A2:probability
B=zeros(m,1);
C=zeros(n,m);
s=transpose(1:1:length(A1));
ut=0;
loc=0;
A=A2;
for j=1:n
    A=A2;
    s=transpose(1:1:length(A1));
    for i=1:m
        B(i)=randsample(s,1,true,A);
        [ut, loc] = ismember(B(i), s);
        s(loc)=[];
        A(loc)=[];
    end
    for i=1:m
        C(j,i)=A1(B(i));
    end
end
  Comments: 3
Andrew Newell on 30 Jan 2012
Oh - you mean it doesn't support weighted sampling without replacement.
Andrew Newell on 31 Jan 2012
I can't reproduce your error.
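A possible explanation, offered as a sketch rather than a confirmed diagnosis: when randsample receives a scalar as its first argument, it treats that scalar as a population size n and samples from 1:n. In the code above, s loses one element per draw, so if m equals length(A1) the last call passes a single remaining index. If that index is, say, 7, randsample expects seven weights but A holds only one. The error would then appear only when the last remaining index is not 1, which would also explain why it is intermittent. The values 7 and 0.1 below are made up for illustration.

```matlab
% Hypothetical state of the last draw when m == length(A1)
s = 7;      % the only index left in the population
A = 0.1;    % its weight

try
    randsample(s, 1, true, A);   % scalar s is read as the population 1:7
catch err
    disp(err.message)            % expected: the length mismatch error quoted above
end

% Sampling a position in s, rather than passing s itself, avoids the ambiguity.
pos = randsample(numel(s), 1, true, A);   % position 1, the only one available
picked = s(pos);                          % picked is 7
```

Both versions in the accepted answer sidestep the problem in a similar way, by sampling positions instead of passing the shrinking population itself.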

Accepted Answer

Andrew Newell on 31 Jan 2012
Here is a recursive function that will return one row of your matrix C:
function y = randsampleWithoutReplacement(population,k,w)
y = [];
if ~isempty(population) && k > 0
    n = length(population);
    ii = randsample(1:n,1,true,w);
    newpop = setdiff(1:n,ii);
    y = [population(ii) randsampleWithoutReplacement(population(newpop),k-1,w(newpop))];
end
and here is an example of a call:
y = randsampleWithoutReplacement(1:100,20,ones(100,1)/100)
EDIT: And here is one that is closer to your version:
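function C=randsample_WithoutReplacement(A1,k,A2)
%A1:population
%A2:probability
C=zeros(1,k);
A=A2;
n = length(A1);
for i=1:k
    loc=randsample(n-i+1,1,true,A);   % position among the remaining elements
    A(loc)=[];
    C(i)=A1(loc);
    A1(loc)=[];   % this line was missing from the original post (see Michele Baldo's comment below)
end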
However, I have designed it to act more like randsample.
  Comments: 3
Michele Baldo on 24 Dec 2020
There is a bug in the second version as originally posted: "A1(loc)=[];" is missing after "C(i)=A1(loc);".
Michele Baldo on 24 Dec 2020
I would write it as
function y = randsampleWithoutReplacement(population, k, w)
y = zeros(1, k);
n = length(population);
for i = 1:k
    x = randsample(n-i+1, 1, true, w);
    y(i) = population(x);
    population(x) = [];
    w(x) = [];
end
end
but there is now also the function "datasample", which can do this directly:
datasample(population, k, 'Weights', w, 'Replace',false)
As noted in another answer, if you don't need weights you can use "randsample", which samples without replacement by default.

More Answers (2)

Peter Perkins on 31 Jan 2012
As Andrew pointed out, randsample does sample without replacement, just not with weights, and weighted sampling appears to be what you're asking for.
If you have access to R2011b, you can use the new datasample function in the Statistics Toolbox (a replacement for randsample, though randsample continues to work) to sample with or without replacement, weighted or unweighted.
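For reference, a minimal sketch of both cases mentioned in this answer, assuming the Statistics Toolbox is installed (R2011b or later for datasample). The population, weights, and sample size are made-up illustration values, not the example from the original post.

```matlab
population = 1:100;          % hypothetical population
w = (1:100) / sum(1:100);    % hypothetical weights, one per element
k = 20;                      % hypothetical sample size

% Unweighted: randsample draws without replacement by default.
y1 = randsample(population, k);

% Weighted, without replacement: datasample with 'Replace' set to false.
y2 = datasample(population, k, 'Weights', w, 'Replace', false);

% Neither sample should contain a repeated element.
assert(numel(unique(y1)) == k);
assert(numel(unique(y2)) == k);
```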
  Comments: 1
Qinpeng Wang on 31 Jan 2012
Thanks, I'll also look into the R2011b new function.


Derek O'Connor on 31 Jan 2012
If you don't have access to R2011b and randsample, then the function below is reasonably fast on my Dell Precision 690 (2.33 GHz, 16 GB RAM, Windows 7 Professional, MATLAB R2008b 64-bit).
It uses a rejection loop to call DiscITBS, which generates a single sample from a discrete distribution by doing a binary search on the CDF, which, by definition, is sorted in ascending order.
Membership in S is tested with the byte array member. This is a bit expensive in memory, but it is fast and simple. If you have plenty of memory, use it.
The expected value of the running time is Ns*Ew*log(Np), where Ew = E(nw) is the expected number of trips around the rejection loop.
If Np is small then it doesn't matter what method you use. If Np = 10^6, and Ns < 0.25*Np then this method is quite fast because Ew will be small and log(Np) of binary search takes care of the large Np.
For example:
with Np = 10^6 and Ns = 10^3, nw = 3 and t = 0.05 secs.
with Np = 10^6 and Ns = 10^4, nw = 129 and t = 0.23 secs.
% -------------------------------------------------------------
function [S,nw] = DiscSampRej(x,p,Ns)
% -------------------------------------------------------------
% Generate a random sample of size Ns from x(1:Np) with prob
% p(1:Np), without replacement. Derek O'Connor 31 Jan 2012
% nw counts the rejected (repeated) draws.
% -------------------------------------------------------------
S = zeros(1,Ns);
Np = length(x);
member = false(1,Np);   % member(i) is true once x(i) has been sampled
cdf = cumsum(p);
nw = 0;
for k = 1:Ns
    idx = DiscITBS(cdf);
    while member(idx)   % reject indices that are already in S
        idx = DiscITBS(cdf);
        nw = nw+1;
    end
    S(k) = x(idx);
    member(idx) = true;
end % for k
% end function
% -------------------------------------------------------------
function idx = DiscITBS(cdf)
% -------------------------------------------------------------
% Uses the discrete Inverse Transform method with Binary Search:
% returns the smallest idx such that u < cdf(idx).
% This greatly reduces the number of iterations of the while-loop
% Time Complexity: O(log n)
u = rand;
L = 1; H = length(cdf);
while L <= H
    m = floor(L/2+H/2);
    if u < cdf(m)
        H = m-1;
    else
        L = m+1;
    end
end
% On exit, L is the smallest index with u < cdf(L). The original post
% returned m, which is one too small whenever the last step took the
% else branch. min() guards against cdf(end) rounding to just below 1.
idx = min(L, length(cdf));
% end function
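
As a cross-check on the binary search, the inverse transform method should return the smallest index idx with u < cdf(idx). A linear scan with find gives the same answer in O(n) time and can be used to test DiscITBS on small cases. The probabilities and the value of u below are made up for illustration, with u fixed in place of rand so that the result is repeatable.

```matlab
p = [0.2 0.3 0.5];      % hypothetical probabilities
cdf = cumsum(p);        % cumulative distribution, ascending
u = 0.3;                % fixed stand-in for rand

% Linear-scan reference: smallest index whose cdf value exceeds u.
% Here u lies between cdf(1) = 0.2 and cdf(2) = 0.5, so idx is 2.
idx = find(u < cdf, 1, 'first');
```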