Looking for something like a matrix version of randsample... [vectorization!]

If I have a vector w (length n) and I want to pick a single random number between 1 and n using w as the weights, I can do something like
idx = randsample(n,1,true,w)
and I'll get a number between 1 and n with probability w(idx)/sum(w). Great.
Similarly, I have a matrix W (size N x M, where each of N,M is in the thousands or so), and I want to draw M random numbers between 1 and N, with the columns of W acting as independent weight vectors. I could obviously do
idx = zeros(N,1);
for i = 1:M
idx(i) = randsample(N,1,true,W(:,i));
end
...but I'm going to be calling this literally billions of times, so I'm looking for some efficiency.
I know that an equivalent way to think of this is to take my W matrix, normalize the columns so that they sum to one, do a cumsum on the columns, select a vector of uniform random numbers using rand(1,M), and find the first row indices where they are greater than the cumsum values, but I don't know how to do that without using a loop and find():
W_normalized = bsxfun(@rdivide,W,sum(W,1));
W_cdf = cumsum(W,1);
x = rand(1,M);
C = bsxfun(@lt,x,W_cdf);
and then the first row of each column of C with a 1 in it is my random number, but I haven't had any luck doing that in an efficient, vectorized way (I've seen this thread, but I think their conclusion to use a for-loop doesn't really seem to hold for larger matrices).
Any suggestions?
Thanks, Dan

 채택된 답변

Kirby Fears
Kirby Fears 2015년 11월 25일
편집: Kirby Fears 2015년 11월 30일
Here's a pure matrix version of your bsxfun calls. It should be internally parallelized if your matrices are large.
W_normalized = W./repmat(sum(W,1),size(W,1),1);
W_cdf = cumsum(W_normalized,1);
x = rand(1,size(W,2));
C = repmat(x,size(W,1),1)<W_cdf;
You still need to find the first "true" value in C for each column to form an index.
Below I make a logical array where only the first "true" is present in each column:
idx = [C(1,:); xor(C(2:end,:),C(1:end-1,:))];
Hope this helps.

댓글 수: 4

Thanks for the response, Kirby. I hadn't thought of diff as a solution to this -- I like it.
I'm interested in one step further -- how do I then extract the row indices with 1s in them from each column? With find, perhaps, in row/column form? I'm looking for a vector of integers (like a vector of outputs from randsample).
Thanks, Dan
You can take the x location from find() output. I'm actually switching out diff() for xor() below to avoid the double array output from diff().
W_normalized = W./repmat(sum(W,1),size(W,1),1);
W_cdf = cumsum(W_normalized,1);
x = rand(1,size(W,2));
C = repmat(x,size(W,1),1)<W_cdf;
idx = [C(1,:) ; xor(C(2:end,:),C(1:end-1))];
[idxLocation,~] = find(idx);
I just realized that I never "accepted" this answer. Thanks very much for the huge assistance! You presumably saved me a hundred years of CPU time.
No problem. I'm glad it worked out.

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

도움말 센터File Exchange에서 Resizing and Reshaping Matrices에 대해 자세히 알아보기

질문:

2015년 11월 25일

댓글:

2017년 3월 9일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by