Looking for something like a matrix version of randsample... [vectorization!]

Question

If I have a vector w (length n) and I want to pick a single random number between 1 and n using w as the weights, I can do something like

idx = randsample(n,1,true,w)

and I'll get a number between 1 and n with probability w(idx)/sum(w). Great.

Similarly, I have a matrix W (size N x M, where each of N,M is in the thousands or so), and I want to draw M random numbers between 1 and N, with the columns of W acting as independent weight vectors. I could obviously do

idx = zeros(1,M);   % one draw per column of W
for i = 1:M
    idx(i) = randsample(N,1,true,W(:,i));
end

...but I'm going to be calling this literally billions of times, so I'm looking for some efficiency.

I know that an equivalent approach is to take my W matrix, normalize the columns so that each sums to one, take a cumsum down the columns, draw a vector of uniform random numbers using rand(1,M), and find, for each column, the first row index at which the cumsum value exceeds the random number. But I don't know how to do that last step without using a loop and find():

W_normalized = bsxfun(@rdivide,W,sum(W,1));
W_cdf = cumsum(W_normalized,1);
x = rand(1,M);
C = bsxfun(@lt,x,W_cdf);

The first row of each column of C that contains a 1 is then my random number, but I haven't had any luck extracting it in an efficient, vectorized way. (I've seen this thread, but I don't think its conclusion, to use a for-loop, holds for larger matrices.)
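To make the cumsum idea concrete, here is a minimal single-column sketch. The weights and the fixed draw x = 0.25 are made-up illustration values, and find() is used only because there is a single column here:

```matlab
% Worked example of the cumsum approach for one weight vector.
w     = [1; 2; 7];             % example weights (made up for illustration)
w_cdf = cumsum(w) / sum(w);    % normalized cumulative sum: [0.1; 0.3; 1.0]
x     = 0.25;                  % fixed stand-in for a rand() draw
c     = x < w_cdf;             % [false; true; true]
idx   = find(c, 1, 'first');   % first row where the cumsum exceeds x, here 2
```

Row 2 is selected whenever x falls in [0.1, 0.3), which happens with probability 0.2 = w(2)/sum(w), as intended.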

Any suggestions?

Thanks, Dan

Answer 1

Kirby Fears, 25 Nov 2015

Edited: Kirby Fears, 30 Nov 2015

Here's a pure matrix version of your bsxfun calls. It should be internally parallelized if your matrices are large.

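W_normalized = W./repmat(sum(W,1),size(W,1),1);   % each column sums to 1
W_cdf = cumsum(W_normalized,1);                   % column-wise cumulative sums
x = rand(1,size(W,2));                            % one uniform draw per column
C = repmat(x,size(W,1),1)<W_cdf;                  % true where the cumsum exceeds the draw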

You still need to find the first "true" value in C for each column to form an index.

Below I make a logical array where only the first "true" is present in each column:

idx = [C(1,:); xor(C(2:end,:),C(1:end-1,:))];
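If row numbers are wanted rather than a logical mask, one possible way to get them directly from C is max along the first dimension, which returns the position of the first maximum and therefore of the first true in each column. The following is only a sketch under stated assumptions: the weights are nonnegative, every column sum is positive, and the sizes and example data are made up. The line that forces the last row of W_cdf to 1 is an added guard, not part of the answer above, so that round-off in the cumulative sum cannot leave a column with no true entry.

```matlab
% Sketch: draw one weighted row index per column of W (N-by-M).
% Assumes nonnegative weights and a positive sum in every column.
N = 5; M = 4;                        % small sizes, for illustration only
W = rand(N, M);                      % example weights (made up)

W_cdf = cumsum(W, 1) ./ repmat(sum(W, 1), N, 1);   % column-wise cumulative sums
W_cdf(end, :) = 1;                   % added guard against round-off in the last row
x = rand(1, M);                      % one uniform draw per column
C = repmat(x, N, 1) < W_cdf;         % true from the sampled row downward

[~, idx] = max(C, [], 1);            % row of the first true in each column

% Equivalent alternative: count the false entries above the first true.
idx2 = sum(~C, 1) + 1;
```

Here idx is a 1-by-M row vector of indices between 1 and N. Both variants rely on each column of C being a run of false values followed by a run of true values, which holds because the cumulative sums are nondecreasing.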

Hope this helps.

Comments

Dan Gianotti, 8 Mar 2017

I just realized that I never "accepted" this answer. Thanks very much for the huge assistance! You presumably saved me a hundred years of CPU time.

Kirby Fears, 9 Mar 2017

No problem. I'm glad it worked out.
