Parallelizing multiple iterations and randperm for a Monte Carlo simulation

조회 수: 7 (최근 30일)
I have a function which randomly picks stocks to buy, subject to the constraint of a fixed number of holding periods and fixed number of stocks that can be bought at any one time. The actual function is as follows:
function profit=monteCarloSim(stockPrices,numHoldingPeriods,numStocks,numSimuls)
[rows,cols] = size(stockPrices);
stockReturns = diff(stockPrices);
profit = zeros(numSimuls,1);
parfor i=1:numSimuls
buyWeights = zeros(rows-1,cols);
buySignal = randperm(rows-1,numHoldingPeriods);
buyRandomAssetPicks = randperm(cols, numStocks);
buyWeights(buySignal,buyRandomAssetPicks)=1/numStocks;
portfolioReturns = sum(stockReturns.*buyWeights,2);
profit(i) = sum(portfolioReturns);
end
end
I was planning to limit the number of operations in the loop by assigning each iteration of buyWeights to a 3D matrix. This way, I would take the portfolioReturns and profits(i) steps out of the loop, and completely vectorize them.
However, this approach becomes complicated when I have a huge number of simulations and a large data set, because there's not enough memory to store the whole 3D matrix. So I have to come up with a few extra steps to manage memory, e.g. build smaller chunks of the 3D matrix iteratively.
Even if I had the memory to do the 3D matrix approach, I'll still have to iterate the randperm function. I can't find a function to repeat this more effectively and I'm still a novice at using bsxfun.
Is there a way to speed this function up? I also have a good number of GPU cores, if a gpuArray method helps.
Thanks in advance!

채택된 답변

Jan
Jan 2012년 9월 25일
편집: Jan 2012년 9월 25일
You could use the faster FEX: Shuffle instead of RANDPERM. But unfortunately this function is not threadsafe currently and the replied random numbers will be equal in each thread.
Allocating buyWeights repeatedly consumes time. Better set it to zero:
buyWeights = zeros(rows-1,cols);
parfor i=1:numSimuls
buyWeights(:) = 0;
...
But here the PARFOR is a problem also. I cannot test this currently, but I guess this could help for both problems:
block = round(linspace(1, numSimuls + 1, numberOfCores + 1));
parfor iCore = 1:numberOfCores
buyWeights = zeros(rows-1,cols);
Shuffle(rand(1,4) * 2^32, 'seed');
for iS = block(iCore):block(iCore + 1) - 1
buyWeights(:) = 0;
buySignal = Shuffle(rows-1, 'index', numHoldingPeriods);
buyRandomAssetPicks = Shuffle(cols, 'index', numStocks);
buyWeights(buySignal,buyRandomAssetPicks) = 1/numStocks;
profit(iS) = sum(stockReturns(:) .* buyWeights(:));
end
end
In addition it is avoided to calculate the sum over the 2nd dimension at first, because this means accessing elementes spread through wide sections of the RAM. Better create the sum over the first dimension at first:
b = sum(stockReturns .* buyWeights, 1);
profit(iS) = sum(b);
Or join it to one SUM command as in the example above.
NOTE: Please test and debug this, because it is written from the scratch in a text editor.
  댓글 수: 3
Edric Ellis
Edric Ellis 2012년 9월 25일
Unfortunately, Jan's PARFOR loop no longer follows the rules of what's allowed in terms of indexing. Results from PARFOR must either be reduction variables like this:
x = 0;
parfor ii=1:7
x = x + rand();
end
or 'sliced' outputs where the variable is indexed using the loop variable, like this:
x = zeros(1,7);
parfor ii=1:7
x(ii) = rand();
end
To fix Jan's code, we need to work around the PARFOR constraints, like so:
parfor iCore = 1:matlabpool('size') % this gets the number of workers
profitTmp = zeros(1, block(iCore+1) - block(iCore));
for iS = ...
...
profitTmp(iS) = sum ...
end
profit{iCore} = profitTmp;
end
profit = [ profit{:} ]; % might need some dimension fixing here.
where we have ensured that 'profit' is indexed correctly for a PARFOR output ('sliced').
I would also suggest you might wish to chunk the work up into more blocks than the number of workers if there is any possibility of load imbalance between the blocks.
Ephedyn
Ephedyn 2012년 9월 25일
Awesome, I understand how it works now. This ran wonderfully. In case someone else needs the solution on this Q&A, here's the extra (minor) debugging to get it running:
numberOfCores = matlabpool('size');
profit = cell(numberOfCores,1);
...
parfor iCore = 1:numberOfCores
indexOffset = block(iCore) - 1;
...
for iS = block(iCore):block(iCore + 1) - 1
profitTmp(iS-indexOffset) = sum(stockReturns(:) .* weights(:), 1);
end
profit{iCore} = profitTmp;
end
profit = [profit{:}]';
Cheers!

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

Edric Ellis
Edric Ellis 2012년 9월 25일
If you take computations out of the PARFOR loop, then even if they are vectorized, they are no longer operating in parallel so it may not be faster. The two cases that you are considering both appear to involve transferring much more data from outside the PARFOR loop to be used inside it. This again will add overhead. Consider simply the 'profit(i)' calculation. Currently, the PARFOR loop only has to return a numSimuls-by-1 array. If you could return 'portfolioReturns' for summation outside the loop, you'd need to return numSimuls-by-(rows-1) elements. This extra data transfer might easily outweigh the benefit of performing a single call to SUM.
  댓글 수: 1
Ephedyn
Ephedyn 2012년 9월 25일
Ah, got it.
P.S.: Noticed you're on the development team for the parallel computing toolbox. Want to thank you guys a lot for the work, it's been extremely useful and put together in an intuitive way.

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Loops and Conditional Statements에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by