limiting broadcast variable in a 3d block processing program

tiwwexx on 4 Oct 2022
Answered: Tejas on 20 Aug 2024 at 5:17
I have some 3D or 4D arrays that I want to run block processing on. To do this, I create a cell array whose entries are matrices of position values.
For example, index_object{1} = [1 2 3; 1 2 3; 1 2 3].
I then index into this cell array in a parfor loop to pull blocks out of my image (sz_block is the block size, in this example [3,3,3]):
parfor n=1:numel(index_object)
block_im = pad_im(index_object{n}(1,1:sz_block(1)),index_object{n}(2,1:sz_block(2)),index_object{n}(3,1:sz_block(3)),:);
out(n) = run_some_function(block_im);
end
The problem is that my "pad_im" array ends up being a broadcast variable, and it is rather large (512x512x300), so the memory swapping really slows down the calculation.
Does anyone have a recommendation for getting rid of the broadcasting of the pad_im array?
Or does anyone know how other languages (like C/C++ or Python) do n-dimensional block processing?
Thanks for the info!

Answers (1)

Tejas on 20 Aug 2024 at 5:17
Hello Tiwwexx,
To stop the broadcasting of the pad_im array, consider making it a distributed array. This partitions the data across the available workers. An spmd (Single Program, Multiple Data) block can then be used to run code in parallel across those workers: each worker runs the same code independently but processes a different part of the data. Please note that this approach requires the Parallel Computing Toolbox.
Below are two scripts for comparing the time taken by the two methods: the first uses a parfor loop, and the second uses spmd.
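As a rough illustration of the partitioning described above, the sketch below creates a distributed array and prints which part each worker holds. It assumes the Parallel Computing Toolbox, a running (or automatically started) parallel pool, and R2022b or later for spmdIndex. The variable names local_part and idx3 are illustrative only.

```matlab
% Sketch: inspect how a distributed array is split across workers.
% Assumes Parallel Computing Toolbox and an available parallel pool.
pad_im = rand(512, 512, 300, 'distributed');   % data is stored on the workers

spmd
    local_part = getLocalPart(pad_im);   % this worker's portion of the array
    idx3 = globalIndices(pad_im, 3);     % global dim-3 indices held by this worker
    fprintf('Worker %d holds %s, global dim-3 range %d..%d\n', ...
        spmdIndex, mat2str(size(local_part)), idx3(1), idx3(end));
end
```

With the default distribution scheme the array is typically split along its last dimension, so each worker should report a full 512x512 slab covering a contiguous range of dim-3 slices.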
Using_parfor_loop.m
dim1 = 512;
dim2 = 512;
dim3 = 300;
pad_im = rand(dim1, dim2, dim3);
sz_block = [3, 3, 3];
% Create an index object for block processing
index_object = cell(1, 10);
for i = 1:10
index_object{i} = [randi([1, dim1-sz_block(1)+1], 1, 3);
randi([1, dim2-sz_block(2)+1], 1, 3);
randi([1, dim3-sz_block(3)+1], 1, 3)];
end
out = zeros(1, numel(index_object));
tic;
parfor n = 1:numel(index_object)
% Extract block from pad_im using the indices from index_object
block_im = pad_im(index_object{n}(1,1):index_object{n}(1,1)+sz_block(1)-1, ...
index_object{n}(2,1):index_object{n}(2,1)+sz_block(2)-1, ...
index_object{n}(3,1):index_object{n}(3,1)+sz_block(3)-1);
out(n) = sum(block_im(:)); % Calculate the sum of all elements in the block
end
elapsedTime = toc;
fprintf('Elapsed time for block processing: %.2f seconds\n', elapsedTime);
Using_spmd.m
% Check if Parallel Computing Toolbox is available
if ~license('test', 'Distrib_Computing_Toolbox')
error('Parallel Computing Toolbox is required for distributed arrays.');
end
dim1 = 512;
dim2 = 512;
dim3 = 300;
% Create a random distributed array (its data is partitioned across the workers)
pad_im = rand(dim1, dim2, dim3, 'distributed');
sz_block = [3, 3, 3];
num_blocks = 10;
out = zeros(1, num_blocks);
% Some global sample indexes
index_object = cell(1, num_blocks);
for i = 1:num_blocks
index_object{i} = [randi([1, dim1 - sz_block(1) + 1]);
randi([1, dim2 - sz_block(2) + 1]);
randi([1, dim3 - sz_block(3) + 1])];
end
tic;
% Use spmd to distribute the work
spmd
local_pad_im = getLocalPart(pad_im); % Get the local part of the distributed array
local_size = size(local_pad_im); % Size of the local part
local_out = zeros(1, num_blocks); % Local output for each worker
% Global indices held by this worker along each dimension (element 1 is the start index)
global_start_idx1 = globalIndices(pad_im, 1);
global_start_idx2 = globalIndices(pad_im, 2);
global_start_idx3 = globalIndices(pad_im, 3);
for n = 1:num_blocks
% Extract global indices for the block
global_idx1 = index_object{n}(1);
global_idx2 = index_object{n}(2);
global_idx3 = index_object{n}(3);
% Convert global indices to local indices if they belong to the local part
local_idx1 = global_idx1 - global_start_idx1(1) + 1;
local_idx2 = global_idx2 - global_start_idx2(1) + 1;
local_idx3 = global_idx3 - global_start_idx3(1) + 1;
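% Process the block only if it lies entirely within this worker's local part.
% Note: a block that straddles two workers' partitions fails this test on
% every worker, so its entry in the output stays 0.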
if local_idx1 > 0 && local_idx1 + sz_block(1) - 1 <= local_size(1) && ...
local_idx2 > 0 && local_idx2 + sz_block(2) - 1 <= local_size(2) && ...
local_idx3 > 0 && local_idx3 + sz_block(3) - 1 <= local_size(3)
% Extract block from local_pad_im using the local indices
block_im = local_pad_im(local_idx1:local_idx1+sz_block(1)-1, ...
local_idx2:local_idx2+sz_block(2)-1, ...
local_idx3:local_idx3+sz_block(3)-1);
local_out(n) = sum(block_im(:));
end
end
out = spmdPlus(local_out); % Sum the per-worker results across all workers
end
elapsedTime = toc;
fprintf('Elapsed time for block processing: %.2f seconds\n', elapsedTime);
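The global-to-local index conversion and bounds check used inside the spmd block can be tried on their own, without a parallel pool. This is a minimal sketch under stated assumptions: the helpers to_local and fits are hypothetical names (not MATLAB built-ins and not part of the scripts above), and the partition (a worker owning global slices 76 to 150 along dimension 3) is made up for illustration.

```matlab
% Hypothetical helper: map a global start index to a local one, given the
% first global index owned by a worker.
to_local = @(global_start, first_owned) global_start - first_owned + 1;

% Hypothetical helper: true if a block of length len starting at local
% index s fits entirely inside a local part of length local_len.
fits = @(s, len, local_len) s > 0 && s + len - 1 <= local_len;

% Assumed partition: this worker owns global slices 76..150 (75 slices).
first_owned = 76;
local_len   = 75;
blk_len     = 3;

s_inside   = to_local(100, first_owned);   % block covers global 100..102
s_straddle = to_local(149, first_owned);   % block covers global 149..151
s_outside  = to_local(10,  first_owned);   % block covers global 10..12

inside_ok   = fits(s_inside,   blk_len, local_len);
straddle_ok = fits(s_straddle, blk_len, local_len);
outside_ok  = fits(s_outside,  blk_len, local_len);
```

Here only the first block passes the check. The block starting at global slice 149 is rejected by this worker because it runs past slice 150, and a neighbouring worker owning slices 151 onward would reject it as well, since its local start index would be negative. If blocks near partition boundaries matter, they need separate handling, for example by processing them on the client.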
Kindly refer to the MathWorks documentation on spmd and on Distributed Arrays for more information.
