주요 콘텐츠

Accelerate Correlation with GPUs

This example shows how to use a GPU to accelerate cross-correlation. Many correlation problems involve large data sets and can be solved much faster using a GPU. This example requires a Parallel Computing Toolbox™ user license. Refer to GPU Computing Requirements (Parallel Computing Toolbox) to see what GPUs are supported.

Introduction

Start by learning some basic information about the GPU in your machine. To access the GPU, use the Parallel Computing Toolbox.

if ~(parallel.gpu.GPUDevice.isAvailable)
    fprintf("\n\t**GPU not available. Stopping.**\n");
    return;
else
    dev = gpuDevice;
    fprintf(...
    "GPU detected (%s, %d multiprocessors, Compute Capability %s)", ...
    dev.Name, dev.MultiprocessorCount, dev.ComputeCapability);
end
GPU detected (NVIDIA RTX A5000, 64 multiprocessors, Compute Capability 8.6)

Benchmarking Functions

Because code written for the CPU can be ported to run on the GPU, a single function can be used to benchmark both the CPU and GPU. However, because code on the GPU executes asynchronously from the CPU, special precaution should be taken when measuring performance. Before measuring the time taken to execute a function, ensure that all GPU processing has finished by executing the 'wait' method on the device. This extra call will have no effect on the CPU performance.

This example benchmarks three different types of cross-correlation.

Benchmark Simple Cross-Correlation

For the first case, two vectors of equal size are cross-correlated using the syntax xcorr(u,v). The ratio of CPU execution time to GPU execution time is plotted against the size of the vectors.

sizes = floor(logspace(5,7,10))';
tc = zeros(numel(sizes),1);
tg = zeros(numel(sizes),1);

for s = 1:numel(sizes)
    if s == 1
        fprintf("** Benchmarking vector-vector cross-correlation. **")
        fprintf("Running xcorr of N elements")
    end
    fprintf(".");
    a = single(rand(sizes(s),1));
    b = single(rand(sizes(s),1));
    tc(s) = timeit(@() xcorr(a,b),1);
    tg(s) = gputimeit(@() xcorr(gpuArray(a),gpuArray(b)),1);
    if s == numel(sizes)
        disp(table(sizes,1e3*tc,1e3*tg,VariableNames= ...
            ["Length (N)" "CPU time (ms)" "GPU time (ms)"]))
    end
end
** Benchmarking vector-vector cross-correlation. **
Running xcorr of N elements
..........
    Length (N)    CPU time (ms)    GPU time (ms)
    __________    _____________    _____________

         1e+05       3.2248           1.1345    
    1.6681e+05       6.0368           1.2935    
    2.7826e+05       9.0778           1.5205    
    4.6416e+05        16.32           1.9185    
    7.7426e+05       25.602           2.5235    
    1.2915e+06       56.349           3.7115    
    2.1544e+06       85.022           7.5365    
    3.5938e+06        143.4           13.051    
    5.9948e+06       285.56           20.219    
         1e+07       365.81           33.806    
% Plot the results
fig = figure;
ax = axes(Parent=fig);
semilogx(ax,sizes,tc./tg,"r*-");
ylabel(ax,"Speedup");
xlabel(ax,"Vector size");
title(ax,"GPU Acceleration of XCORR");

Benchmarking Matrix Column Cross-Correlation

For the second case, the columns of a matrix A are pairwise cross-correlated to produce a large matrix output of all correlations using the syntax xcorr(A). The ratio of CPU execution time to GPU execution time is plotted against the size of the matrix A.

sizes = floor(logspace(2,2.5,10))';
tc = zeros(numel(sizes),1);
tg = zeros(numel(sizes),1);
for s = 1:numel(sizes)
    if s == 1
        fprintf("** Benchmarking matrix column cross-correlation. **")
        fprintf("Running xcorr (matrix) of an N-by-N matrix")
    end
    fprintf(".");
    a = single(rand(sizes(s)));
    tc(s) = timeit(@() xcorr(a),1);
    tg(s) = gputimeit(@() xcorr(gpuArray(a)),1);
    if s == numel(sizes)
        disp(table(sizes,1e3*tc,1e3*tg,VariableNames= ...
            ["Size (N)" "CPU time (ms)" "GPU time (ms)"]))
    end
end
** Benchmarking matrix column cross-correlation. **
Running xcorr (matrix) of an N-by-N matrix
..........
    Size (N)    CPU time (ms)    GPU time (ms)
    ________    _____________    _____________

      100          5.3818           0.80717   
      113          8.0898           0.88967   
      129          13.426            1.6885   
      146          18.493            2.0345   
      166            25.6            2.4435   
      189          36.405            3.1765   
      215          57.449            3.8975   
      244          81.755            4.7185   
      278          131.18            11.829   
      316          178.32            15.023   
% Plot the results
fig = figure;
ax = axes(Parent=fig);
semilogx(ax,sizes.^2,tc./tg,"r*-")
ylabel(ax,"Speedup")
xlabel(ax,"Matrix Elements")
title(ax,"GPU Acceleration of XCORR (Matrix)")

Benchmarking Two-Dimensional Cross-Correlation

For the final case, two matrices, X and Y, are cross correlated using xcorr2(X,Y). X is fixed in size while Y is allowed to vary. The speedup is plotted against the size of the second matrix.

sizes = floor(logspace(2.5,3.5,10))';
tc = zeros(numel(sizes),1);
tg = zeros(numel(sizes),1);
a = single(rand(100));

for s = 1:numel(sizes)
    if s == 1
        fprintf("** Benchmarking 2-D cross-correlation**")
        fprintf("Running xcorr2 of 100-by-100 and N-by-N matrices.")
    end
    fprintf(".");
    b = single(rand(sizes(s)));
    tc(s) = timeit(@() xcorr2(a,b),1);
    tg(s) = gputimeit(@() xcorr2(gpuArray(a),gpuArray(b)),1);
    if s == numel(sizes)
        disp(table(sizes,1e3*tc,1e3*tg,VariableNames= ...
            ["Size (N)" "CPU time (ms)" "GPU time (ms)"]))
    end
end
** Benchmarking 2-D cross-correlation**
Running xcorr2 of 100-by-100 and N-by-N matrices.
..........
    Size (N)    CPU time (ms)    GPU time (ms)
    ________    _____________    _____________

       316         3.9288           2.7275    
       408         6.0308           2.9065    
       527         9.4288           3.4885    
       681         15.166           5.8805    
       879         24.238           7.5805    
      1136         37.586           12.254    
      1467         65.472           18.775    
      1895         109.37           28.703    
      2448         171.28            42.79    
      3162          299.2           70.185    
% Plot the results
fig = figure;
ax =axes(Parent=fig);
semilogx(ax,sizes.^2,tc./tg,"r*-")
ylabel(ax,"Speedup")
xlabel(ax,"Matrix Elements")
title(ax,"GPU Acceleration of XCORR2")

Other GPU Accelerated Signal Processing Functions

There are several other signal processing functions that can be run on the GPU. These functions include fft, ifft, conv, filter, fftfilt, and more. In some cases, you can achieve large acceleration relative to the CPU. For a full list of GPU accelerated signal processing functions, see the GPU Algorithm Acceleration section in the Signal Processing Toolbox™ documentation.

See Also

(Parallel Computing Toolbox) | (Parallel Computing Toolbox) |