Fast 2D GPU-based convolution

version (49 KB) by Alexander Huth
Graphics chip assisted fast 2d convolution


Updated 16 Jul 2008

cudaconv - Performs 2d convolution using an NVIDIA graphics chipset.

For large datasets (~1 million elements), and especially for large kernels (run time does not grow much with kernel size), cudaconv can outperform conv2 by as much as 5000%.

I did not create this algorithm. It is adapted from an example included in the CUDA SDK and wrapped in MATLAB-compatible C code.

With very large data matrices, it can *completely* crash your computer (or the graphics driver?), so beware. In testing, I found an upper limit on convolution size of roughly 2^20 elements (limited either by the size the CUDA FFT function can accept or by the size of a 2D texture), so above that limit the code breaks the convolution into smaller pieces. If you are feeling adventurous, feel free to raise that limit, but be aware that at those sizes cudaconv is already roughly 50-100x faster than conv2.
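
One common way to break a large convolution into smaller pieces is overlap-add tiling. The sketch below is not taken from the cudaconv source, and how cudaconv actually splits its work is not shown here. The tile size `blk`, the handle `convfn`, and the sample data are all assumptions for illustration, and plain conv2 stands in for the GPU routine so the sketch runs anywhere.

```matlab
% Overlap-add sketch: 'full' 2D convolution computed tile by tile.
img = magic(7);                 % small stand-in for a large data matrix
ker = [1 2; 3 4];               % arbitrary example kernel
blk = 3;                        % assumed tile edge length
convfn = @(a, k) conv2(a, k);   % any routine returning the 'full' result

[im, in] = size(img);
[km, kn] = size(ker);
out = zeros(im + km - 1, in + kn - 1);
for r = 1:blk:im
    rr = r:min(r + blk - 1, im);            % rows of this tile
    for c = 1:blk:in
        cc = c:min(c + blk - 1, in);        % columns of this tile
        tile = convfn(img(rr, cc), ker);    % 'full' result for the tile
        ro = r:(rr(end) + km - 1);          % where the tile lands in out
        co = c:(cc(end) + kn - 1);
        out(ro, co) = out(ro, co) + tile;   % overlapping borders add up
    end
end
```

Because convolution is linear, summing the per-tile 'full' results at their offsets reproduces conv2(img, ker) for the whole matrix.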

Cite As

Alexander Huth (2021). Fast 2D GPU-based convolution (, MATLAB Central File Exchange. Retrieved .

Comments and Ratings (13)


Has anybody successfully compiled and run the code under Windows, with correct results?

Bogdan Vacaliuc

Try adding '-m 64' to the nvcc compile line.

I had similar issues on MacOS (10.6.7+) because 'uname -a' returns i386 but gcc builds for x86_64 by default. nvcc tries to 'autodetect' but gets the wrong value.

I hope this helps.

Diego Ardila

I get the same result as Dung Chu when I use the .mexmaci file which is included with the download.

I believe that you are supposed to delete that file, and create a new one using make. (Go to that directory in terminal, type 'make')

However, when I do this I get architecture issues that I do not know how to deal with.
When compiling, I get errors like this:

warning: in cudaconv.o, file was built for i386 which is not the architecture being linked (x86_64)

When using the resulting file I get this:
c = cudaconv(2,2)
??? Invalid MEX-file '/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci': dlopen(/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci, 1): no suitable image found. Did find:
/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci: mach-o, but wrong architecture.
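
A "wrong architecture" message like this usually means the MEX file was built for a different architecture than the running MATLAB. As a quick check (a sketch only, and the output depends on your installation), the built-in functions `mexext` and `computer` report what MATLAB expects:

```matlab
% Report the MEX extension and architecture this MATLAB session expects.
ext = mexext;               % extension MATLAB looks for when loading MEX files
arch = computer('arch');    % architecture string of the running MATLAB
fprintf('Expected MEX extension: .%s (arch: %s)\n', ext, arch);
```

If the reported extension differs from that of the file you built or downloaded, the compile and link steps need to target the reported architecture.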

Dung Chu

It works, but the result is somewhat weird. I ran this:
y = ones(5);
f = 1/5 * ones(3);
z = cudaconv(y, f)
z2 = conv2(y, f, 'same')

z =

1.0e-35 *

-0.1319 0.0000 -0.1319 0.0000 -0.1319
0.0000 0 0.0000 0 0
0 0 0.0000 0 -0.1320
-0.1319 0.0000 -0.1319 0.0000 -0.1941
0 0 0 0 0.0000

z2 =

0.8000 1.2000 1.2000 1.2000 0.8000
1.2000 1.8000 1.8000 1.8000 1.2000
1.2000 1.8000 1.8000 1.8000 1.2000
1.2000 1.8000 1.8000 1.8000 1.2000
0.8000 1.2000 1.2000 1.2000 0.8000

I'm using Fedora 10 with MATLAB 2008. Does anyone have any idea why?
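
When comparing a GPU result against conv2, an exact match is not expected, since single and double precision differ slightly. The sketch below shows one way to check a result against a tolerance. The variable `candidate` is a stand-in (conv2 output rounded to single precision), not real cudaconv output, and the tolerance 1e-5 is an assumed value.

```matlab
% Compare a candidate result with the conv2 reference using a relative tolerance.
y = ones(5);
f = 1/5 * ones(3);
ref = conv2(y, f, 'same');
candidate = double(single(ref));    % stand-in for a single-precision GPU result
relerr = max(abs(candidate(:) - ref(:))) / max(abs(ref(:)));
ok = relerr < 1e-5;                 % assumed tolerance for single precision
```

A result on the order of 1e-35, as above, fails such a check by a wide margin, which points to a build or driver problem rather than rounding.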

Oh HongSic

Dear Alex, I compiled this example as follows.

First of all, I inserted the following code:
#pragma comment(lib,"C:\\CUDA\\lib64\\cufft.lib")
#pragma comment(lib,"C:\\CUDA\\lib64\\cudart.lib")

Next, make an object file:

>> system('c:\cuda\bin64\nvcc --compile "d:\cudaconv\cudaconv\" -ccbin "C:\Dev\msvs\VC\bin" -o cudaconv.o -IC:\Dev\MATLAB\R2009b\extern\include -IC:\Dev\Msvs\VC\include')

Finally, compile and link it:

>> mex ('cudaconv.o')

Good luck to you, and sorry for my poor English.


The documentation clearly states that Windows is not supported. I am trying to alter the MEX files to make this work. Has anyone had any luck getting this to work under Windows?


I have not ventured outside of MATLAB yet. How do I compile this code so I can run it?



Finally, functions that use the GPU!


The convolution is very fast and pretty accurate for the 'valid' part of a 2D signal (except for the known double-single precision difference), but there are big differences near the edges when using the 'same' shape. Therefore I wrote a piece of shaping code to treat it like conv2. Please test and report any coding mistakes!
function [newimage] = cudaconv2(image,filter,shape)
% Default to 'full' when no shape is given, as conv2 does.
if nargin == 2
    shape = 'full';
end

if (strcmp(shape, 'full')) % it's not a real 'full' convolution !!!!!
    [im, in] = size(image);
    [fm, fn] = size(filter);
    outM1 = 1;
    outN1 = 1;
    % Zero-pad the image to the 'full' output size before convolving.
    image2 = zeros(im+fm-1, in+fn-1);
    image2(round(fm/2):round(im + fm/2 - 1),round(fn/2):round(in + fn/2 - 1)) = image(1:end,1:end);
    output = cudaconv(image2,filter);
    [outM2, outN2] = size(output);

elseif (strcmp(shape, 'same')) % large differences on the edges
    output = cudaconv(image,filter);
    [Am, An] = size(image);
    outM1 = 1;
    outN1 = 1;
    outM2 = Am;
    outN2 = An;

elseif (strcmp(shape, 'valid')) % very accurate
    output = cudaconv(image,filter);
    [Am, An] = size(image);
    [Cm, Rn] = size(filter);
    outM1 = round(Cm/2);
    outN1 = round(Rn/2);
    outM2 = round(Am - Cm/2);
    outN2 = round(An - Rn/2);
else
    % Stop here: the output bounds are undefined for an unknown shape.
    error('Shape type not valid');
end

newimage = output(outM1:outM2,outN1:outN2);
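
For reference when testing shaping code like this, the 'same' and 'valid' results of conv2 are crops of its 'full' result. The sketch below uses only conv2, so it runs without a GPU. The sample matrix and kernel are arbitrary assumptions, and the index formulas describe conv2 itself, not cudaconv.

```matlab
% Recover the 'same' and 'valid' shapes by cropping the 'full' result.
A = magic(6);                       % arbitrary example image
K = [1 2 1; 0 0 0; -1 -2 -1];       % arbitrary example kernel
[am, an] = size(A);
[km, kn] = size(K);

F = conv2(A, K);                    % 'full': (am+km-1) by (an+kn-1)
r0 = floor(km/2);                   % row offset of the central part
c0 = floor(kn/2);                   % column offset of the central part
S = F(r0 + (1:am), c0 + (1:an));    % 'same': central part, size of A
V = F(km:am, kn:an);                % 'valid': no zero-padded edges involved
```

Comparing S and V with conv2(A, K, 'same') and conv2(A, K, 'valid') gives a baseline that a GPU wrapper can be checked against.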

Yi Cao

It works as expected on my Geforce 8400 GPU.

Bjorn Bjorno

To solve the problem with the all-zeros output (see the previous message by Simon Knight), run the NVIDIA CUDA toolkit installer again, opt for the customized installation, and check 'CUDAKext'. After rebooting, the cudaconv function should run perfectly.

Simon Knight

Hi, I am only getting a matrix of zeros when I run this:
>> y = rand(64);
>> f = 1/9*ones(3);
>> z1 = conv2(y,f, 'same');
>> z2 = cudaconv(y,f);
>> any(any(z1))
ans =
>> any(any(z2))
ans =

I am using R2007a and have tried this on OS X. Is the latest zip file supplied above the one with the corrected header file?
This looks promising, so I'd be very keen to try it.

Alex Huth

Sorry, there was a missing header file -- all should be fixed when the update is posted.

MATLAB Release Compatibility
Created with R2007a
Compatible with any release
Platform Compatibility
Windows macOS Linux