cudaconv - Performs 2D convolution on an NVIDIA GPU.
For large datasets (~1 million elements), and especially for large kernels (run time depends only weakly on kernel size), cudaconv can outperform conv2 by as much as 5000%.
I did not create this algorithm; it is adapted from an example included in the CUDA SDK and wrapped in MATLAB-compatible C code.
With very large data matrices, it can *completely* crash your computer (or possibly just the graphics driver), so beware. In testing, I found an upper limit on convolution size of roughly 2^20 elements (imposed either by the size the CUDA FFT function can accept or by the maximum size of a 2D texture), so above that the code breaks the convolution into smaller pieces. If you are feeling adventurous, feel free to raise that limit, but be aware that at those sizes cudaconv is already roughly 50-100x faster than conv2.
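The idea of breaking a large convolution into smaller pieces can be sketched on the CPU. This is only an illustration, not the code inside cudaconv: the function name tiledConvSame and the tile parameter are hypothetical, and conv2 stands in for the per-piece GPU call. Each tile is read together with a border of km-1 rows and kn-1 columns, so the stitched result matches conv2(img, ker, 'same').

```matlab
function out = tiledConvSame(img, ker, tile)
% Hypothetical helper: 'same'-shaped 2D convolution computed tile by tile.
% conv2 is used per tile here; a GPU routine could take its place.
[M, N]   = size(img);
[km, kn] = size(ker);
% Zero padding chosen so that conv2(P, ker, 'valid') equals conv2(img, ker, 'same').
top  = floor((km - 1) / 2);  bottom = ceil((km - 1) / 2);
left = floor((kn - 1) / 2);  right  = ceil((kn - 1) / 2);
P = zeros(top + M + bottom, left + N + right);
P(top + 1 : top + M, left + 1 : left + N) = img;
out = zeros(M, N);
for r0 = 1:tile:M
    r1 = min(r0 + tile - 1, M);
    for c0 = 1:tile:N
        c1 = min(c0 + tile - 1, N);
        % Input block for this tile, including the km-1 / kn-1 border it needs.
        block = P(r0 : r1 + km - 1, c0 : c1 + kn - 1);
        out(r0:r1, c0:c1) = conv2(block, ker, 'valid');
    end
end
end
```

For example, tiledConvSame(rand(2048), ones(15)/225, 512) processes sixteen 512x512 tiles. The overlap between neighbouring input blocks is the price paid for keeping each piece under a fixed size.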
Alexander Huth (2021). Fast 2D GPU-based convolution (https://www.mathworks.com/matlabcentral/fileexchange/20220-fast-2d-gpu-based-convolution), MATLAB Central File Exchange.
Has anybody successfully compiled and run the code under Windows, with correct results?
Try adding '-m 64' to the nvcc compile line.
I had similar issues on MacOS (10.6.7+) because 'uname -a' returns i386 but gcc builds for x86_64 by default. nvcc tries to 'autodetect' but gets the wrong value.
I hope this helps.
I get the same result as Dung Chu when I use the .mexmaci file which is included with the download.
I believe that you are supposed to delete that file, and create a new one using make. (Go to that directory in terminal, type 'make')
However, when I do this I am getting architecture issues that I do not know how to deal with:
When compiling, I get errors like this:
warning: in cudaconv.o, file was built for i386 which is not the architecture being linked (x86_64)
When using the resulting file I get this:
c = cudaconv(2,2)
??? Invalid MEX-file '/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci': dlopen(/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci, 1): no suitable image found. Did find:
/Applications/MATLAB74/work/cudaconv/cudaconv/cudaconv.mexmaci: mach-o, but wrong architecture.
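Before rebuilding, it can help to check which MEX binary type the running MATLAB session expects. The sketch below uses only the built-in functions computer and mexext; the wrapper name mexArchInfo is hypothetical and not part of cudaconv.

```matlab
function info = mexArchInfo()
% Hypothetical helper (not part of cudaconv): report the platform of the
% running session and the MEX file extension it will load.
info.platform = computer;   % platform identifier string of this session
info.ext      = mexext;     % extension a compiled MEX file must carry here
fprintf('Platform: %s, expected MEX extension: .%s\n', info.platform, info.ext);
end
```

If the reported extension is a 32-bit one (as the .mexmaci file name above suggests) while the compiler produces x86_64 objects, the object files and MATLAB disagree about the architecture, which would be consistent with the linker warning and the "wrong architecture" error shown here.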
It works, but the result is somewhat weird. I ran this:
y = ones(5);
f = 1/5 * ones(3);
z = cudaconv(y, f)
z2 = conv2(y, f, 'same')
z =
1.0e-35 *
-0.1319 0.0000 -0.1319 0.0000 -0.1319
0.0000 0 0.0000 0 0
0 0 0.0000 0 -0.1320
-0.1319 0.0000 -0.1319 0.0000 -0.1941
0 0 0 0 0.0000
z2 =
0.8000 1.2000 1.2000 1.2000 0.8000
1.2000 1.8000 1.8000 1.8000 1.2000
1.2000 1.8000 1.8000 1.8000 1.2000
1.2000 1.8000 1.8000 1.8000 1.2000
0.8000 1.2000 1.2000 1.2000 0.8000
I'm using Fedora 10 with MATLAB 2008. Does anyone have any idea why?
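A mismatch like this is easier to report with numbers. The following is a minimal sketch of a comparison helper. The name compareWithConv2 is hypothetical, and it assumes the function under test returns a result the same size as its first input and that the kernel is smaller than the image. It reports the largest absolute difference from conv2(..., 'same') over the whole result and over the interior region that also belongs to the 'valid' result.

```matlab
function [errValid, errSame] = compareWithConv2(convFn, img, ker)
% Hypothetical helper: compare convFn(img, ker) against conv2(img, ker, 'same').
% convFn is a function handle, e.g. @cudaconv (assumed to return size(img)).
ref = conv2(img, ker, 'same');
out = double(convFn(img, ker));
if ~isequal(size(out), size(ref))
    error('compareWithConv2:sizeMismatch', 'convFn must return a result the size of img.');
end
d = abs(out - ref);
errSame = max(d(:));                       % largest error anywhere
[M, N]   = size(img);
[km, kn] = size(ker);
% Rows and columns of the 'same' result that are also in the 'valid' result.
rows = floor((km - 1) / 2) + 1 : M - ceil((km - 1) / 2);
cols = floor((kn - 1) / 2) + 1 : N - ceil((kn - 1) / 2);
dv = d(rows, cols);
errValid = max(dv(:));                     % largest error away from the edges
end
```

Calling [errValid, errSame] = compareWithConv2(@cudaconv, y, f) with the matrices above would show whether the output is wrong everywhere or only near the borders. A single-precision GPU result is expected to differ slightly from conv2 in double precision, so small nonzero values are not a sign of trouble.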
Dear Alex, I compiled this example as follows.
First, I inserted the following lines into cudaconv.cu:
#pragma comment(lib,"C:\\CUDA\\lib64\\cufft.lib")
#pragma comment(lib,"C:\\CUDA\\lib64\\cudart.lib")
Next, build an object file:
>> system('c:\cuda\bin64\nvcc --compile "d:\cudaconv\cudaconv\cudaconv.cu" -ccbin "C:\Dev\msvs\VC\bin" -o cudaconv.o -IC:\Dev\MATLAB\R2009b\extern\include -IC:\Dev\Msvs\VC\include')
Finally, compile and link it:
>> mex ('cudaconv.o')
Good luck, and sorry for my poor English.
The documentation clearly states that Windows is not supported. I am trying to alter the MEX files to make this work. Has anyone had any luck getting this to work under Windows?
I have not ventured outside of MATLAB yet. How do I compile this code so I can run it?
-D
Finally, functions that use the GPU!
The convolution is very fast and pretty accurate for the 'valid' part of a 2D signal (apart from the known single- versus double-precision difference), but there are big differences near the edges when using the 'same' shape. Therefore I wrote a piece of shaping code to make it behave like conv2. Please test and report any coding mistakes!
____________________________________________________
function [newimage] = cudaconv2(image,filter,shape)
% CUDACONV2  Crop the output of cudaconv so it is shaped like conv2(image,filter,shape).
% Relies on cudaconv returning a result the same size as its first input.
if nargin == 2
    shape = 'full';
end
if (strcmp(shape, 'full')) % it's not a real 'full' convolution !!!!!
    [im, in] = size(image);
    [fm, fn] = size(filter);
    outM1 = 1;
    outN1 = 1;
    % Embed the image in a zero matrix of the 'full' output size.
    image2 = zeros(im+fm-1, in+fn-1);
    image2(round(fm/2):round(im + fm/2 - 1), ...
           round(fn/2):round(in + fn/2 - 1)) = image(1:end,1:end);
    output = cudaconv(image2,filter);
    [outM2, outN2] = size(output);
elseif (strcmp(shape, 'same')) % large differences on the edges
    output = cudaconv(image,filter);
    [Am, An] = size(image);
    outM1 = 1;
    outN1 = 1;
    outM2 = Am;
    outN2 = An;
elseif (strcmp(shape, 'valid')) % very accurate
    output = cudaconv(image,filter);
    [Am, An] = size(image);
    [Cm, Rn] = size(filter);
    % Keep only the part computed without touching the image border.
    outM1 = round(Cm/2);
    outN1 = round(Rn/2);
    outM2 = round(Am - Cm/2);
    outN2 = round(An - Rn/2);
else
    % Raise an error instead of returning with newimage unassigned.
    error('cudaconv2:invalidShape', 'Shape type not valid');
end
newimage = output(outM1:outM2,outN1:outN2);
____________________________________________________
It works as expected on my Geforce 8400 GPU.
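One common source of edge differences in FFT-based convolution is wrap-around: multiplying FFTs of unpadded inputs gives a circular convolution, in which values from the opposite border leak into the edges. Whether this is what happens inside cudaconv is an assumption, not something confirmed in this thread. The sketch below (hypothetical name fftConvDemo, CPU only, using fft2/ifft2) computes both variants so they can be compared with conv2.

```matlab
function [linSame, circ] = fftConvDemo(img, ker)
% Hypothetical demo: FFT-based convolution with and without zero padding.
[M, N]   = size(img);
[km, kn] = size(ker);
% Linear convolution: pad both inputs to the 'full' size before the FFT.
P = M + km - 1;
Q = N + kn - 1;
fullConv = real(ifft2(fft2(img, P, Q) .* fft2(ker, P, Q)));
% Crop the central part, which corresponds to conv2(img, ker, 'same').
r0 = ceil((km - 1) / 2);
c0 = ceil((kn - 1) / 2);
linSame = fullConv(r0 + 1 : r0 + M, c0 + 1 : c0 + N);
% Circular convolution: kernel padded only to the image size, so edges wrap.
circ = real(ifft2(fft2(img) .* fft2(ker, M, N)));
circ = circshift(circ, [-r0, -c0]);   % align with the 'same' crop
end
```

With img = ones(5) and ker = 1/5*ones(3), linSame reproduces conv2(img, ker, 'same'), while circ is 1.8 everywhere, including the corners where conv2 gives 0.8. The two agree in the interior and differ only in a border whose width depends on the kernel size, which is the pattern described above for the 'same' shape.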
To solve the problem with the all-zeros output (see previous message by Simon Knight), run the NVIDIA CUDA toolkit installer again, opt for the customized installation and check 'CUDAKext'. After rebooting, the cudaconv function should run perfectly.
Hi, I am only getting a matrix of zeros when I run this:
>> y = rand(64);
>> f = 1/9*ones(3);
>> z1 = conv2(y,f, 'same');
>> z2 = cudaconv(y,f);
>> any(any(z1))
ans =
1
>> any(any(z2))
ans =
0
I am using R2007a, and have tried on OS X. Is the latest zip file supplied above the one with the corrected header file?
This stuff looks promising, so I'd be very keen to try it.
Thanks!
Sorry there was a missing header file -- all should be fixed when the update is posted.