How to handle Complex input in MEX gateway function in CUDA?

Question

Moein Mozaffarzadeh, 18 June 2021


https://kr.mathworks.com/matlabcentral/answers/859510-how-to-handle-complex-input-in-mex-gateway-function-in-cuda

Commented: Edric Ellis, 21 June 2021

Hi,

I'm trying to write a MEX gateway function to add two complex vectors, using Thrust for the processing. However, when I compile the code in MATLAB, I get this error:

Error using mex
C:/Users/moein.m/Documents/C++/ImageReconstruction_VisualStudioCode/Project7_TUI_CUDA3/TUI_CUDA/TUI_CUDA/test2_GPUArray_Complex.cu(55):
error: identifier "mxGetComplexDoubles" is undefined
1 error detected in the compilation of
"C:/Users/moein.m/Documents/C++/ImageReconstruction_VisualStudioCode/Project7_TUI_CUDA3/TUI_CUDA/TUI_CUDA/test2_GPUArray_Complex.cu".
nvcc warning : The 'compute_35', 'compute_37', 'compute_50', 'sm_35', 'sm_37' and 'sm_50' architectures
are deprecated, and may be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress
warning).
test2_GPUArray_Complex.cu
Error in mexcuda (line 168)
    [varargout{1:nargout}] = mex(mexArguments{:});
Error in test2_GPUArray_matlabRunner (line 4)
mexcuda('-v', 'test2_GPUArray_Complex.cu' , 'NVCCFLAGS=-gencode=arch=compute_50,code=sm_50  -Xptxas
-dlcm=cg');

Here is my code:

#include <cuda_runtime.h>
#include "device_launch_parameters.h"
#include <stdio.h>
#include "cuda.h"
#include <iostream>
#include <mex.h>
#include "gpu/mxGPUArray.h"
//#include <cuComplex.h>
//#include <cublas_v2.h>
#include <thrust/complex.h>
 #include "matrix.h"
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char* file, int line, bool abort = true)
{
    if (code != cudaSuccess)
    {
        fprintf(stderr, "GPUassert: %s %s %d\n", cudaGetErrorString(code), file, line);
        if (abort) exit(code);
    }
}
typedef thrust::complex<float> fcomp;
__global__ void add(fcomp * Device_DataRes, fcomp * Device_Data1, fcomp * Device_Data2, int N) {
    int TID = threadIdx.y * blockDim.x + threadIdx.x;
    int BlockOFFset = blockDim.x * blockDim.y * blockIdx.x;
    int GID_RowBased = BlockOFFset + TID;
    if (GID_RowBased < N) {
        Device_DataRes[GID_RowBased] = Device_Data1[GID_RowBased] + Device_Data2[GID_RowBased];
    }
}
void mexFunction(int nlhs, mxArray* plhs[],
    int nrhs, const mxArray* prhs[]) {
    int N = 1000;
    int ArrayByteSize = sizeof(fcomp) * N;
    fcomp* Data1;
    fcomp* Device_Data1;
    fcomp* Data2;
    fcomp* Device_Data2;
    fcomp* DataRes;
    fcomp* Device_DataRes;
    Data1 = static_cast<fcomp*>(mxGetComplexDoubles(prhs[0]));
    //Data1 = (fcomp*)(mxGetComplexDoubles(prhs[0]));
    gpuErrchk(cudaMalloc((void**)&Device_Data1, ArrayByteSize));
    gpuErrchk(cudaMemcpy(Device_Data1, Data1, ArrayByteSize, cudaMemcpyHostToDevice));
    Data2 = static_cast<fcomp*>(mxGetComplexDoubles(prhs[1]));
    gpuErrchk(cudaMalloc((void**)&Device_Data2, ArrayByteSize));
    gpuErrchk(cudaMemcpy(Device_Data2, Data2, ArrayByteSize, cudaMemcpyHostToDevice));
    plhs[0] = mxCreateNumericMatrix(N, 1, mxSINGLE_CLASS, mxCOMPLEX);
    DataRes = static_cast<fcomp*> (mxGetData(plhs[0]));
    gpuErrchk(cudaMalloc((void**)&Device_DataRes, ArrayByteSize));
    dim3 block(1024);
    int GridX = (N / block.x + 1);
    dim3 grid(GridX);//SystemSetup.NumberOfTransmitter
    add << <grid, block >> > (Device_DataRes, Device_Data1, Device_Data2,  N);
    gpuErrchk(cudaMemcpy(DataRes, Device_DataRes, ArrayByteSize, cudaMemcpyDeviceToHost));
    cudaFree(Device_Data1);
    cudaFree(Device_Data2);
    cudaFree(Device_DataRes);
}

Could you please let me know what is wrong here? It seems that "mxGetComplexDoubles" is causing the problem.

Regards,

Moein.


Answer 1

Edric Ellis, 18 June 2021


https://kr.mathworks.com/matlabcentral/answers/859510-how-to-handle-complex-input-in-mex-gateway-function-in-cuda#answer_727720


First, as the doc page for interleaved complex data explains, you need to add the command-line flag -R2018a to use mxGetComplexDoubles. However, since you are casting to thrust::complex<float>, I think you should use mxGetComplexSingles instead.
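To illustrate why the getter has to match the element type, here is a small host-only sketch. It is not MEX code: `ComplexDouble` and `ComplexSingle` are hypothetical stand-ins that mirror the real/imag layout of `mxComplexDouble` and `mxComplexSingle`, `std::complex<float>` stands in for `thrust::complex<float>`, and `to_single` is a made-up helper name. A double-precision complex element takes twice the bytes of a single-precision one, so copying `sizeof(fcomp) * N` bytes out of double data would transfer only half of it and misread the bits. Converting element by element (or passing `single(...)` inputs from MATLAB and using mxGetComplexSingles) avoids that.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins mirroring the real/imag layout of MATLAB's
// mxComplexDouble and mxComplexSingle (the real types come from matrix.h).
struct ComplexDouble { double real; double imag; };
struct ComplexSingle { float real; float imag; };

// Stand-in for thrust::complex<float>, so this builds without CUDA headers.
using fcomp = std::complex<float>;

// A double-precision element is twice as large as a single-precision one,
// so a byte count computed from fcomp does not cover double data.
static_assert(sizeof(ComplexDouble) == 2 * sizeof(ComplexSingle),
              "double complex is expected to be twice the size of single complex");
static_assert(sizeof(ComplexSingle) == sizeof(fcomp),
              "single complex is expected to match the float complex type");

// Convert double-precision complex data to single precision explicitly,
// one element at a time, instead of reinterpreting the bytes.
std::vector<fcomp> to_single(const ComplexDouble* src, std::size_t n) {
    assert(src != nullptr || n == 0);
    std::vector<fcomp> out;
    out.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        out.emplace_back(static_cast<float>(src[i].real),
                         static_cast<float>(src[i].imag));
    }
    return out;
}
```

Calling `to_single(in, n)` on an array of `n` double-precision elements returns a vector of `n` single-precision values that could then be copied to the device.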

You can't directly cast between the mxComplexSingle* returned by mxGetComplexSingles and thrust::complex<float>*, but the cudaMemcpy operations should still work, because both types store each element as a real float followed by an imaginary float. In other words, you need to do this:

    mxComplexSingle * Data1 = mxGetComplexSingles(prhs[0]);
    thrust::complex<float> * Device_Data1;
    gpuErrchk(cudaMalloc((void**)&Device_Data1, ArrayByteSize));
    gpuErrchk(cudaMemcpy(Device_Data1, Data1, ArrayByteSize, cudaMemcpyHostToDevice));
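The reason a raw copy is enough can be checked on the host alone. The sketch below is an illustration under assumptions, not the MEX code itself: `ComplexSingle` is a hypothetical mirror of `mxComplexSingle`, `std::complex<float>` stands in for `thrust::complex<float>`, `std::memcpy` plays the role of `cudaMemcpy`, and `copy_bytes` is a made-up helper name.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical mirror of mxComplexSingle: real part first, then imaginary.
struct ComplexSingle { float real; float imag; };

// Stand-in for thrust::complex<float>, so this builds without CUDA headers.
using fcomp = std::complex<float>;

static_assert(sizeof(ComplexSingle) == sizeof(fcomp),
              "a byte-wise copy needs equal element sizes");

// Byte-wise copy between the two element types: the host-only analogue of
// copying mxComplexSingle data into a thrust::complex<float> buffer.
std::vector<fcomp> copy_bytes(const ComplexSingle* src, std::size_t n) {
    assert(src != nullptr || n == 0);
    std::vector<fcomp> dst(n);
    if (n > 0) {
        std::memcpy(static_cast<void*>(dst.data()), src, n * sizeof(fcomp));
    }
    return dst;
}
```

After the copy, each destination element reports the same real and imaginary parts as its source element, so no per-element conversion is needed when the precision already matches.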


Moein Mozaffarzadeh, 18 June 2021


test2_GPUArray_Complex.zip

Edric,

I'd like to ask a follow-up question. Suppose that in my kernel I have this:

    Device_Data1[GID_RowBased] = Device_Data1[GID_RowBased] + Device_Data2[GID_RowBased];

and the output of the mex function is copied to host using:

(cudaMemcpy(DataRes, Device_Data1, ArrayByteSize, cudaMemcpyDeviceToHost));

Attached is the modified code.

This code compiles and works fine, but doing the summation this way could cause a conflict if two threads wrote to the same address (not the case in this code, of course). That is why atomic functions are normally used. However, if I use an atomic add, the code does not compile. Is there a solution for this?

Moein.

Edric Ellis, 21 June 2021

It doesn't look like atomicAdd is what you want here - that's for scalar values I think. In this case, I'd simply return the gpuArray data back to MATLAB and use the builtin overload of + for gpuArray.
