mexcuda compiler error: "__global__" does not apply here

조회 수: 10 (최근 30일)
Nathan
Nathan 2024년 6월 17일
댓글: Nathan 2024년 6월 24일
I'm writing a MEX CUDA function that performs the distance formula as: for each pixel of an image, where the pixel locations are in x and y respectively, and and contain the locations for an array of transducers. The function details aren't important for now, because I cannot compile when this kernel is in the code:
__global__ void distance_formula_index_units(uint32_t * delay, const double * x, const double * z, const double * x0, const double * z0, params * p) {
int x_px = threadIdx.x;
int y_px = threadIdx.y;
int sen = blockIdx.x;
int m_id = blockDim.x * blockDim.y + sen;
double x_dist = x[x_px] - x0[sen];
double y_dist = z[y_px] - z0[sen];
double distance = sqrt(x_dist*x_dist + y_dist*y_dist);
delay[m_id] = (uint32_t)(distance / p->c / p->dt);
}
Other explanations for context: delay is the output matrix of time delays, p is a typedef'd struct from earlier in the code that contains parameters about the image and system such as the wavespeed c and sample time resolution dt.
typedef struct params {
size_t n_xp;
size_t n_yp;
size_t n_sens;
double dt;
double c;
} params;
I run "mexcuda calculate_delays.cu", and instead of compiling, I get:
Error using mex
.\06_MEX_Functions\calculate_delays.cu(14): error: attribute "__global__" does not apply here
__declspec(__global__) void distance_formula_distance_units(uint32_t * delay, const double * xx, ...
^
I have only included this one function from the code, because it successfully compiles when I comment out this specific kernel, and the logic gate that activates it. There's another kernel in the code as well (distance_formula_time_units). The code that calls this function ALSO throws a different error:
if (*use_index_units) {
mxGPUArray * delaymatrix = mxGPUCreateGPUArray(3, output_dimensions, mxUINT32_CLASS, mxREAL, MX_GPU_DO_NOT_INITIALIZE);
uint32_t * delay_dvc_int = (uint32_t*) mxGPUGetData(delaymatrix);
distance_formula_index_units
<<<input_parameters->n_xp*input_parameters->n_yp, input_parameters->n_sens>>>
(delay_dvc_int, x_arr_dvc, y_arr_dvc, x0_dvc, z0_dvc, input_parameters);
cudaDeviceSynchronize();
// copy delay matrix from device back to host, set output
outputs[0] = mxGPUCreateMxArrayOnCPU(delaymatrix);
mxGPUDestroyGPUArray(delaymatrix);
} else { ...(call the other kernel)
Commenting out the destance formula index units kernel gives a compiler error on this block:
.\06_MEX_Functions\calculate_delays.cu(101): error: identifier "uint32_t" is undefined
uint32_t * delay_dvc_int = (uint32_t*) mxGPUGetData(delaymatrix);
^
This is wild, because uint32_t is completely defined in line 24 of stdint.h, and this is in the include tree of mex.h.
As far as I can tell, my code is completely valid, and intellisense in VSCode thinks so, too. ChatGPT doesn't find any code errors, either. What is going on here?

채택된 답변

Joss Knight
Joss Knight 2024년 6월 24일
편집: Joss Knight 2024년 6월 24일
You cannot call a function declared __global__ from another function declared __global__. Declare the second function as __device__ and that should work.
Your device functions and kernels need to be declared using supported CUDA device types, whereas you are using host-side types defined by the MEX headers. Try uint32_T instead (i.e. capital T).
  댓글 수: 1
Nathan
Nathan 2024년 6월 24일
Excellent, great catch on the capital vs. lowercase T in uint32_T, that fixed the compiler issues completely! (Did not need to declare the second function as __device__)

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 MATLAB Compiler에 대해 자세히 알아보기

제품

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by