Performance

Troubleshoot code generation issues, improve code execution time, and reduce memory usage of generated code

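Some of the most common reasons why code generated by GPU Coder™ does not perform as expected are:

CUDA® kernels are not created.
Host-to-device and device-to-host memory transfers (cudaMemcpy) are throttling performance.
There is not enough parallelism, or there are device issues.

The following topics describe the common causes of these symptoms in detail and explain how to use the built-in screener to detect them. They also explain how to work around these issues and generate more efficient CUDA code.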

Apps

GPU Coder	Generate CUDA code from MATLAB code
GPU Environment Check	Verify and set up the GPU code generation environment

Tools

GPU Performance Analyzer	Analyze GPU profiling data and identify optimizations (Since R2023a)

Functions

Code Generation

`coder.gpuConfig`	Create GPU code generation configuration
`codegen`	Generate C/C++ code from MATLAB code
`gpucoder`	Open the GPU Coder app
`gpuPerformanceAnalyzer`	Analyze and optimize performance of the generated code (Since R2023a)
`gpuprofile`	Profile execution time for generated CUDA code (Since R2024a)

GPU Kernel Pragmas

`coder.gpu.kernel`	Pragma that maps `for`-loops to GPU kernels
`coder.gpu.kernelfun`	Pragma that maps a function to GPU kernels
`coder.gpu.nokernel`	Pragma to disable kernel creation for loops

Objects

Code Configuration

`coder.GpuCodeConfig`	Configuration parameters for CUDA code generation from MATLAB code
`coder.MexCodeConfig`	Configuration parameters for MEX function generation from MATLAB code
`coder.CodeConfig`	Configuration parameters for C/C++ code generation from MATLAB code
`coder.EmbeddedCodeConfig`	Configuration parameters for C/C++ code generation from MATLAB code with Embedded Coder
`coder.gpuEnvConfig`	Configuration object for checking the GPU code generation environment

Topics

Code Generation Reports
Create and view reports generated during code generation.
Trace Between Generated CUDA Code and MATLAB Source Code
Highlight sections of MATLAB® code that runs on the GPU.
Generating a GPU Code Metrics Report for Code Generated from MATLAB Code
Create and explore GPU static code metrics report.
Analyzing Network Performance Using the Deep Learning Dashboard
Investigate the performance of deep learning networks and layers in generated code using the Deep Learning Dashboard.
Kernel Analysis
Recommendations for generating efficient CUDA kernels.
Memory Bottleneck Analysis
Reduce memory bottleneck issues when using GPU Coder.
Register Count nvlink Error
Troubleshoot compilation failures due to a register count nvlink error.
Improve Performance of GPU Code by Removing Loop Dependencies
Remove loop dependencies to generate GPU kernels for `for`-loops. (Since R2026a)
Identify Function Calls That Prevent Kernel Creation
Identify code that prevents GPU Coder from generating a CUDA kernel for a loop. (Since R2026a)
Optimize Kernels That Contain Loops
Rewrite loops in MATLAB to avoid generated code kernels that contain loops.
Prevent Kernel Launches Inside Loops
Parallelize loops that launch kernels to execute them on the GPU.
Minimize Memory Copy Events in Generated Code Loops
Rewrite loops to minimize the number of data transfers between the CPU and GPU in generated CUDA code.

Featured Examples

Pass GPU Inputs to Entry-Point Functions

Generate code that receives data from the GPU to avoid unnecessary memory copies.

Profile Generated CUDA MEX Functions Using Performance Analyzer

Visualize code metrics and identify optimization and tuning opportunities in generated CUDA MEX.

Profile and Optimize Generated GPU Code

Profile and optimize generated GPU code using the GPU Performance Analyzer. You can use the GPU Performance Analyzer to generate code, profile the code, and detect performance bottlenecks. Use the performance diagnostics from the analyzer to modify the original MATLAB® function and improve performance of generated CUDA® code.

GPU Profiling on NVIDIA Jetson Platforms

Analyze and optimize the performance of the generated CUDA code on the Jetson™ platform.

Analyze Performance of Code Generated for Deep Learning Networks

Analyze the performance of the generated CUDA code for deep learning networks.
