Issue with StepNorm Going to Zero on RTX 4060 Ti During MLP Training

Views: 3 (last 30 days)
Keonwook Kim on 11 March 2025
I am using the "trainnet" function to train a relatively shallow MLP network. To accelerate processing, I use two different GPUs: an RTX 2070 Super and an RTX 4060 Ti.
The RTX 2070 Super produces the expected output for all iterations. However, on the RTX 4060 Ti the training terminates early because the StepNorm value approaches zero. I suspect this is related to the memory-bus difference (256-bit on the RTX 2070 Super versus 128-bit on the RTX 4060 Ti), which might affect numerical behavior during parallel computation.
When I checked the SingleDoubleRatio, I found that:
  • RTX 2070 Super: 32
  • RTX 4060 Ti: 64
As I understand it, SingleDoubleRatio is the ratio of single-precision to double-precision throughput, so a value of 32 means double precision runs at 1/32 of single-precision speed, while 64 means it is relatively slower still (1/64). In other words, the RTX 4060 Ti is comparatively weaker at double precision. I attempted to manually enforce precision control on the GPU but was unsuccessful.
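For reference, this is roughly how I inspect each device from MATLAB before training (a sketch using the standard `gpuDeviceCount` and `gpuDevice` functions; `SingleDoubleRatio` itself I obtained from a benchmark report, not from these properties):

```matlab
% Sketch: list the GPUs visible to MATLAB and their compute capability
for idx = 1:gpuDeviceCount
    d = gpuDevice(idx);   % select and query device idx
    fprintf("Device %d: %s (compute capability %s)\n", ...
        idx, d.Name, d.ComputeCapability);
end
```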
How can I resolve this issue and ensure stable training on the RTX 4060 Ti? Any insights would be greatly appreciated.
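In case it helps, this is approximately what I tried in order to keep a tiny StepNorm from terminating training immediately (a sketch assuming the L-BFGS solver in `trainingOptions`; `XTrain`, `TTrain`, and `layers` are placeholders for my data and MLP, and the tolerance values are guesses):

```matlab
% Sketch: loosen the L-BFGS stopping tolerances so training does not
% stop as soon as the step norm becomes very small (values are guesses)
options = trainingOptions("lbfgs", ...
    MaxIterations=2000, ...
    StepTolerance=1e-10, ...      % stop when the step norm falls below this
    GradientTolerance=1e-10);

% XTrain, TTrain, and layers stand in for my dataset and network;
% casting to single keeps both GPUs on their fast precision path
net = trainnet(single(XTrain), single(TTrain), layers, "mse", options);
```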
Thank you!

Answers (0)

Categories

Learn more about Pattern Recognition and Classification in Help Center and File Exchange

