Overcoming VRAM limitations on Nvidia A100
I have access to a cluster with several NVIDIA A100 40 GB GPUs. I am training a deep learning network on these GPUs, but trainNetwork() only uses around 10 GB of the available VRAM. I believe this is a limitation of NVIDIA CUDA, see here.
I have two related questions:
- Other cluster users are writing in Python with the 'DistributedDataParallel' module in PyTorch and are able to load 40 GB of data (above the CUDA limitation) onto the GPUs. Is there a similar workaround for MATLAB?
- If not, is there any way to use Multi-Instance GPU (MIG), i.e. split the physical card into several smaller virtual GPUs and compute in parallel?
Ideally I would like to speed up computation, so having three quarters of the VRAM sit empty when it could otherwise be used for mini-batches is a little heartbreaking.
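As a rough sketch of the setup in question (the mini-batch size and the commented-out variable names are placeholders, not values from my actual script), memory use can be checked with gpuDevice, and the batch size and multi-GPU execution are set through trainingOptions. This assumes Parallel Computing Toolbox and Deep Learning Toolbox are installed:

```matlab
% Report how much memory is in use on the currently selected GPU.
gpu = gpuDevice;                                   % requires Parallel Computing Toolbox
usedGB  = (gpu.TotalMemory - gpu.AvailableMemory) / 1e9;
totalGB = gpu.TotalMemory / 1e9;
fprintf('%s: %.1f of %.1f GB in use\n', gpu.Name, usedGB, totalGB);

% Placeholder settings: 512 is an arbitrary mini-batch size to tune,
% and 'multi-gpu' asks trainNetwork to use the local GPUs in parallel.
options = trainingOptions('sgdm', ...
    'MiniBatchSize', 512, ...
    'ExecutionEnvironment', 'multi-gpu');

% XTrain, YTrain and layers are hypothetical names for the data and network.
% net = trainNetwork(XTrain, YTrain, layers, options);
```

Running the gpuDevice part during training is how the roughly 10 GB figure above can be observed; whether a larger MiniBatchSize actually fills the remaining memory is exactly what I am unsure about.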