CUDA missing library libcuda.so.1

Hi, I'm trying to use the GPU, but when I run a command I get this error:
Error using gpuDevice (line 26)
There is a problem with the CUDA driver associated with this GPU device. See
www.mathworks.com/gpudriver to find and install the latest supported driver.
Caused by:
The CUDA driver could not be loaded. The library name used was: libcuda.so.1. The error was:
libcuda.so.1: cannot open shared object file: No such file or directory.
My MATLAB client is on a server called "mdcstest" and the GPU is on another server called "comp01". I'm using the MATLAB Job Scheduler. Normal tasks and matlabpool run without problems. On "comp01" I have CUDA 5.0.
What do I need to do to use the GPU on "comp01"?
Thanks.

Comments: 8

Thomas Ibbotson on 20 Nov 2012
Can you show the code you were trying to run when you got this error? Specifically what commands were you using on 'mdcstest' to run the code on 'comp01', did you use 'batch' or a script with 'spmd' or did you create your own jobs and tasks?
When I try gpuDevice, or try to run:
size = 1000;
tic
xm = gpuArray(rand(size));
Xm = xm * xm ;
R = gather(Xm);
result = toc
I tried running the commands at the command line, and I also tried running them with batch, and got the same error.
Thomas Ibbotson on 21 Nov 2012
If you are running that code from the command line on "mdcstest" then it will run on "mdcstest" not "comp01". When you ran 'batch', what profile were you using? You need to make sure that your profile for MJS is set as the default, otherwise the code will run using the local scheduler, and will again fail as it will be running on "mdcstest" not "comp01".
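One way to confirm the default profile from the client before submitting anything is sketched below. It assumes Parallel Computing Toolbox is installed and uses the profile name 'MJS' from this thread purely as an illustration; the variable names and message wording are mine, not part of any MathWorks example.

```matlab
% Sketch: check which cluster profile is the default on the client.
% 'MJS' is the profile name used in this thread; substitute your own.
expectedProfile = 'MJS';
defaultProfile  = parallel.defaultClusterProfile;   % name of the current default profile

if strcmp(defaultProfile, expectedProfile)
    fprintf('Default profile is %s.\n', defaultProfile);
else
    warning('Default profile is %s, not %s; batch would use %s.', ...
        defaultProfile, expectedProfile, defaultProfile);
end

% parcluster with no arguments builds the cluster object from the default
% profile, so its Host property shows which machine hosts the scheduler.
c = parcluster;
fprintf('Profile: %s, Host: %s\n', c.Profile, c.Host);
```

If the reported profile is 'local', jobs and pools started without an explicit cluster object run on the client machine.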
Ivan on 21 Nov 2012
The MJS profile is set as the default; I'm sure of this.
OK, just to double-check that the code is running on the correct machine, can you run:
mjs = parcluster
mjs.matlabpool
spmd, gpuDevice, end
matlabpool close
and paste the results here?
I got this:
mjs =
MJS Cluster Information
=======================
Profile: MJS
Modified: false
Host: comp01
NumWorkers: 4
JobStorageLocation: Database on comp01
ClusterMatlabRoot: /shared/software/MATLAB
OperatingSystem: unix
- Assigned Jobs
Number Pending: 0
Number Queued: 0
Number Running: 0
Number Finished: 0
- MJS Specific Properties
Name: test
AllHostAddresses: 147.232.116.7
fe80:0:0:0:5ef3:fcff:fea9:18d4%3
fe80:0:0:0:200:c9ff:fecd:a66c%5
NumBusyWorkers: 0
NumIdleWorkers: 4
Username: durkac
SecurityLevel: 0 (No security)
HasSecureCommunication: false
Starting matlabpool using the 'MJS' profile ... connected to 1 labs.
Lab 1:
ans =
parallel.gpu.CUDADevice handle
Package: parallel.gpu
Properties:
Name: 'Tesla M2070'
Index: 1
ComputeCapability: '2.0'
SupportsDouble: 1
DriverVersion: 5
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [65535 65535]
SIMDWidth: 32
TotalMemory: 5.6366e+09
FreeMemory: 5.3198e+09
MultiprocessorCount: 14
ClockRateKHz: 1147000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1
Sending a stop signal to all the labs ... stopped.
I didn't use spmd ... end when I was trying to use the GPU. I think that was the problem, right?
Ben Tordoff on 24 Dec 2012
If you didn't put the gpuDevice call inside SPMD then it will run on your client machine (which presumably does not have a GPU or the CUDA drivers). Putting it inside SPMD causes it to run on the worker(s). Equally, trying to use the GPU inside a PARFOR or from within a task function would probably have worked fine as that also happens on the worker.
I believe Thomas's answer below is the correct one.
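As a rough illustration of the point above, the benchmark from the question can be moved inside spmd so that it executes on the workers. This is only a sketch: it assumes the default profile points at the MJS cluster whose workers can see the GPU, and it renames the original variable "size" to "n" (my change) so the built-in size function is not shadowed.

```matlab
% Sketch: the benchmark from the question, run on the workers via spmd.
% Assumes the default profile is the MJS cluster with a GPU on its workers.
mjs = parcluster;              % cluster object for the default profile
mjs.matlabpool;                % open a pool, as done earlier in this thread

n = 1000;                      % renamed from "size" to avoid shadowing the built-in
spmd
    tic
    xm = gpuArray(rand(n));    % copy a random n-by-n matrix to the worker's GPU
    Xm = xm * xm;              % matrix multiply on the GPU
    R  = gather(Xm);           % bring the result back to the worker's host memory
    result = toc;              % elapsed time on this worker
end

% "result" is a Composite on the client; read it before closing the pool,
% because Composite values are lost when the pool closes.
elapsed = result{1};
fprintf('Lab 1 took %.3f s\n', elapsed);

matlabpool close
```

The thread dates from R2012b, so the sketch uses matlabpool; from R2013b onward, parpool is the replacement.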
In my case this error is shown, and when I use gpuDeviceCount it shows 0. Please help me solve my problem.
>> mjs = parcluster
mjs.matlabpool
spmd, gpuDevice, end
matlabpool close
mjs =
Local Cluster Information
=========================
Profile: local
Modified: false
Host: robo-PC
NumWorkers: 2
JobStorageLocation: C:\Users\robo\AppData\Roaming\MathWorks\MATLAB\local_cluster_jobs\R2012b
ClusterMatlabRoot: C:\Program Files\MATLAB\R2012b
OperatingSystem: windows
RequiresMathWorksHostedLicensing: false
- Assigned Jobs
Number Pending: 0
Number Queued: 1
Number Running: 1
Number Finished: 0
Error using parallel.Cluster/matlabpool (line 64)
Failed to open matlabpool. (For information in addition to the causing error, validate the profile 'local' in the Cluster Profile Manager.)
Caused by:
Error using distcomp.interactiveclient/start (line 11)
Found an interactive session. You cannot have multiple interactive sessions open simultaneously. To terminate the existing session, use 'matlabpool close'.
>>


Answers (2)

Jason Ross on 19 Nov 2012
Edited: Jason Ross on 19 Nov 2012

1 vote

It sounds like the GPU driver is not installed correctly on comp01. Perhaps you installed the SDK and toolkit and not the driver?
NVIDIA ships a utility called "nvidia-smi" (by default, in /usr/bin) that lists all the installed GPUs in a system. I'm betting you'll get the same error at the command line as you do in MATLAB if you run nvidia-smi. If things are working properly, you should see something like the following:
% nvidia-smi
Mon Nov 19 15:33:41 2012
+------------------------------------------------------+
| NVIDIA-SMI 4.304.54 Driver Version: 304.54 |
|-------------------------------+----------------------+----------------------+
| GPU Name | Bus-Id Disp. | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Quadro FX 370 | 0000:01:00.0 N/A | N/A |
| 60% 60C N/A N/A / N/A | 4% 11MB / 255MB | N/A Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla C1060 | 0000:08:00.0 Off | N/A |
| 35% 56C P8 N/A / N/A | 0% 3MB / 4095MB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 Not Supported |
+-----------------------------------------------------------------------------+

Comments: 3

I tried what you suggested and I got this:
nvidia-smi
Tue Nov 20 07:22:17 2012
+------------------------------------------------------+
| NVIDIA-SMI 4.304.54 Driver Version: 304.54 |
|-------------------------------+----------------------+----------------------+
| GPU Name | Bus-Id Disp. | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla M2070 | 0000:15:00.0 Off | 0 |
| N/A N/A P0 N/A / N/A | 2% 84MB / 5375MB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Compute processes: GPU Memory |
| GPU PID Process name Usage |
|=============================================================================|
| 0 17291 /shared/software/MATLAB/bin/glnxa64/MATLAB 72MB |
+-----------------------------------------------------------------------------+
So the driver is installed properly, right?
As you can see, that nvidia-smi output shows that MATLAB is already accessing the GPU. Perhaps the device is in exclusive mode? (Can I also just confirm that you ran this command on the cluster node "comp01"?) If you run
nvidia-smi -q
You should be able to see what compute mode the device is in.
I got this:
==============NVSMI LOG==============
Timestamp : Thu Nov 22 13:47:21 2012
Driver Version : 304.54
Attached GPUs : 1
GPU 0000:15:00.0
Product Name : Tesla M2070
Display Mode : Disabled
Persistence Mode : Disabled
Driver Model
Current : N/A
Pending : N/A
Serial Number : 0323111075641
GPU UUID : GPU-e4f9d738-18d5-866c-f4f0-268427c2d877
VBIOS Version : 70.00.3E.00.03
Inforom Version
Image Version : N/A
OEM Object : 1.0
ECC Object : 1.0
Power Management Object : 1.0
GPU Operation Mode
Current : N/A
Pending : N/A
PCI
Bus : 0x15
Device : 0x00
Domain : 0x0000
Device Id : 0x06D210DE
Bus Id : 0000:15:00.0
Sub System Id : 0x083010DE
GPU Link Info
PCIe Generation
Max : 2
Current : 2
Link Width
Max : 16x
Current : 16x
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons : N/A
Memory Usage
Total : 5375 MB
Used : 3007 MB
Free : 2368 MB
Compute Mode : Default
Utilization
Gpu : 35 %
Memory : 10 %
Ecc Mode
Current : Enabled
Pending : Enabled
ECC Errors
Volatile
Single Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Total : 0
Double Bit
Device Memory : 0
Register File : 0
L1 Cache : 0
L2 Cache : 0
Texture Memory : N/A
Total : 0
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : 0
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Total : 0
Temperature
Gpu : N/A
Power Readings
Power Management : N/A
Power Draw : N/A
Power Limit : N/A
Default Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 573 MHz
SM : 1147 MHz
Memory : 1566 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 573 MHz
SM : 1147 MHz
Memory : 1566 MHz
Compute Processes
Process ID : 10911
Name : /shared/software/MATLAB/bin/glnxa64/MATLAB
Used GPU Memory : 453 MB
Process ID : 11137
Name : /shared/software/MATLAB/bin/glnxa64/MATLAB
Used GPU Memory : 1041 MB
Process ID : 10803
Name : /shared/software/MATLAB/bin/glnxa64/MATLAB
Used GPU Memory : 453 MB
Process ID : 11026
Name : /shared/software/MATLAB/bin/glnxa64/MATLAB
Used GPU Memory : 1041 MB
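The compute mode that nvidia-smi -q reports can also be read from inside MATLAB, because the device object printed earlier in this thread has a ComputeMode property. The following is a minimal sketch, assuming a matlabpool is already open on the cluster that hosts the GPU; the variable names are illustrative.

```matlab
% Sketch: read the compute mode as MATLAB sees it on each worker.
% Assumes a matlabpool is already open on the cluster with the GPU.
spmd
    d = gpuDevice;                     % currently selected GPU on this worker
    computeMode = d.ComputeMode;       % e.g. 'Default', as in the output above
    fprintf('Lab %d: %s, compute mode %s\n', labindex, d.Name, computeMode);
end
```

Running the query inside spmd matters here: the same call at the client prompt would look for a GPU on the client machine instead.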


Thomas Ibbotson on 23 Nov 2012

1 vote

When you open a matlabpool and use 'spmd', the code in that block is run on all the workers in the pool. As your workers are running on 'comp01', which has the GPU, this means that the GPU code will be able to run. Without the 'spmd', any code you run will run on your local machine (in your case, that machine did not have the GPU driver or a supported GPU, so the call failed).
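To see the difference described here, one option is to print the host name outside and inside an spmd block. This is a sketch only: it assumes a matlabpool is already open on the cluster and that the operating system provides a hostname command; the variable names are my own.

```matlab
% Sketch: compare where code runs outside and inside spmd.
% Assumes a matlabpool is already open on the cluster.
[~, clientHost] = system('hostname');        % executed on the client
clientHost = strtrim(clientHost);
fprintf('Client host: %s\n', clientHost);

spmd
    [~, workerHost] = system('hostname');    % executed on each worker
    workerHost = strtrim(workerHost);
    nGpus = gpuDeviceCount;                  % GPUs visible to this worker
end

% workerHost and nGpus are Composite objects; index them per lab on the client.
for lab = 1:length(workerHost)
    fprintf('Lab %d host: %s, GPUs visible: %d\n', lab, workerHost{lab}, nGpus{lab});
end
```

With the setup from the question, the client line would be expected to name "mdcstest" and the worker lines "comp01".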
Note that spmd is not the only way to run code on the cluster; you can also use the 'batch' function. In this case you give 'batch' the name of a script you want to run, and that script will run on one of the workers on the cluster. For example:
mjs = parcluster;
job = mjs.batch('myScript');
wait(job);
load(job);
Where 'myScript' has the code you want to run on the GPU on 'comp01'.
For more information about 'batch' see the batch processing documentation, and for 'spmd' see the spmd documentation.
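For illustration, 'myScript' could contain the benchmark from the question. The contents below are hypothetical (the thread never shows the actual script) and would be saved as myScript.m somewhere the workers can find it. The variable "size" is renamed to "n", and the final clear is a precaution I have added so that only ordinary host arrays are left in the job workspace.

```matlab
% myScript.m (hypothetical contents): runs on one cluster worker via batch.
n = 1000;                   % renamed from "size" to avoid shadowing the built-in
tic
xm = gpuArray(rand(n));     % copy a random n-by-n matrix to the worker's GPU
Xm = xm * xm;               % matrix multiply on the GPU
R  = gather(Xm);            % R is an ordinary array, safe to load on the client
result = toc;               % elapsed time measured on the worker
clear xm Xm                 % drop gpuArray variables before the workspace is saved
```

After wait(job) and load(job) on the client, the variables n, R and result should appear in the client workspace.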

Asked: 19 Nov 2012

Commented: 14 May 2014
