Parallelizing MATLAB code using many GPU cores
I have a MATLAB script that runs many independent iterations (for loop), of the form:
for idx=1:N
    result(idx) = some_procedure(data(idx));
end
I have an NVIDIA graphics card with over 3000 CUDA cores. Is it possible to parallelize the code so that, e.g., each GPU core handles one iteration? I understand that parfor is not the answer here, but is there some equivalent?
Accepted Answer
Joss Knight
31 Aug 2018
GPU cores do not work like CPU cores. They cannot run independent tasks.
10 Comments
Hello,
I'm sorry to correct you. For GPUs, and indeed for any device that allows SMP (symmetric multiprocessing), independence between tasks is a necessary condition for parallel computing.
So a FOR loop of independent jobs can very well be parallelized; the only condition is that your environment must allow you to do it, for example PARFOR for parallel execution on the CPU.
CUDA allows this natively: the iterations of the FOR are distributed among the cores of the GPU.
The question was "how to do that inside MATLAB?"... Your answer points the questioner in the opposite direction.
Best regards
This is a fair point, since there is ambiguity in the question. GPU threads can process arrays of data when there are no dependencies between threads, as long as the operations they are performing are the same. CPU cores, by contrast, can do entirely different things on different threads.
I answered the question "how to do that inside Matlab" by directing the OP towards the documentation for gpuArray. Generally that's preferable when the question is as general as this one, since I cannot presume to know exactly what bit of the documentation will answer the question; and the asker should familiarize themselves with the background material before asking clarifying questions.
Hope that is satisfactory.
Why doesn't using gpuArray in a for-loop significantly increase the speed compared to a parfor-loop? I am trying to code a convolutional network from scratch, without the Deep Learning Toolbox, because I have to design some different algorithms to train it. Without the toolbox, one training epoch takes a long time. So I thought of using gpuArray instead of a parfor-loop, expecting it to be much faster. However, when I transfer the data to the GPU and run the for-loop, GPU utilization is only around 5-10 percent, and the speedup is not significant.
Any suggestions on this? Many thanks.
I can't say because there's no real explanation here of what you're trying to do inside the loop - what some_procedure is.
The most common mistake for someone new to gpuArray is to assume that MATLAB will parallelize your serial code for you, by putting the body of a for-loop, say, into a kernel that will execute on multiple GPU threads. This is not surprising because that is indeed the way parfor works on CPU cores. But GPUs do not work like that. You need to write highly vectorized code, using techniques such as those documented in MATLAB's documentation. There actually is a way to have MATLAB create a kernel for you, using gpuArray/arrayfun, although this isn't generally necessary.
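As a minimal sketch of the gpuArray/arrayfun route, the following uses a hypothetical scalar function standing in for some_procedure. It assumes the Parallel Computing Toolbox, a supported NVIDIA GPU, and R2019b or later for canUseGPU. The function must take scalars and return scalars. When called on a gpuArray, arrayfun compiles it into a GPU kernel that is evaluated for every element in parallel.

```matlab
% Hypothetical stand-in for some_procedure: scalar in, scalar out.
some_procedure = @(x) x.^3 - 2*x + 1;

% Hypothetical input data, one scalar per loop iteration.
data = rand(1e5, 1, 'single');

if canUseGPU()
    % arrayfun on a gpuArray turns some_procedure into a GPU kernel and
    % evaluates it for all elements in parallel; gather copies the
    % result back to host memory.
    result = gather(arrayfun(some_procedure, gpuArray(data)));
else
    % CPU fallback: on an ordinary array, arrayfun simply loops.
    result = arrayfun(some_procedure, data);
end
```

Only a subset of the MATLAB language is allowed inside a GPU arrayfun function, so this route is limited to element-wise scalar code.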
If data(idx) is, as it appears to be, a scalar, then it does look as though this is your problem and what you need is to read the GPU documentation, learn about vectorization, and try to rework your algorithm so that it no longer contains loops. However, it could be that you know all this, you have vectorized your code, and all that's happening is that you were expecting some_procedure to run faster with gpuArray inputs. In order to diagnose that, we're going to have to see what you're doing in that function.
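To illustrate the vectorization advice, here is a sketch that rewrites the question's loop as a single array operation. The some_procedure below is hypothetical and chosen only because it can be written with element-wise operators. The fallback assumes canUseGPU (R2019b or later). In the loop, each iteration touches one element, which on a gpuArray typically leaves most of the GPU idle. The vectorized call hands the whole array to each operator at once.

```matlab
% Hypothetical stand-in for some_procedure, written with element-wise
% operators (./ and .^) so it also accepts whole arrays.
some_procedure = @(x) sqrt(x) ./ (1 + x.^2);

N = 1000;
data = rand(N, 1);

% Original form: one call per scalar element.
result_loop = zeros(N, 1);
for idx = 1:N
    result_loop(idx) = some_procedure(data(idx));
end

% Vectorized form: one call on the whole array. With a gpuArray input,
% each operator processes all N elements in parallel on the GPU.
if canUseGPU()
    result_vec = gather(some_procedure(gpuArray(data)));
else
    result_vec = some_procedure(data);
end
```

Both forms compute the same values. Only the vectorized one gives the GPU enough work per operation to be worthwhile.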
Thanks, Joss. Here is what I am doing: I have hundreds of different 2-D matrices, and I need to perform thousands of matrix multiplications or matrix inversions. The output of each operation (multiplication or inverse) is also a matrix. So I have a very large nested loop, which I was running with parfor.
I also tried arrayfun and cellfun but could not get them to work. I guess arrayfun only supports element-wise operations, not matrix multiplication or inversion.
I think if these operations could run in arrayfun, the speed would improve significantly. Right now my code runs around 50 percent slower than the equivalent Deep Learning Toolbox function (activation function).
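For batched matrix operations like these, the function mentioned later in this thread, pagefun, applies one matrix operation to every page (2-D slice) of a 3-D gpuArray. The sketch below is a minimal example under these assumptions:
- the Parallel Computing Toolbox is installed;
- canUseGPU is available (R2019b or later);
- pagemtimes is available for the CPU fallback (R2020b or later);
- the sizes n and P are hypothetical.

Which other matrix functions pagefun accepts, such as @inv or @mldivide for the inversion step, depends on the release. Check the pagefun reference page for yours.

```matlab
% Hypothetical batch: P pairs of small n-by-n matrices stacked along
% the third dimension ("pages").
n = 8;
P = 500;
A = rand(n, n, P, 'single');
B = rand(n, n, P, 'single');

if canUseGPU()
    % One batched GPU call computing C(:,:,k) = A(:,:,k) * B(:,:,k)
    % for every k, instead of P separate small multiplications.
    C = gather(pagefun(@mtimes, gpuArray(A), gpuArray(B)));
else
    % CPU fallback with the same page-wise semantics.
    C = pagemtimes(A, B);
end

% Reference result from an explicit loop over pages.
C_loop = zeros(n, n, P, 'single');
for k = 1:P
    C_loop(:, :, k) = A(:, :, k) * B(:, :, k);
end
```

The loop at the end exists only to check the batched result. In real code it is exactly the part that pagefun replaces.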
Great! Thanks!
I have another question about speed. Using pagefun gives me an amazing speedup. However, since I have a very large number of matrix multiplications, I still need to speed up my code further. I am thinking of using the half or sparse data types instead of single, to gain speed and save memory. However, pagefun does not support the half or sparse data types. Could you please give me some suggestions? Thanks.
You should look elsewhere for further performance improvements. MATLAB has no half datatype, and sparse only supports 2-D matrices, so it cannot be used for batch operations. Not that it would be useful anyway, since sparse storage only makes sense for large matrices.
Thanks.