Fastest large SVD computation on a multithreaded machine?

Martin Ryba on 25 Jan 2022
Answered: Benjamin Thompson on 25 Jan 2022
I have a waveform optimization problem that computes a large SVD inside an iterative loop. With my current settings, the matrix is 18800x18937, or roughly 356M elements (complex; I have been experimenting with both double and single precision). My machine has enough RAM (128 GB) to hold the entire thing. For other portions of the code I get some help from parpool('threads') and a parfor; given the matrix sizes, I don't see a process pool as an option there.
Watching the utilization, the native SVD appears to use about 6-10 threads, so it is partly parallel. I don't know whether this is a Linux processor-counting problem, because the parpool only created 10 workers even though the machine reports 20 CPU cores. SPMD and distributed arrays seem to be available only for process pools; with threads I don't need to distribute the array in memory.
Any tips on what to try, or is it already going as fast as possible? RdTildeRoot is 18800x18800 and is computed once; Z is updated each iteration. From profiling, the SVD on this one line accounts for 90% of the execution time:
[Ubar, ~, Utilde] = svd(RdTildeRoot * Z, 'econ');
  Comments: 4
Martin Ryba on 25 Jan 2022
Ah, according to /proc/cpuinfo it's an Intel Xeon W-2155, so it has 10 physical cores and 20 hardware threads. So the parpool probably did the right thing? Or should it perhaps be allowed to go higher? With process pools I have found that using about 85% of the cores is typically best. maxNumCompThreads comes back with 10. Is that really what the SVD is using, or are there algorithm limitations? The load average while the SVD is running is generally about 7-8, and top shows CPU utilization below 50%. The SVD takes on the order of half an hour per iteration, so those averages are pretty stable. I guess I can try upping it and see what happens. Thanks for the advice.
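One way to check whether the thread limit matters is to time the same economy-size SVD under several maxNumCompThreads settings. The following is a minimal sketch, not a benchmark of the actual problem: the matrix size n and the list of thread counts are assumptions chosen so the test finishes quickly, and a random complex single matrix stands in for RdTildeRoot * Z.

```matlab
% Sketch: time an economy-size SVD on a small stand-in matrix for several
% computational thread limits. n and threadCounts are illustrative assumptions.
n = 1500;                                   % hypothetical size, far below 18800
A = complex(randn(n, 'single'), randn(n, 'single'));

origThreads  = maxNumCompThreads;           % remember the current limit
threadCounts = [1 2 4 origThreads];         % limits to try (assumed values)
elapsed      = zeros(size(threadCounts));

for k = 1:numel(threadCounts)
    maxNumCompThreads(threadCounts(k));     % apply the limit for this trial
    tic;
    [U, ~, V] = svd(A, 'econ');             % same call shape as in the question
    elapsed(k) = toc;
    fprintf('%2d threads: %.2f s\n', threadCounts(k), elapsed(k));
end

maxNumCompThreads(origThreads);             % restore the original limit
```

If the timings stop improving well before the core count, that would be consistent with parts of the factorization not scaling across threads, although a small test matrix may not behave like the full-size one.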
Martin Ryba on 25 Jan 2022
Update: upping it to 15 while the run was in progress (hit pause, then continue) didn't change much. Watching top while it runs, the CPU% tops out at 1000%, which makes sense. At times during the SVD it drops to 250% or so, so perhaps there are portions of the algorithm that can't run at full rate. Unfortunately, given the array size, I probably can't push it to the GPU.
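The array-size concern can be made concrete with a little arithmetic. This sketch uses only the dimensions quoted in the question; the factor of three is an assumption that counts the input plus the two economy-size factors (U is 18800x18800 and V is 18937x18800) and ignores whatever workspace the solver needs on top.

```matlab
% Sketch: storage needed for the matrix quoted in the question.
rows = 18800;
cols = 18937;
numElems = rows * cols;                     % 356,015,600 elements

bytesDouble = numElems * 16;                % complex double: 2 x 8 bytes each
bytesSingle = numElems * 8;                 % complex single: 2 x 4 bytes each

fprintf('complex double: %.2f GB per matrix\n', bytesDouble / 1e9);
fprintf('complex single: %.2f GB per matrix\n', bytesSingle / 1e9);

% Assumption: input + U + V are each about this size; solver workspace is extra.
fprintf('input + U + V (double): about %.1f GB\n', 3 * bytesDouble / 1e9);
fprintf('input + U + V (single): about %.1f GB\n', 3 * bytesSingle / 1e9);
```

Comparing these figures with the memory of the available GPU gives a first indication of whether single precision might fit where double does not.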


Answers (1)

Benjamin Thompson on 25 Jan 2022
If you have the Parallel Computing Toolbox and a good NVIDIA GPU, try using gpuArray. Try svds if you don't need all the singular values.
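Both suggestions can be tried on a small stand-in matrix before committing to the full problem. The sketch below is illustrative only: n and k are hypothetical values, the random matrix is a placeholder for RdTildeRoot * Z, and the GPU branch runs only if gpuDeviceCount is on the path and reports a device. Note that svds only helps when a small number of singular triplets is enough, which may not hold for the original problem.

```matlab
% Sketch: truncated SVD with svds, plus an optional GPU economy-size SVD.
n = 500;                                    % hypothetical test size
k = 10;                                     % hypothetical number of triplets kept
A = complex(randn(n), randn(n));            % stand-in for RdTildeRoot * Z

% svds returns only the k largest singular values and their vectors.
[Uk, Sk, Vk] = svds(A, k);

% Compare against the leading values from a full SVD (feasible at this size).
s = svd(A);
relErr = norm(diag(Sk) - s(1:k)) / norm(s(1:k));
fprintf('relative error of top %d singular values: %.2e\n', k, relErr);

% GPU path: assumes Parallel Computing Toolbox and a supported GPU.
if exist('gpuDeviceCount', 'file') && gpuDeviceCount > 0
    G = gpuArray(A);                        % copy the matrix to GPU memory
    [Ug, ~, Vg] = svd(G, 'econ');           % factorize on the GPU
    Ug = gather(Ug);                        % bring results back to host memory
    Vg = gather(Vg);
end
```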

Release: R2021a
