Introducing multithreading to min()

Views: 4 (last 30 days)
Hyoseung Kang
Hyoseung Kang on 29 June 2019
Commented: HyoSeung Kang on 5 July 2019
[min_val, min_idx] = min(v); exploits SIMD in MATLAB. While working on a C/C++ CUDA project, I have found this operation hard to accelerate on the GPU, so I became interested in adding multi-core CPU parallelism to MATLAB's min function. (In other words, single-core MATLAB min can be faster than a multi-core GPU at the same task.) Note that I specifically need min_idx, not only min_val.
1. What is the neatest way to make min() run on multiple CPU cores, rather than a single core, for a single array? The only approach I can come up with myself is item 2 below.
2. Generate C/C++ code from MATLAB that performs [min_val, min_idx] = min(v);. For a long array in my C++ code, I split the array into as many chunks as there are CPU cores and launch the codegen'd min() once per chunk. This reduces the problem to roughly 12 elements (the number of hyper-threads on my CPU), and a single final min() call finishes the job. Assume the array holds 1e6 or more elements in double or single precision.
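The split-and-reduce scheme in item 2 can be sketched in plain C++ with std::thread in place of MATLAB codegen. This is a minimal illustration under that assumption (function and variable names are mine, not from any MATLAB-generated code):

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <thread>
#include <utility>
#include <vector>

// Two-stage parallel [min_val, min_idx]: each worker scans its own
// contiguous chunk, then a final scalar pass reduces the per-thread
// partial results (at most nthreads of them).
std::pair<double, std::size_t> parallel_min(const std::vector<double>& v,
                                            unsigned nthreads) {
    const std::size_t n = v.size();
    std::vector<double> part_val(nthreads,
                                 std::numeric_limits<double>::infinity());
    std::vector<std::size_t> part_idx(nthreads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = (n + nthreads - 1) / nthreads;
    for (unsigned t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            const std::size_t lo = t * chunk;
            const std::size_t hi = std::min(n, lo + chunk);
            for (std::size_t i = lo; i < hi; ++i)
                if (v[i] < part_val[t]) {
                    part_val[t] = v[i];
                    part_idx[t] = i;
                }
        });
    }
    for (auto& w : workers) w.join();
    // Strict < and ascending chunk order keep the first (lowest-index)
    // occurrence on ties, matching MATLAB's min().
    double best = part_val[0];
    std::size_t best_i = part_idx[0];
    for (unsigned t = 1; t < nthreads; ++t)
        if (part_val[t] < best) {
            best = part_val[t];
            best_i = part_idx[t];
        }
    return {best, best_i};
}
```

Because the chunks are scanned in index order and the final reduction uses strict less-than, ties resolve to the earliest index, just as a single sequential pass would.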
Thanks for any advice.
  3 Comments
Hyoseung Kang
Hyoseung Kang on 3 July 2019
Even when an operation is not as fast as linear algebra on the GPU, certain operations can still be better executed on the CPU. In particular, this matters for heterogeneous computing when an algorithm contains both parallelism-friendly and parallelism-hostile operations. For elementwise vector addition, subtraction, multiplication, and division, I observe that a simple CUDA kernel runs about 5,000 times faster (in both double and float) than single-core MATLAB, which already exploits CPU vectorization. Dense matrix-matrix multiplication with the cuBLAS library achieves about 1,000 times faster runtime in the same setting (in double; float improves further).
I first benchmark different libraries on an operation, and if none feels fast enough, I try my own kernel; in every case I compare against single-core MATLAB. The whole project must be in C/C++, so it would be best if MATLAB could be avoided entirely.
I tried calling the MATLAB Engine API from C code, but found that it involves inter-process communication. I am not sure whether a more recent release of MATLAB (mine is R2016a) would avoid that overhead.
For now I am using vectorized C++ code for this min() operation, implemented by someone else. It is not faster than MATLAB, but it has no overhead. I have by now almost finished a multithreaded version of this AVX2 vectorization of min().
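A minimal single-threaded sketch of the AVX2 min-with-index idea (my own illustration, not the third-party code mentioned above): keep a vector of four running minima and a matching vector of their indices, then reduce the four lanes and the scalar tail at the end. One caveat: on ties across lanes this may not return the first index exactly as MATLAB's min() does.

```cpp
#include <cmath>
#include <cstddef>
#include <immintrin.h>

// AVX2 [min_val, min_idx] over doubles, 4 lanes at a time. Indices are
// carried as doubles so they can be blended with the same mask as values.
__attribute__((target("avx2")))
void avx2_min(const double* v, std::size_t n,
              double* out_val, std::size_t* out_idx) {
    __m256d best = _mm256_set1_pd(INFINITY);
    __m256d bidx = _mm256_setzero_pd();
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m256d x = _mm256_loadu_pd(v + i);
        __m256d idx = _mm256_set_pd((double)(i + 3), (double)(i + 2),
                                    (double)(i + 1), (double)i);
        // Mask of lanes where the new element beats the running minimum.
        __m256d lt = _mm256_cmp_pd(x, best, _CMP_LT_OQ);
        best = _mm256_blendv_pd(best, x, lt);
        bidx = _mm256_blendv_pd(bidx, idx, lt);
    }
    // Horizontal reduction over the 4 lanes.
    double vals[4], idxs[4];
    _mm256_storeu_pd(vals, best);
    _mm256_storeu_pd(idxs, bidx);
    double mv = INFINITY;
    std::size_t mi = 0;
    for (int k = 0; k < 4; ++k)
        if (vals[k] < mv) { mv = vals[k]; mi = (std::size_t)idxs[k]; }
    // Scalar tail for n not divisible by 4.
    for (; i < n; ++i)
        if (v[i] < mv) { mv = v[i]; mi = i; }
    *out_val = mv;
    *out_idx = mi;
}
```

Wrapping this kernel in the per-chunk workers of the split-and-reduce scheme from item 2 gives the multithreaded AVX2 version; carrying indices as doubles is exact up to 2^53 elements.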
Conclusion: I may come back to MATLAB if I find that later releases of the MATLAB Engine API for C/C++ reduce the inter-process communication overhead. The overhead comes from engGetVariable/engPutVariable, which I cannot avoid calling every time I access or modify a variable in the workspace. Given that overhead, there was no point in implementing this with multithreading.
HyoSeung Kang
HyoSeung Kang on 5 July 2019
Thanks for the kind advice, Edric Ellis.
Using GPGPU for this operation can clearly be faster than the CPU; the number of elements has a great impact on the runtime benchmarks.
Unfortunately, since I am coding on my own in C/C++, combining it with the MATLAB Engine is not necessarily advisable.
Depending on which algorithm is implemented in hybrid computing, one can still execute this operation on the CPU.
" For number of elements 1e+8, thrust library (NVidia) compared to Matlab single-core CPU is still only 7-times faster in double-precision. Using Matlab's GPU implementation for this type of operation wouldn't drastically improve like more than 20-times faster. "
If the GPU is allowed to do something else while this operation runs on the CPU (multithreading and AVX), that would be helpful, though it is a lot of work. That is what I am doing.


Answers (0)

