MeX implementation of Tensorproduct multiplications
이전 댓글 표시
Dear all,
I am searching for a more efficient way to calculate tensorproduct multiplications of arbitrary sizes. First of all, i have already searched the web to get inspired by different solutions, e.g. here Tensorproduct at stackexchange:
1.) External Libraries, e.g. Eigen C++.
2.) Pure Matlab solution based in reshape and shiftdim.
However, so far i ended up with a MeX-file implementation with OpenMP parallelization.
--
In the following, i will give a short introduction of the math behind the implementation:
Suppose you have a matrix vector multiplication, where a matrix C is constructed by a Kronecker product of matrices A with size (n x m) and B with size (p x q). More informations can be found here Wikipedia.
In two dimensions this operation can be performed with O(npq+qnm) operations instead of O(mqnp) operations.
--
So far so good. In following example the sizes of the arrays are
matrix_xi = 4x4,
matrix_eta = 4x4,
vector = 16x500000x5,
(new version: vector = 16x2500000)
and may be initialized with ones or zeros. Moreover, rather than calculating only one Kronecker matrix vector multiplication, i want to calculate e.g. 500000x5 operations with one call (see the size of vector).
--
My current MeX-C file implementation:
/*************************************************
* CALLING:
*
* out = tensorproduct(A,B,vector)
*
*************************************************/
#include "mex.h"
#include "omp.h"
#define PP_A prhs[0]
#define PP_B prhs[1]
#define PP_vector prhs[2]
#define PP_out plhs[0]
#define n 4
#define m 4
#define p 4
#define q 4
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
const mwSize *dim;
int i,j,k,s,l;
double temp[n*q];
double *A = mxGetPr(PP_A);
double *B = mxGetPr(PP_B);
double *vector = mxGetPr(PP_vector);
dim = mxGetDimensions(PP_vector);
l = dim[1];
PP_out = mxCreateDoubleMatrix(l*n*p,1,mxREAL);
double *out = mxGetPr(PP_out);
#pragma omp parallel for private(i,j,k,s,temp)
for(k=0; k<l; k++){
for(i=0; i<n; i++){
for(j=0; j<q; j++){
temp[i+j*n]=0;
}
}
for(s=0; s<m; s++){
for(i=0; i<n; i++){
for(j=0; j<m; j++){
temp[i+s*n]+=A[i+j*n]*vector[j+s*m+k*m*p];
}
}
}
for(s=0; s<n; s++){
for(i=0; i<p; i++){
for(j=0; j<q; j++){
out[i*n+s+k*n*p]+=B[i+j*p]*temp[j*n+s];
}
}
}
}
}
--
The code can be compiled with:
mex FFLAGS='$FFLAGS -fopenmp -Ofast -funroll-loops' ...
CFLAGS='$CFLAGS -fopenmp -Ofast -funroll-loops' ...
LDFLAGS='$LDFLAGS -fopenmp -Ofast -funroll-loops' ...
COPTIMFLAGS='$COPTIMFLAGS -fopenmp -Ofast -funroll-loops' ...
LDOPTIMFLAGS='$LDOPTIMFLAGS -fopenmp -Ofast -funroll-loops' ...
DEFINES='$DEFINES -fopenmp -Ofast -funroll-loops' ...
tensorproduct.c
--
Currently on my Notebook: (Ubuntu 18.04, GCC 7.5.0, 4 Cores)
matrix_xi = ones(4,4);
matrix_eta = ones(4,4);
vector = ones(16,500000,5);
n = 50;
tic
for i=1:n
vector_out = reshape(tensorproduct(matrix_xi,matrix_eta,vector),size(vector));
end
toc
% Elapsed time is 8.036490 seconds.
--
A quite simple Matlab implementation without the use of the Kronecker property gives
matrix=kron(matrix_xi,matrix_eta);
tic
for i=1:n
vector_out = reshape(matrix*vector(:,:),size(vector));
end
toc
% Elapsed time is 10.489606 seconds.
--
With a different size of n,m,p,q = 7 the difference surprisingly does not change and gets worse (here you have to change the constants in the C-file)
matrix_xi = ones(7,7);
matrix_eta = ones(7,7);
vector = ones(49,500000,5);
n = 50;
tic
for i=1:n
vector_out = reshape(tensorproduct(matrix_xi,matrix_eta,vector),size(vector));
end
toc
% Elapsed time is 31.436541 seconds.
matrix=kron(matrix_xi,matrix_eta);
tic
for i=1:n
vector_out = reshape(matrix*vector(:,:),size(vector));
end
toc
% Elapsed time is 35.829957 seconds.
Luckily I am still faster than Matlab.
--
Note that only n,m,p,q are constants during compile time. To make the code useable for arbitrary sizes i have written a Macro to generate many C-files with different const n,m,p,q which will be included later with a header and a main C-file. This will help the compiler to unrole and inline the code.
This is the most efficient way i have achieved so far. I hope there are some experts here in this forum who can help to optimize these few lines. I am also open for other solutions, e.g. Matlab based or based on external libraries.
Thank you!
댓글 수: 4
Bruno Luong
2020년 12월 6일
편집: Bruno Luong
2020년 12월 6일
"In two dimensions this operation can be performed with O(npq+qnm) operations instead of O(mqnp) operations. "
Which operation? Please define clearly what variable you want to compute from what are input variables.
So far so good. In following example the sizes of the arrays are
matrix_xi = 4x4,
matrix_eta = 4x4,
vector = 16x500000x5,
What are the relation with A, B, C, X? on Wikipedia? They are all matrices and none is 3D array. Please use consistent notation to descripte the desired calculation.
ConvexHull
2020년 12월 6일
편집: ConvexHull
2020년 12월 6일
Bruno Luong
2020년 12월 6일
편집: Bruno Luong
2020년 12월 6일
Yes I do know tensor and kron. But your notation is confusing. Actually your second statement
A = matrix_xi
B = matrix_eta
is wrong. Actually it's
B = matrix_xi.';
A = matrix_eta;
And your vector consists of a set of multiple X matrices.
Each reshape(vector(:,p,q), size(A,2), size(B,1)) is an X.
Your description is very confusing.
I knew what you want to compute. For the general reader, the description is careless and confusing.
ConvexHull
2020년 12월 6일
편집: ConvexHull
2020년 12월 6일
채택된 답변
추가 답변 (0개)
카테고리
도움말 센터 및 File Exchange에서 Particle & Nuclear Physics에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!

