이 질문을 팔로우합니다.
- 팔로우하는 게시물 피드에서 업데이트를 확인할 수 있습니다.
- 정보 수신 기본 설정에 따라 이메일을 받을 수 있습니다.
Vec-trick implementation (multiple times)
조회 수: 14 (최근 30일)
이전 댓글 표시
Dear all,
the question is related to Tensorproduct. Since the question was not answered as intended, i want to revisit the question.
Introduction:
Suppose you have a matrix vector multiplication, where a matrix C with size (np x mq) is constructed by a Kronecker product of matrices A with size (n x m) and B with size (p x q). The vector is denoted v with size (mp x 1) or its vectorized version X with size (m x p).
In two dimensions this operation can be performed with O(npq+qnm) operations instead of O(mqnp) operations, see Wikipedia.
Expensive variant (in case of flops):
Cheap variant (in case of flops):
Main question:
I want to perform many of these operations at ones, e.g. 2500000. Example: n=m=p=q=7 with A=size(7x7), B=size(7x7), v=size(49x2500000).
In Tensorproduct i have implemented a MeX-C version of the cheap variant which is quite slower than a Matlab version of the expensive variant provided by Bruno Luong.
Is it possible to implement the cheap version in Matlab without looping?
댓글 수: 5
Bruno Luong
2021년 8월 23일
"Is it possible to implement the cheap version in Matlab without looping"
The method
B = matrix_xi.';
A = matrix_eta;
X = reshape(vector, size(A,2), size(B,1), []);
CX = pagemtimes(pagemtimes(A, X), B); % Reshape CX if needed
given in this thread is cheap version in MATLAB without looping! Granted it is no faster than the expensive method, but it's what you ask for, no ?
ConvexHull
2021년 8월 23일
편집: ConvexHull
2021년 8월 23일
Your completely right. I over looked the "pagetimes" version.
I am still suprised about the fact that there is no cheap version which is able to outperform the expensive vectorized Matlab version (actually BLAS version).
I mean, the cheap variant is of order O(2*7*7*7)=O(686), whereas the expensive variant is of order O(7*7*7*7)=O(2401), in case of flops.
Thank you!
ConvexHull
2021년 8월 23일
By the way i tried to optimize the C-code in Tensorproduct, which gave me 20% speed-up, however still not usefull.
Bruno Luong
2021년 8월 23일
Because smaller flops doesn't mean necessary faster. Memory access, cache, thread management are as well important, and which is fatest method probably depends on n=m=p=q.
ConvexHull
2021년 8월 23일
편집: ConvexHull
2021년 8월 23일
Yeah that's definitly the case here.
The main problem is that, if you want to perform the Vec-trick multiple times in a vectorized fashion you have to reorder the datastructure. After applying AX you cannot perform a Matrix-Matrix multiplication directly with B.
Stupid Memory access O.o!
채택된 답변
ConvexHull
2021년 8월 24일
편집: ConvexHull
2021년 8월 25일
Here is a pure intrinsic Matlab version without loops, however with two transpose operations and quite slow.
n=7;m=7;p=7;q=7;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,500000,5);
n = 5;
C = kron(B,A);
tic
for i=1:n
v1 = reshape(C*reshape(v,49,[]),size(v));
end
toc % Elapsed time is 0.456353 seconds
tic
for i=1:n
v2 = reshape(reshape(B*reshape((A*reshape(v,7,[])).',7,[]),7*2500000,[]).',7,[]);
end
toc % Elapsed time is 3.879752 seconds
max(abs(v1(:)-v2(:)))
% 1.4211e-14
댓글 수: 22
Bruno Luong
2021년 8월 24일
According to my test, your code is slightly slower than pagemtimes method that is provided in other thread.
for i=1:n
X = reshape(v, size(A,2), size(B,1), []);
v3 = pagemtimes(pagemtimes(A, X), 'none', B, 'transpose');
end
ConvexHull
2021년 8월 24일
That makes sense. I think pagetime will not do much different. However, only recently (R2020b) introduced.
ConvexHull
2021년 8월 24일
편집: ConvexHull
2021년 8월 24일
@Bruno: Do you know a faster way tranposing a matrix in Matlab? :)
Bruno Luong
2021년 8월 24일
Actually MATLAB is smarther than you though, when you do
C = AA * BB.';
Matlab does not call transpose, it call Blas to do matrix multiplication without explicit transposition (no memory copying or moving).
So if you wonder if your code is not efficient because of .', the answer is NO.
ConvexHull
2021년 8월 24일
편집: ConvexHull
2021년 8월 24일
I don't know what you mean.
- The ().' is far the most expensive operation no matter what is being done in the background.
- Reshape is for free.
- The small 7er matrix-matrix multiplication is cheaper than the 49er big one.
- By the way ()' or ().' are nearly same expensive.
n=7;m=7;p=7;q=7;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,500000,5);
n = 5;
tic
for i=1:n
vv = reshape(v,7,[]); %#ok<*NASGU>
end
toc % Elapsed time is 0.000186 seconds
tic
for i=1:n
vvv = A*vv;
end
toc % Elapsed time is 0.350487 seconds
tic
for i=1:n
vvvv = (vvv).';
end
toc % Elapsed time is 1.682334 seconds
tic
for i=1:n
vvvvv = reshape(vvvv,7,[]);
end
toc % Elapsed time is 0.000181 seconds
tic
for i=1:n
vvvvvvv = B*vvvvv;
end
toc % Elapsed time is 0.347840 seconds
tic
for i=1:n
vvvvvvvv = reshape(vvvvvvv,7*2500000,[]);
end
toc % Elapsed time is 0.000174 seconds
tic
for i=1:n
vvvvvvvvv = (vvvvvvvv).';
end
toc % Elapsed time is 1.470868 seconds
tic
for i=1:n
vvvvvvvvvv = reshape(vvvvvvvvv,7,[]);
end
toc % Elapsed time is 0.000148 seconds
Bruno Luong
2021년 8월 24일
편집: Bruno Luong
2021년 8월 24일
No
CC = BB.'
alone take time. However with
CC = AA*BB.';
there is no transposition happens in the background.
As showed here (timings obtained by running the code on TMW server, R2021a)
AA = rand(49);
BB = rand(49,500000*5);
BBt = BB.';
tic
CC = AA*BB;
toc
Elapsed time is 0.631366 seconds.
tic
CC = AA*BBt.'; % MATLAB does not perform transposition then multiplication
% it does both by a Blas implementation
toc
Elapsed time is 0.631350 seconds.
tic
BBtt = BBt.';
CC = AA*BBtt;
toc
Elapsed time is 1.105448 seconds.
ConvexHull
2021년 8월 24일
편집: ConvexHull
2021년 8월 24일
I understand your remarks. However, my operations are in following order:
reshape(B*reshape(A.')).'
I think the reshape between them is the problem. This makes it expensive.
Bruno Luong
2021년 8월 24일
Oh I see, I missread your code and the transposition is before the reshape.
In the case, yes the tranposition must be carried out explicitly by Matlab. Sorry but I don't know how one can accelerate the transposition.
ConvexHull
2021년 8월 25일
편집: ConvexHull
2021년 8월 25일
Suprisingly, even two small 7er matrix-matrix multiplications are slower than one 49er matrix-matrix multiplication.
n=7;m=7;p=7;q=7;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,500000,5);
n = 5;
C = kron(B,A);
tic
for i=1:n
v1 = reshape(C*reshape(v,49,[]),size(v));
end
toc % Elapsed time is 0.456353 seconds
tic
for i=1:n
v2 = reshape(A*reshape(v,7,[]),size(v));
v3 = reshape(B*reshape(v,7,[]),size(v));
end
toc % Elapsed time is 0.683820 seconds
ConvexHull
2021년 8월 25일
편집: ConvexHull
2021년 8월 25일
According to https://github.com/MichielStock/Kronecker.jl the vec-trick becomes useful for larger sizes than n=m=p=q=100.
Bruno Luong
2021년 8월 25일
According to my test, it is "cheap" method is faster from size 27.
stab = 5:5:100;
t1 = zeros(size(stab));
t2 = zeros(size(stab));
t3 = zeros(size(stab));
for i = 1:length(stab)
fprintf('%d/%d\n', i, length(stab));
s = stab(i);
n=s;
m=s;
p=s;
q=s;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,100000);
tic
C = kron(B,A);
v1 = reshape(C*reshape(v,s*s,[]),size(v));
t1(i) = toc;
tic
X = reshape(v, size(A,2), size(B,1), []);
v3 = pagemtimes(pagemtimes(A, X), 'none', B, 'transpose');
t3(i) = toc;
end
close all
semilogy(stab, [t1; t3]');
legend('Expensive method', 'Cheap method using pagemtimes');
xlabel('s');
ylabel('time [sec]');
grid on;

ConvexHull
2021년 8월 25일
편집: ConvexHull
2021년 8월 25일
Unfortunately, my problem is restricted to sizes of max n=m=p=q=16. Rather smaller.
ConvexHull
2021년 8월 25일
편집: ConvexHull
2021년 8월 25일
With the intrinsic method it is nearly the same.
stab = 5:5:100;
t1 = zeros(size(stab));
t2 = zeros(size(stab));
t3 = zeros(size(stab));
for i = 1:length(stab)
fprintf('%d/%d\n', i, length(stab));
s = stab(i);
n=s;
m=s;
p=s;
q=s;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,1000);
tic
C = kron(B,A);
v1 = reshape(C*reshape(v,s*s,[]),size(v));
t1(i) = toc;
tic
v2 = reshape(reshape(B*reshape((A*reshape(v,s,[])).',s,[]),s*1000,[]).',s,[]);
t3(i) = toc;
end
close all
semilogy(stab, [t1; t3]');
legend('Expensive method', 'Cheap method using intrinsic operations');
xlabel('s');
ylabel('time [sec]');
grid on;

Bruno Luong
2021년 8월 25일
편집: Bruno Luong
2021년 8월 25일
Your curves do not clearly cross and not coherent with your finding for size = 7.
ConvexHull
2021년 8월 25일
편집: ConvexHull
2021년 8월 25일
Could you make a test between pagetimes and the intrinsic one? Currently, I have no pagetimes available.
Would be great!
ConvexHull
2021년 8월 25일
편집: ConvexHull
2021년 8월 25일
Yes perhaps the vector size (=1000) was too small. Note that today i use different computer.
Bruno Luong
2021년 8월 25일
편집: Bruno Luong
2021년 8월 25일
There is an mtimesx in File exchange that you can use for older matlab that does similar task as pagemtimes.
AFAIK pagemtimes is slighly faster.
Bruno Luong
2021년 8월 25일
Results with 3 methods

stab = 5:5:100;
t1 = zeros(size(stab));
t2 = zeros(size(stab));
t3 = zeros(size(stab));
for i = 1:length(stab)
fprintf('%d/%d\n', i, length(stab));
s = stab(i);
n=s;
m=s;
p=s;
q=s;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,100000);
tic
C = kron(B,A);
v1 = reshape(C*reshape(v,s*s,[]),size(v));
t1(i) = toc;
tic
v2 = reshape(reshape(B*reshape((A*reshape(v,s,[])).',s,[]),s*1000,[]).',s,[]);
t2(i) = toc;
tic
X = reshape(v, size(A,2), size(B,1), []);
v3 = pagemtimes(pagemtimes(A, X), 'none', B, 'transpose');
t3(i) = toc;
end
close all
semilogy(stab, [t1; t2; t3]');
legend('Expensive method', 'Cheap method using transposition', 'Cheap method using pagemtimes');
xlabel('s');
ylabel('time [sec]');
grid on;
ConvexHull
2021년 8월 25일
Thanks for the effort. There is a small typo in line "v2 = ... s*1000...". But i don't think it would influence the results much.
Bruno Luong
2021년 8월 26일
Add benchmark with mtimesx

Conclusion
- For version before R2020b, use expensive method for s < 44, use mtimesx otherwise;
- For version R2020b or later, use expensive method for s < 27, use pagemtimes otherwise.
stab = 5:5:100;
t1 = zeros(size(stab));
t2 = zeros(size(stab));
t3 = zeros(size(stab));
t4 = zeros(size(stab));
for i = 1:length(stab)
fprintf('%d/%d\n', i, length(stab));
s = stab(i);
n=s;
m=s;
p=s;
q=s;
A = rand(n,m);
B = rand(p,q);
v = rand(m*p,100000);
tic
C = kron(B,A);
v1 = reshape(C*reshape(v,s*s,[]),size(v));
t1(i) = toc;
tic
v2 = reshape(reshape(B*reshape((A*reshape(v,s,[])).',s,[]),[],s).',s,[]);
t2(i) = toc;
tic
X = reshape(v, size(A,2), size(B,1), []);
v3 = pagemtimes(pagemtimes(A, X), 'none', B, 'transpose');
t3(i) = toc;
tic
X = reshape(v, size(A,2), size(B,1), []);
v4 = mtimesx(mtimesx(A, X), 'N', B, 'T');
t4(i) = toc;
end
close all
semilogy(stab, [t1; t2; t3; t4]');
legend('Expensive method', ...
'Cheap method using transposition', ...
'Cheap method using pagemtimes', ...
'Cheap method using mtimesx');
xlabel('s');
ylabel('time [sec]');
grid on;
Stefano Cipolla
2023년 9월 14일
편집: Stefano Cipolla
2023년 9월 14일
Hi there! May I ask if you are aware of implementation of functions similar to "pagemtimes" but able to work with at least one sparse input? Alternatively do you see any easy workaround? More precisely I need someting like
pagemtimes(A, V)
where A is a nxnxn sparse real tensor and V is a real dense nxn matrix...
Bruno Luong
2023년 9월 14일
편집: Bruno Luong
2023년 9월 14일
@Stefano Cipolla "sparse real tensor"
I'm not aware this native MATLAB class.
But you can put the A as diagonal block of n^2 x n^2 sparse matrix
SA = [A(:,:,1) 0 0 ... 0
0 A(:,:,2) 0 ... 0
...
9=0 0 ... A(:,:,n)]
Do the same expansion for V (with the same block) then solve it
추가 답변 (0개)
참고 항목
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!오류 발생
페이지가 변경되었기 때문에 동작을 완료할 수 없습니다. 업데이트된 상태를 보려면 페이지를 다시 불러오십시오.
웹사이트 선택
번역된 콘텐츠를 보고 지역별 이벤트와 혜택을 살펴보려면 웹사이트를 선택하십시오. 현재 계신 지역에 따라 다음 웹사이트를 권장합니다:
또한 다음 목록에서 웹사이트를 선택하실 수도 있습니다.
사이트 성능 최적화 방법
최고의 사이트 성능을 위해 중국 사이트(중국어 또는 영어)를 선택하십시오. 현재 계신 지역에서는 다른 국가의 MathWorks 사이트 방문이 최적화되지 않았습니다.
미주
- América Latina (Español)
- Canada (English)
- United States (English)
유럽
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom(English)
아시아 태평양
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)
