Vectorization for multiple inner for loops

Question

0 개 추천

Hello everyone,

This code snippet takes a lot of time with executing the multiple for loops. The loops (a-f) are dependent on the inner loops (i-k) and Matrixx ist also dependent on the outer loop (t). Any suggestions on how to vectorize multiple inner for loops for faster execution? I've tried meshgrid for (a,c,e) and (b,d,f) but that has not been compatible with the size of Bvector.

Thank you!

Nmax=100
tmax=1000
Matrixx=zeros(Nmax+3,Nmax+3,Nmax+3,tmax);
Matrixx(3,2,2,1)=10;
parameter=0.1;
for t=2:tmax
    Bvector=zeros(Nmax+3,Nmax+3,Nmax+3);
    for i=2:Nmax+2            
    for j=2:Nmax+2
    for k=2:Nmax+2
        for a=2:i
        for b=2:i
            for c=2:j
            for d=2:j
                for e=2:k
                for f=2:k 
                    if  (i-2)+(j-2)+(k-2)<=Nmax
                        Bvector(i,j,k)=Bvector(i,j,k)+parameter*Matrixx(a,c,e,t-1)*Matrixx(b,d,f,t-1)
                        % More similar calculations
                    end
                end
                end
            end
            end
        end
        end
    end
    end
    end
    %More Code, Matrixx gets updated with earlier calculated Bvector. Because of the further calculations, this step has to be separate.
    Matrixx(:,:,:,t)=Matrixx(:,:,:,t-1)+(tmax/100).*(Bvector);
end

댓글 수: 0
이전 댓글 -2개 표시 이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Follow Question

Answer 1

Aditya Patil 2020년 9월 23일

편집: Aditya Patil 2020년 9월 23일

0 개 추천

Even if it may appear from the equation that Bvector updates are independent of each other, in reality, they are repeating the same calculations again and again.

The values in the matrix Matrixx are never updated inside the t loop. You can reuse the multiplications done on Matrixx for all B(i,j,k) calculations, using dynamic programming.

Basically, B(i,j,k) is made up of same Matrixx values as that of B(i,j,k - 1), with few addititons. This can be written as,

I might be slightly off on the equation, but the idea remains the same. Try to reuse as much computed values as possible.

The t loop can't be parallelized, as it depends on previous iteration of the loop. There might be other optimization opportunities, such as reducing cache misses, removing any branching that may exists, etc.

댓글 수: 6
이전 댓글 4개 표시 이전 댓글 4개 숨기기

Aditya Patil 2020년 10월 1일

MATLAB Online에서 열기

Have a look at the following code. It's not complete, as you need to accomodate the situation when j == 2. But this should give you a starter.

In case I am missing something crucial that makes this strategy invalid, let me know.

Nmax=100;
tmax=1000;
Matrixx=zeros(Nmax+3,Nmax+3,Nmax+3,tmax);
Matrixx(3,2,2,1)=10;
parameter=0.1;
for t=2:tmax
    Bvector=zeros(Nmax+3,Nmax+3,Nmax+3);
    for i=2:Nmax+2
        for j=2:Nmax+2
            for k=2:Nmax+2
                
                % For k  == 2, we compute from scratch
                if k == 2
                    Bvector(i,j,k) = Bvector(i,j-1,k); % But what if j = 2? Calculate separately
                    for a=2:i
                        for b=2:i
                            for d=2:j
                                if  (i-2)+(j-2)+(k-2)<=Nmax
                                    Bvector(i,j,k)=Bvector(i,j,k) + parameter*Matrixx(a,j,2,t-1)*Matrixx(b,d,2,t-1);
                                    % More similar calculations
                                end
                            end
                        end
                    end
                    continue;
                end
                
                % For k ~= 2, we reuse computed values
                if  (i-2)+(j-2)+(k-2)<=Nmax
                    Bvector(i,j,k)=Bvector(i,j,k - 1); % when k == 2? Calculate separately.
                    for f=2:k
                        Bvector(i,j,k)=Bvector(i,j,k) + parameter * Matrixx(i,j,k,t-1)  * Matrixx(i,j,f,t-1);
                        % More similar calculations
                    end
                end
            end
        end
        %More Code, Matrixx gets updated with earlier calculated Bvector. Because of the further calculations, this step has to be separate.
        Matrixx(:,:,:,t)=Matrixx(:,:,:,t-1)+(tmax/100).*(Bvector);
    end
end

Sascha 2020년 10월 1일

편집: Sascha 2020년 10월 1일

How do I notice and address cache misses?

Thank you again for your response. I will test it and get back to you / the forum.

Aditya Patil 2020년 10월 1일

Normally when you access memory, it should be in linear fashion. If you access memory locations which are not nearby, then you might be facing high cache miss rates, as for every read, CPU cannot find the next value in cache, and has to fetch it from memory. Next time, again it's not in cache, and so on.

In your case, it seems like you access Matrixx(x,x,f,x) followed by Matrixx(x,x,f + 1,x). These two will have quite some distance between there memory locations, which might be greater than what your cache can hold.

Try reorganizing Matrixx so that the values you access one after another also come one after another in memory.

댓글을 달려면 로그인하십시오.

Vectorization for multiple inner for loops

댓글 수: 0
이전 댓글 -2개 표시 이전 댓글 -2개 숨기기

답변 (1개)

댓글 수: 6
이전 댓글 4개 표시 이전 댓글 4개 숨기기

카테고리

제품

릴리스

태그

Community Treasure Hunt

Vectorization for multiple inner for loops

댓글 수: 0 이전 댓글 -2개 표시 이전 댓글 -2개 숨기기

답변 (1개)

댓글 수: 6 이전 댓글 4개 표시 이전 댓글 4개 숨기기

카테고리

제품

릴리스

태그

참고 항목

Community Treasure Hunt

댓글 수: 0
이전 댓글 -2개 표시 이전 댓글 -2개 숨기기

댓글 수: 6
이전 댓글 4개 표시 이전 댓글 4개 숨기기