Parallel calculating for fast execution

Question

new machine performance.png

My code has two parts: (1) a function that returns a required value, Tmax, which depends on six input variables; and (2) a script that calculates many other quantities and calls the Tmax function. The function has to be called a very large number of times, which takes a long time. I am looking for a way to reduce the computation time.

In the second script, I replaced every for loop with a parfor loop wherever possible. The run time improved, but not by much. I have a powerful machine (its configuration is attached). I hoped to divide the execution time by 32, since I have 32 cores, but that is not happening and I wonder why. The points at which I calculate Tmax are independent, so I think it should be possible to give each core n/32 points, where n is the number of points (the sample size). Could you explain this? Can a function be called n times in parallel, with n/32 calls executed on each core?

I put the 2 scripts below:

1- the function :

function T_max = EPO_OILS_SEMIBATCH(Par)
% Par = [F tadd UA Taj Tj0 CHP_initial]; returns the maximum of y(:,10).
global     UA  Tj0   Taj   F  tadd CHP_initial 
F=Par(1); 
tadd=Par(2);
UA=Par(3);
Taj=Par(4);
Tj0=Par(5);
CHP_initial=Par(6);
tspan=[0:10:10000]; 
y0=[0 CHP_initial 0 ((1-(0.14*CHP_initial)*0.24285)*1000)/18 0.5 1.70 0.00 0.00 0.00 Tj0 0.26]; 
[t, y]=ode23s(@semibatch,tspan,y0); 
T_max = max(y(:,10));

function dydt=semibatch(t,y)
% Right-hand side of the 11-state ODE system; parameters arrive via globals.
global UA Tj0 Taj F tadd   
if t > tadd
    F=0;   % F is set to zero once t exceeds tadd
end
dydt=[(1/(1+((1-((y(11)-0.12)/y(11)))/(((y(11)-0.12)/y(11))*9))))*((F/(y(11)-0.12))*(24-y(1))-((0.15*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*((y(1)/y(4))^0.5)*(y(1)*y(2)-(1/(0.96*exp((671.157)*((1/y(10))-(0.003298)))))*y(3)*y(4)))+((0.0009*exp(-(2429.636)*((1/y(10))-(0.0029411))))*y(3))+(1-((y(11)-0.12)/y(11)))*(((0.00576*exp(-(7409.189)*((1/y(10))-(0.0029411))))*y(3)*y(5))+((0.00437*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(6))+((0.004*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(7))-((0.00339*exp(-(5063.74789)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(1)*((y(1)/y(4))^0.5)))/((y(11)-0.12)/y(11)));...
    -((0.15*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*((y(1)/y(4))^0.5)*(y(1)*y(2)-(1/(0.96*exp((671.157)*((1/y(10))-(0.003298)))))*y(3)*y(4)))-(F*y(2)/(y(11)-0.12)); ((-F*y(3)/(y(11)-0.12))+((0.15*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*((y(1)/y(4))^0.5)*(y(1)*y(2)-(1/(0.96*exp((671.157)*((1/y(10))-(0.003298)))))*y(3)*y(4)))-((0.0009*exp(-(2429.636)*((1/y(10))-(0.0029411))))*y(3))-((0.001*exp(-(2405.581)*((1/y(10))-(0.0029411))))*y(3))-(1-((y(11)-0.12)/y(11)))*(((0.00576*exp(-(7409.189)*((1/y(10))-(0.0029411))))*y(3)*y(5))+((0.00437*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(6))+((0.004*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(7))+((0.0592*exp(-(8419.53331)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(3)*((y(1)/y(4))^0.5)))/((y(11)-0.12)/y(11)));...
    ((0.15*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*((y(1)/y(4))^0.5)*(y(1)*y(2)-(1/(0.96*exp((671.157)*((1/y(10))-(0.003298)))))*y(3)*y(4)))+((0.001*exp(-(2405.581)*((1/y(10))-(0.0029411))))*y(3))-(F*y(4)/(y(11)-0.12))-(1-((y(11)-0.12)/y(11)))*((0.000237*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*((y(1)*y(4))^0.5))/((y(11)-0.12)/y(11)); -((0.00576*exp(-(7409.189)*((1/y(10))-(0.0029411))))*y(3)*y(5)); -((0.00437*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(6)); ((0.00437*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(6))-((0.004*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(7)); (((0.00576*exp(-(7409.189)*((1/y(10))-(0.0029411))))*y(3)*y(5))+((0.00437*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(6))+((0.004*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(7)))-((0.000237*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*((y(1)*y(4))^0.5))-((0.00339*exp(-(5063.74789)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(1)*((y(1)/y(4))^0.5))-((0.0592*exp(-(8419.53331)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(3)*((y(1)/y(4))^0.5));...
    ((0.000237*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*((y(1)*y(4))^0.5))+((0.00339*exp(-(5063.74789)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(1)*((y(1)/y(4))^0.5))+((0.0592*exp(-(8419.53331)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(3)*((y(1)/y(4))^0.5)); (1/(((y(11)-0.12)*1.00+0.12*0.93)*2000))*((-(y(11)-0.12)*(((0.15*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*((y(1)/y(4))^0.5)*(y(1)*y(2)-(1/(0.96*exp((671.157)*((1/y(10))-(0.003298)))))*y(3)*y(4)))*-5580+((0.001*exp(-(2405.581)*((1/y(10))-(0.0029411))))*y(3))*-359000+((0.0009*exp(-(2429.636)*((1/y(10))-(0.0029411))))*y(3))*-163000)-0.12*((((0.00576*exp(-(7409.189)*((1/y(10))-(0.0029411))))*y(3)*y(5))+((0.00437*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(6))+((0.004*exp(-(1804.1857)*((1/y(10))-(0.0029411))))*y(3)*y(7)))*-230000+(((0.000237*exp(-(18041.857)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*((y(1)*y(4))^0.5))+((0.00339*exp(-(5063.74789)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(1)*((y(1)/y(4))^0.5))+((0.0592*exp(-(8419.53331)*((1/y(10))-(0.0029411))))*( 0.0017029)*y(8)*y(3)*((y(1)/y(4))^0.5)))*-90000))+UA*(Tj0-y(10))+24*F*20*(Taj-y(10))); F];

2- the calculation script:

tic
n=100;   % n is big (up to 1000000 and more)
p=6; 
F_max= 0.002; 
F_min= 0.001; 
tadd_max=1200; 
tadd_min= 600; 
UA_max= 100;  
UA_min= 1;    
Taj_max= 308.15;  
Taj_min=293.15; 
Tj0_max= 343.15;   
Tj0_min= 313.15;   
CHP_initial_max=8;  
CHP_initial_min=2.9; 
sob1 = sobolset(p);
An = net(sob1,n);
Par_max=[F_max tadd_max UA_max Taj_max Tj0_max CHP_initial_max];  
Par_min=[F_min tadd_min UA_min Taj_min Tj0_min CHP_initial_min];  
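A=zeros(size(An,1),size(An,2)); 
% Scale the unit-cube Sobol points to the parameter ranges
parfor i=1:size(An,1)
    A(i,:)=An(i,:).*(Par_max-Par_min)+Par_min;
end
A;
T_max_A=[];
% Evaluate the model at every sample point
parfor i=1:n
    T_max_A(i)= EPO_OILS_SEMIBATCH(A(i,:));   
end
f_0 = (1/n)*sum(T_max_A)                  % sample mean of T_max
D_T = ((1/n)*sum(T_max_A.^2))- f_0^2      % sample variance of T_max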

Comments: 2

Jan on 27 Jan 2017

Edited: Jan on 27 Jan 2017

Please use the "{} Code" button for a proper formatting. Currently we cannot run or inspect your code by copy&paste and editing this massive block of code is prone to errors and time consuming. Thanks.

Sergey Kasyanov on 27 Jan 2017

Can you use code formatting?

Like this:

 A=10;
 B=10;

Answer 1

Jan on 27 Jan 2017

Sorry, this will not solve the problem:

For the speed part: Wow, this is a cruel code! I would not dare to simplify it manually. So just some ideas:

sqrt() is cheaper than ^0.5.
There are a lot of terms of the kind exp(-a*(1/y(10)-b)). Because the exp() function is very expensive, you can try to combine these terms to reduce the number of calls.
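A minimal sketch of the two suggestions above, applied to one rate expression that recurs in the posted dydt. The constants are copied from the post; the test state y and the names dInvT, sqrt14, k1, Keq and r1 are hypothetical and chosen only for illustration.

```matlab
% Made-up state vector, used only to compare the two forms numerically.
y = [1.0; 5.0; 0.5; 50.0; 0.5; 1.7; 0; 0.1; 0; 330.0; 0.26];

% Original style: exp() and ^0.5 written out inside the term.
r1_orig = (0.15*exp(-(18041.857)*((1/y(10))-(0.0029411))))*(0.0017029)* ...
          ((y(1)/y(4))^0.5)*(y(1)*y(2) - ...
          (1/(0.96*exp((671.157)*((1/y(10))-(0.003298)))))*y(3)*y(4));

% Factored style: shared quantities are computed once and then reused.
dInvT  = 1/y(10) - 0.0029411;        % argument shared by most exp() calls
sqrt14 = sqrt(y(1)/y(4));            % replaces every (y(1)/y(4))^0.5
k1     = 0.15*exp(-18041.857*dInvT); % rate constant used by several terms
Keq    = 0.96*exp(671.157*(1/y(10) - 0.003298));
r1     = k1*0.0017029*sqrt14*(y(1)*y(2) - y(3)*y(4)/Keq);

% The two forms should agree to rounding error.
relErr = abs(r1 - r1_orig)/abs(r1_orig);
fprintf('relative difference: %.3g\n', relErr);
```

In the posted dydt this expression appears in several rows, so storing r1 once per call and reusing it would remove the repeated exp() evaluations. Whether that gives a noticeable saving would need to be measured with the profiler.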

By the way, you can omit the square brackets in tspan=[0:10:10000], see why-not-use-square-brackets. But here the saved microseconds will not matter.

The line if t > tadd F=0; end adds a discontinuity to the integration. Matlab's integrators handle smooth functions only, see http://www.mathworks.com/matlabcentral/answers/59582#answer_72047 .
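One common way to avoid such a discontinuity is to stop the integration at the switching time and restart it with the new right-hand side. The sketch below shows the idea on a toy one-state model, not on the poster's kinetics; the model rhs, the constant k and the variable names are assumptions made for illustration.

```matlab
% Toy semibatch-like model: feed F is on until tadd, then off.
F = 0.002; tadd = 600; tEnd = 10000; k = 1e-3;
rhs = @(t, y, Ffeed) Ffeed - k*y;   % hypothetical right-hand side

% Phase 1: feed on. The right-hand side is smooth over [0, tadd].
[t1, y1] = ode23s(@(t, y) rhs(t, y, F), [0 tadd], 0);

% Phase 2: feed off, restarted from the final state of phase 1.
[t2, y2] = ode23s(@(t, y) rhs(t, y, 0), [tadd tEnd], y1(end, :));

% Join both phases; drop the duplicated point at t = tadd.
t = [t1; t2(2:end)];
y = [y1; y2(2:end, :)];
y_max = max(y);
fprintf('maximum of y: %.4f\n', y_max);
```

Passing the feed rate as an extra argument through an anonymous function also removes the need for the global F, which the posted semibatch overwrites during the integration.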

I wonder whether you can trust the results: most of the constants have only 3 or 5 significant digits, some have 8. The formula has about 100 terms. Without any analysis, I guess that cancellation error might dominate the solution.

Answer 2

moulay ELMOUKRIE on 28 Jan 2017

Thanks for your answers. Your remarks gave some improvement, but I do not think we can get a significant time reduction by focusing only on the function, because the problem is the number n of function values (images) that I need for my results to converge. I thought I could compute many of them at the same time; that is what I expected from the parfor command, but the machine handles the computation differently. On the other hand, the function is now simpler: I eliminated all the global variables and kept only the six variable inputs, and I put all the constants directly into dydt. I did this step by step and checked each change with a few runs.

I attached the program: the file hassanehassane is not complete. The complete calculation needs about 64 times the time taken by hassanehassane! So when n=1000000, it will take at least 15 DAYS.

If each core could calculate Tmax for (n/32) points at the same time, that would be a big advantage.
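A hedged sketch of how the sampling script could be arranged so that parfor is used only for the expensive part. It assumes the Parallel Computing Toolbox is available for the pool query; model is a hypothetical cheap stand-in for EPO_OILS_SEMIBATCH, and rand stands in for the Sobol points so that the example runs without extra toolboxes.

```matlab
n = 1000;                               % hypothetical sample size for a timing test
Par_min = [0.001  600   1 293.15 313.15 2.9];
Par_max = [0.002 1200 100 308.15 343.15 8  ];

An = rand(n, 6);                        % stand-in for net(sobolset(6), n)
% Vectorised scaling: cheap work like this gains nothing from parfor.
A = bsxfun(@plus, bsxfun(@times, An, Par_max - Par_min), Par_min);

model = @(Par) sum(Par);                % hypothetical stand-in for EPO_OILS_SEMIBATCH

% Report how many workers actually run the loop (needs Parallel Computing Toolbox).
if exist('gcp', 'file')
    pool = gcp;                         % starts a pool if none is open
    fprintf('workers in pool: %d\n', pool.NumWorkers);
end

T_max_A = zeros(1, n);                  % preallocate the sliced output
tic
parfor i = 1:n
    T_max_A(i) = model(A(i, :));        % independent iterations, split across workers
end
toc

f_0 = mean(T_max_A);                    % sample mean
D_T = mean(T_max_A.^2) - f_0^2;         % sample variance
```

parfor already distributes independent iterations over the workers, so no manual split into n/32 blocks is needed. The speedup is bounded by the number of workers in the pool, which may be smaller than the number of cores, and by pool start-up and data transfer costs, which weigh heavily when n is as small as 100. Timing the loop for a larger n with the pool already open gives a fairer picture.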
