Newton's method for minimisation returns a critical point
I am trying to implement Newton's method to find the minima of the Himmelblau function.
The code works most of the time, but in cases like this, where my initial guess is (0.5, 1), it returns a critical point of the function instead. I understand this is because the gradient becomes 0 there and no new points are generated.
Now my question would be, is this normal with this method? Is there a way of getting around this problem?
Thanks for any help
close all; clear; clc
% Initialisation of variables to use
x0 = [0.5;1];
tol = 1e-4;
maxits = 50;
% Himmelblau function
him = @(x,y) (x.^2 + y - 11).^2 + (x + y.^2 - 7).^2;
% Gradient of the Himmelblau
grad_him = @(x,y) [[4*x.^3 + 4*x.*y - 42*x + 2*y.^2 - 14];[4*y.^3 + 4*x.*y - 26*y + 2*x.^2 - 22]];
% Hessian matrix of the Himmelblau
hessian_him = @(x,y) [[ 12*x.^2 + 4*y - 42 , 4*x + 4*y ];[ 4*x + 4*y , 12*y.^2 + 4*x - 26 ]];
% Call to newton's function and displaying our results accordingly
[r, iters, flag] = newton_min(grad_him,hessian_him,x0,tol,maxits);
fprintf ("<strong>Newton's method</strong>\n\n");
switch (flag)
    case 0
        fprintf ("There was a convergence on f\n\n");
        fprintf("The minima found is: \n");
        disp(r);
        fprintf("It took %d iterations.\n\n",iters);
    case 1
        fprintf ("There was a convergence on x\n\n");
        fprintf("The minima found is: \n");
        disp(r);
        fprintf("It took %d iterations.\n\n",iters);
    otherwise
        fprintf ("There was no convergence\n\n");
end
function [r, iters, flag] = newton_min(dg,ddg,x0,tol,maxits)
    x = x0(1); y = x0(2);
    r = NaN;
    flag = -1;
    for iters = 1 : maxits
        x_old = [x;y];
        x_new = x_old - (ddg(x,y)\dg(x,y));
        if norm(dg(x,y)) < tol
            flag = 0;
            r = x_new;
            return;
        end
        if norm(x_new - x_old) <= (tol + eps*norm(x_new))
            flag = 1;
            r = x_new;
            return;
        end
        x = x_new(1);
        y = x_new(2);
    end
end
Accepted Answer
Matt J
10 November 2020
Yes, it's normal.
Comments: 30
Is there a way of getting around this problem?
You can make the following change to your update step, but some would argue that it is not strictly Newton's method anymore.
H=ddg(x,y);
if any(eig(H)<=0), H=eye(2); end
x_new = x_old - (H\dg(x,y));
Also, you really need to add a line search to ensure global convergence, though I suspect that won't be necessary for this specific function.
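For reference, here is that modified scheme as a standalone sketch, in Python/NumPy rather than MATLAB so it can run outside the thread. The function name `newton_min_mod` and the default damping value are my own choices (lam=0.4 is the value used in the MATLAB run reported later in the thread); the function, gradient, and Hessian are transcribed from the question's code.

```python
import numpy as np

# Himmelblau function, gradient, and Hessian (transcribed from the MATLAB above)
him = lambda x, y: (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def grad_him(x, y):
    return np.array([4*x**3 + 4*x*y - 42*x + 2*y**2 - 14,
                     4*y**3 + 4*x*y - 26*y + 2*x**2 - 22])

def hess_him(x, y):
    return np.array([[12*x**2 + 4*y - 42.0, 4*x + 4*y],
                     [4*x + 4*y, 12*y**2 + 4*x - 26.0]])

def newton_min_mod(p0, lam=0.4, tol=1e-8, maxits=5000):
    """Damped Newton with a steepest-descent fallback: whenever the
    Hessian is not positive definite, use the identity for that step."""
    p = np.asarray(p0, dtype=float)
    for _ in range(maxits):
        g = grad_him(*p)
        if np.linalg.norm(g) < tol:
            break
        H = hess_him(*p)
        if np.any(np.linalg.eigvalsh(H) <= 0):   # not positive definite
            H = np.eye(2)
        p = p - lam * np.linalg.solve(H, g)
    return p
```

Starting from (0.5, 1), this sketch reaches the minimum near (3, 2) instead of stalling at the nearby critical point.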
Thank you, that's perfect. Would that line search be fminbnd with the function on one variable between two arbitrary points?
The line search idea would be to damp the Newton step down by some value lam less than 1:
x_new = x_old - lam*(H\dg(x,y))
If that's what Matt J has in mind, it doesn't seem too meaningful to me since [the modified] H was randomly chosen to not follow the gradient at all; in fact it could have been anything that's positive definite, right? So I don't think it's necessarily guaranteed that doing a line search in a random direction will get you a residual value lower than the current iterate.
I tried this with multiple values for lam, and it would never converge.
Sorry, I don't understand what the heck you guys propose.
H=ddg(x,y);
if any(eig(H)<=0), H=eye(2); end
x_new = x_old - (H\dg(x,y));
It gets stuck at (0.5,1) because dg at this point is zero; when you change H to eye, x_new is still == x_old, so it's still stuck.
If you are going to use a constant lam, it must be smaller than one over the largest possible Hessian eigenvalue. An adaptive way to choose lam would be to minimize,
L = @(lam) him(x_old - lam*(H\dg(x,y)))
@Bruno,
The initial point (0.5,1) is not a point of zero gradient. The problem, though, is that unmodified Newton's method will take the algorithm uphill to a nearby inflection point, since the Hessian at (0.5,1) is not positive definite. When I add the modification (with lam=1), I successfully reach one of the local minima.
Bruno, it actually converges after that. But thinking about it, it's true: the gradient is supposed to be zero, and it shouldn't converge, but it does.
@J Alex Lee
H was randomly chosen to not follow the gradient at all; in fact it could have been anything that's positive definite, right?
It's not exactly random. When H is set to eye(2), it is equivalent to doing a step of steepest descent for that one iteration. But H could indeed be anything positive definite, as long as its eigenvalues are uniformly bounded within some strictly positive finite interval.
Matt, I tried with everything below 1. The only time it worked was with something around 1e-5.
Well, I have to create Newton's method with specifically those parameters, so him would never be sent to it. Is there another way?
But just to have more knowledge: after setting the function L, I would have to do an fminbnd of it between 0 and 1?
Matt, I tried with everything below 1. The only time it worked was with something around 1e-5.
It worked fine for me with lam=1, starting from (0.5,1). What failure cases did you observe?
Well, I have to create Newton's method with specifically those parameters,
Specifically what parameters? Like I said, the modifications we've already made make it such that it's not purely Newton's method anymore.
I would have to do an fminbnd of it between 0 and 1?
In general, it can be any constant interval [0,s] where 0<s<=inf. Obviously if you choose s very small, fminbnd will complete faster, but "Newton's method" convergence can be very slow. Also, note that for this specific objective function, L(lam) is a 1D polynomial, so in theory you can find its exact minimum analytically.
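As a sketch of that last remark (Python/NumPy for illustration; the helper name `exact_line_search` is mine): along a fixed direction d, L(lam) = him(p + lam*d) is a quartic polynomial in lam, so its exact minimiser can be read off the real roots of the cubic L'(lam) rather than found with fminbnd.

```python
import numpy as np

# Himmelblau function and gradient (transcribed from the MATLAB above)
him = lambda x, y: (x**2 + y - 11)**2 + (x + y**2 - 7)**2

def grad_him(x, y):
    return np.array([4*x**3 + 4*x*y - 42*x + 2*y**2 - 14,
                     4*y**3 + 4*x*y - 26*y + 2*x**2 - 22])

def exact_line_search(p, d):
    """Minimise L(lam) = him(p + lam*d) exactly. L is a quartic in lam,
    so recover its 5 coefficients from 5 samples, then pick the best
    real root of the cubic derivative (lam = 0 as a safe fallback)."""
    lams = np.linspace(0.0, 2.0, 5)
    vals = [him(*(p + l * d)) for l in lams]
    coeffs = np.polyfit(lams, vals, 4)      # exact: L has degree 4
    crit = np.roots(np.polyder(coeffs))
    crit = crit[np.abs(crit.imag) < 1e-8].real
    cands = np.append(crit, 0.0)
    return cands[np.argmin([np.polyval(coeffs, c) for c in cands])]
```

For example, from p = (0.5, 1) along the steepest-descent direction d = -grad_him(0.5, 1), the returned lam strictly decreases him.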
It converges to the critical point or to the maxima. If I choose lam to be anything from 0.1 to 0.9 in steps of 0.1, all the points that converge from the 1000 random ones are maxima. Maybe this is how it's supposed to be?
My function should only receive the gradient, Hessian, initial guess, tolerance, and max iterations.
Shouldn't lam be the result of a line search of the function in only one variable?
Something like:
him_x = @(x) him(x,y);
t = fminbnd(him_x,0,s);
But I'm still wondering: how does changing the Hessian to an identity help if it's the gradient that equals zero?
Shouldn't lam be the result of a line search of the function in only one variable?
No, it should be a search in the direction you plan to step, H\dg(x,y). If you plan to step in the directions of x and y alternately (this is called coordinate descent), you would minimize alternately along x and y, but then you wouldn't be exploiting the Hessian at all.
If I choose lam to be anything from 0.1 to 0.9 in steps of 0.1, all the points that converge from the 1000 random ones are maxima. Maybe this is how it's supposed to be?
You may have to increase maxits, depending on your choice of lam. For me, lam=0.4 gives convergence in 61 iterations.
But I'm still wondering: how does changing the Hessian to an identity help if it's the gradient that equals zero?
Making the Hessian positive definite ensures that the direction of the step will be downhill. Otherwise, this is not guaranteed.
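This can be checked directly: if H is positive definite, the step d = -H\g satisfies dot(g,d) = -g'*inv(H)*g < 0 whenever g is nonzero, so the step makes an acute angle with the negative gradient. A tiny numerical check (Python/NumPy, illustrative only; the random positive definite H is my own construction):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 2))
H = A @ A.T + 0.1 * np.eye(2)      # a random symmetric positive definite matrix
g = np.array([-30.5, -41.5])       # the Himmelblau gradient at (0.5, 1)
d = -np.linalg.solve(H, g)         # the (modified) Newton step
assert g @ d < 0                   # the step points downhill
```

The same inequality holds for any positive definite H, which is why the eye(2) fallback (or any other positive definite substitute) guarantees a downhill direction.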
OK, but the problem I've been having when trying to do the line search is that with the function you sent,
L = @(lam) him(x_old - lam*(H\dg(x,y)))
him should be receiving x and y. And the inside is x_new, so it would be the same to do
L = @(lam) him(x_new(1), x_new(2))
Would this make sense?
You get the convergence with the Hessian being set to an identity? Or by doing a line search? Just adding a step of 0.4 doesn't change anything for me.
The following converged:
% Initialisation of variables to use
x0 = [0.5;1];
tol = 1e-10;
maxits = 50000;
% Himmelblau function
him = @(x,y) (x.^2 + y - 11).^2 + (x + y.^2 - 7).^2;
% Gradient of the Himmelblau
grad_him = @(x,y) [[4*x.^3 + 4*x.*y - 42*x + 2*y.^2 - 14];[4*y.^3 + 4*x.*y - 26*y + 2*x.^2 - 22]];
% Hessian matrix of the Himmelblau
hessian_him = @(x,y) [[ 12*x.^2 + 4*y - 42 , 4*x + 4*y ];[ 4*x + 4*y , 12*y.^2 + 4*x - 26 ]];
% Call to newton's function and displaying our results accordingly
[r, iters, flag] = newton_min(grad_him,hessian_him,x0,tol,maxits);
fprintf ("<strong>Newton's method</strong>\n\n");
Newton's method
switch (flag)
    case 0
        fprintf ("There was a convergence on f\n\n");
        fprintf("The minima found is: \n");
        disp(r);
        fprintf("It took %d iterations.\n\n",iters);
    case 1
        fprintf ("There was a convergence on x\n\n");
        fprintf("The minima found is: \n");
        disp(r);
        fprintf("It took %d iterations.\n\n",iters);
    otherwise
        fprintf ("There was no convergence\n\n");
end
There was a convergence on x
The minima found is:
3.0000
2.0000
It took 61 iterations.
function [r, iters, flag] = newton_min(dg,ddg,x0,tol,maxits)
    x = x0(1); y = x0(2);
    r = NaN;
    flag = -1;
    for iters = 1 : maxits
        x_old = [x;y];
        H = ddg(x,y);
        if any(eig(H)<=0), H = eye(2); end
        x_new = x_old - 0.4*(H\dg(x,y));
        if norm(dg(x,y)) < tol
            flag = 0;
            r = x_new;
            return;
        end
        if norm(x_new - x_old) <= (tol + eps*norm(x_new))
            flag = 1;
            r = x_new;
            return;
        end
        x = x_new(1);
        y = x_new(2);
    end
end
Oh OK, yes, I get convergence with that. Even just taking the step works fine. Thank you.
If I were to implement the line search, would that eliminate the need for the identity matrix?
No, you always need some way to ensure that H is positive definite.
OK, thank you, one last question: how is it that when solving H against the gradient (which is zero) the result isn't zero and we keep going down to the minimum?
How is it that when solving H against the gradient (which is zero) the result isn't zero
I have never seen that happen. The result will certainly be zero and the algorithm should not proceed any further.
The point where it would get stuck is a critical point of the Himmelblau function, and critical points occur where the gradient is zero. That's why I'm a bit confused.
Yes, if you were to initialize the iterations at any zero-gradient point, including an inflection point, the algorithm would get stuck there. I'm not claiming that wouldn't happen.
That hasn't happened in what we've looked at so far, however, because we have only tested the initial point (0.5,1) which is not a point of zero-gradient:
>> grad_him(0.5,1)
ans =
-30.5000
-41.5000
It's me that is confused; I thought the gradient was 0 at the starting point.
Bruno Luong
11 November 2020
Edited: Bruno Luong
11 November 2020
In practice, a gradient-based minimizer first computes the descent direction, which can be the Newton-like direction
dk = -H(xk)*g(xk)
where H is estimated from the BFGS formula or by other means such as a conjugate gradient direction.
Then it does the line search
x(k+1) = x(k) + rho*dk
When computing dk, the minimizer always ensures
dot(dk,g) <= 0
so that the line search always goes downhill for an infinitesimal step. This is automatically satisfied when Hk is positive definite, as with the BFGS approximation. Some other methods that can deal with a non-positive Hessian obtain the direction by, for example, minimizing the quadratic form on a sphere around the current point xk (trust region); this also ensures the condition is met. The line search never goes uphill, as the pure Newton direction does in your case.
An approximate line search per iteration is enough; something as costly as fminbnd is too much. Usually there are a couple of evaluations of the cost function along the descent direction, and then the minimizer performs a polynomial fit to find the minimum. The step must also satisfy other criteria such as the Wolfe conditions; if not, the step is reduced or increased geometrically until the criteria are met.
Even though there are only a few evaluations per step, the step eventually becomes small enough over the iterations (one iteration == one change of dk), since the gradient converges to 0 (first-order optimality). Thus the most important thing for an optimizer is to find the right descent direction; there is no need to do an exact line search.
Anyway, all this is well known and one can read through many books on optimization. A classical book is Nocedal, which gives an overview. In practice solvers differ in technical details, and one has to find the corresponding paper to find out what's going on.
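The kind of inexact line search Bruno describes can be sketched as follows (Python for illustration; the sufficient-decrease test shown is the Armijo condition, the first of the Wolfe conditions, and the constants c and shrink are conventional defaults, not values from the thread):

```python
import numpy as np

def backtracking(f, g, p, d, alpha=1.0, c=1e-4, shrink=0.5, maxtries=40):
    """Shrink the trial step geometrically until the Armijo condition
    f(p + alpha*d) <= f(p) + c*alpha*dot(g(p), d) holds.
    d must be a descent direction, i.e. dot(g(p), d) < 0."""
    f0 = f(p)
    slope = g(p) @ d
    for _ in range(maxtries):
        if f(p + alpha * d) <= f0 + c * alpha * slope:
            return alpha
        alpha *= shrink
    return alpha

# Himmelblau in vector form, with its gradient (transcribed from the thread)
him = lambda p: (p[0]**2 + p[1] - 11)**2 + (p[0] + p[1]**2 - 7)**2
grad = lambda p: np.array([4*p[0]**3 + 4*p[0]*p[1] - 42*p[0] + 2*p[1]**2 - 14,
                           4*p[1]**3 + 4*p[0]*p[1] - 26*p[1] + 2*p[0]**2 - 22])

p = np.array([0.5, 1.0])
d = -grad(p)                        # steepest-descent direction at (0.5, 1)
alpha = backtracking(him, grad, p, d)
```

Only a handful of function evaluations are needed per iteration, which is the point Bruno makes about fminbnd being overkill.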
But that is not the point where the gradient would be zero; it is the critical point (-0.1280, -1.9537).
>> grad_him(-0.1280,-1.9537)
ans =
0.0018
0.0006
Where the method gets stuck
Bruno Luong
11 November 2020
Edited: Bruno Luong
11 November 2020
When you think about it, your original implementation is Newton's method for solving the first-order Euler equation
g(x)=0
It doesn't care whether the point x is a (local) minimum, a maximum, or even a saddle point. As written, the code is totally symmetric between maximizing and minimizing.
So it converges to the local maximum (-0.1280, -1.9537) if you start at (0.5,1). It actually does well; it's just not what you would expect, namely returning the local minimum, but the code does not contain any statement specific to finding a minimum.
Suppose now that you reverse the sign of your objective function him (and therefore also H and g accordingly): it will follow exactly the same path and converge to the very same solution, but now it's a local minimum!
But that is not the point where the gradient would be zero; it is the critical point (-0.1280, -1.9537).
Yes, but as long as the algorithm goes downhill from (0.5,1) at every iteration, it can never approach the inflection point (-0.1280, -1.9537). The inflection point lies uphill from your initial point:
>> him(0.5,1)
ans =
125.3125
>> him(-0.1280, -1.9537)
ans =
178.3372
Great guys, I got it! Thank you so much
More Answers (2)
J. Alex Lee
10 November 2020
Yes, this looks normal: you are only asking to zero the gradient of the function, so naturally that includes non-optimal points where the gradient is the zero vector.
You can use a non-gradient minimizer, like fminsearch, to seek local minima.
Comments: 1
Thank you, but the idea is not to use fminsearch, as I am trying to compare Newton's method against fminsearch.
Bruno Luong
10 November 2020
Edited: Bruno Luong
10 November 2020
"Now my question would be, is this normal with this method?"
Your code just shows it: yes, it is normal.
Now, in practice it is very rare that one lands on a stationary point that is not a local minimum. As soon as you work with a non-academic objective function, you won't ever get the gradient == 0 exactly.
"Is there a way of getting around this problem?"
In all the books I have read about optimization, no one cares about this specific problem, since, as I said, it only happens in academic examples. However, many methods compute at each iteration an approximation of the Hessian, and the positive definiteness of the Hessian is either enforced or monitored. A Hessian that has negative eigenvalues, like yours at (0.5,1), automatically gets special treatment to escape from a non-minimum.
Comments: 1
Thank you, good to know