fmincon optimization: is the first-order optimality very sensitive to changes in the step tolerance?
10 views (last 30 days)
SA-W
27 Nov 2023
I use fmincon interior-point algorithm to fit parameters to a pde.
Here are my basic settings:
opts = optimoptions('fmincon', ...
'StepTolerance', 1e-12, ...
'FunctionTolerance', 1e-12, ...
'OptimalityTolerance', 1e-12, ...
'MaxIterations', 250,...
'SpecifyObjectiveGradient', true, ...
'CheckGradients', false);
lb = zeros(9,1);
ub = 4 + zeros(9,1);
Aineq = ... ; % entries have magnitude ~1e2
bineq = zeros(9,1);
problem = createOptimProblem( ...
    params.solverName, ...
    'objective', myFun, ...
    'x0', startVec, ...
    'lb', lb, ...
    'ub', ub, ...
    'Aineq', Aineq, ...
    'bineq', bineq, ...
    'options', opts);
%create multistart object
ms = MultiStart('Display', 'iter', ...
'UseParallel', true, ...
'StartPointsToRun', 'all', ...
'FunctionTolerance', 0);
% run
run(ms, problem, myStartPoints)
There are nine parameters, and I have lower and upper bounds as well as linear inequality constraints.
I scaled the matrix Aineq by 1e2 manually so that fmincon pays more attention to feasibility. I am aware that this comes with poor convergence and other drawbacks, but it has worked quite well so far. The reason for choosing such tight tolerances (1e-12) is to work around flat regions of the objective function, if any.
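For reference, a minimal sketch of the scaling I mean (Aineq_raw and bineq_raw are placeholder names for the unscaled constraint data; they do not appear in my code above):
% Multiply both sides of Aineq*x <= bineq by the same factor: the feasible
% set is unchanged, but the constraint residuals the solver sees are inflated.
scale = 1e2;                % chosen by hand, as described above
Aineq = scale * Aineq_raw;  % Aineq_raw: hypothetical unscaled constraint matrix
bineq = scale * bineq_raw;  % no-op here since bineq is zero; shown for completeness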
Using these options, I get the following output from multistart:
The solution of all 10 runs is
x = [0 0.00838947 0.0167789 0.0251684 0.0335579 0.0419473 0.0503368 0.0587263 0.0673571]
All solutions have exitflag=2 (probably because of the brutal scaling) and the same value of the objective function. Also, the first-order optimality is small.
However, run index = 4, for instance, converged to the same solution, but its first-order optimality is rather large compared to the others.
This gets even more visible if I relax all my tolerances (step, function, first-order-optimality tolerance) to the default value of 1e-6:
The solution is nearly the same as before
x = [0 0.00838945 0.0167789 0.0251684 0.0335578 0.0419473 0.0503367 0.0587262 0.067357]
However, the first-order optimality is higher by several orders of magnitude, while the solution is nearly unchanged, as the sum of squares also indicates.
These high optimality measures make the solution less trustworthy.
How is it possible that the optimalities are so different if the sum of squares as well as the solution are practically identical?
7 Comments
Harald
27 Nov 2023
Hi,
I suppose you are solving the PDE numerically? In that case, please consider the discussion of why you may also want to set FiniteDifferenceStepSize. Perhaps this will also help with your original question.
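If you go that route, the option can be set like this (the value is purely illustrative, not a recommendation):
opts = optimoptions(opts, 'FiniteDifferenceStepSize', 1e-4); % illustrative value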
Best wishes,
Harald
SA-W
27 Nov 2023
Hello,
Yes, I solve the PDE numerically, but I provide an analytical gradient to fmincon (see my options). I am not sure what happens with the Hessian, but for the gradient there is no finite differencing.
SA-W
27 Nov 2023
I know. I am just looking for a qualitative assessment.
So given the results I show, can we qualitatively say that the objective is likely to be very flat at the solution? Or something else?
Torsten
27 Nov 2023
Edited: Torsten
27 Nov 2023
Are you sure the results from the solution of the PDE are exact enough to build an optimization upon them? Most often, if parameters are to be determined from the solution of ODEs or PDEs, the sensitivities cannot be computed with sufficient reliability.
Torsten
27 Nov 2023
I suggest you compute the objective function near the point that MATLAB computes as optimal by changing each parameter separately while holding the other parameters constant, and see what kind of curves you get.
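A minimal sketch of such a sweep, assuming xopt is the solution returned by fmincon and myFun evaluates the objective (names taken from the question; the perturbation range is illustrative):
% Vary one parameter at a time around the reported optimum and plot the
% resulting one-dimensional slices of the objective.
delta = linspace(-1e-4, 1e-4, 201);    % illustrative perturbation range
for i = 1:numel(xopt)
    fvals = zeros(size(delta));
    for k = 1:numel(delta)
        xk = xopt;
        xk(i) = xk(i) + delta(k);      % perturb parameter i only
        fvals(k) = myFun(xk);          % first output: objective value
    end
    subplot(3,3,i)
    plot(delta, fvals, '.-')
    xlabel(sprintf('\\Delta x_%d', i)); ylabel('objective')
end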
Accepted Answer
Matt J
27 Nov 2023
Edited: Matt J
27 Nov 2023
So given the results I show, can we qualitatively say that the objective is likely to be very flat at the solution? Or something else?
Well, it basically means that a small change in x (near the stopping point) produces a large change in the gradient. The function would seem to have very high curvatures there, or possibly has a discontinuous first derivative.
25 Comments
SA-W
27 Nov 2023
All linear constraints are active at the solution, and they are continuously differentiable. Also, I think the objective function is continuously differentiable near the solution.
The function would seem to have very high curvatures there
How would you proceed to figure this out? The parameter space is nine-dimensional and visualizing anything is probably hard.
Matt J
27 Nov 2023
Edited: Matt J
27 Nov 2023
Here's a qualitatively similar example. You can see the first order optimalities are again very different without changing the solution very much.
for tol=[1e-12,1e-6]
opts = optimoptions('fmincon', ...
'StepTolerance', tol, ...
'FunctionTolerance', tol, ...
'OptimalityTolerance', tol, ...
'MaxIterations', inf,'MaxFunctionEvaluations', inf,...
'SpecifyObjectiveGradient', false, ...
'CheckGradients', false,'Display','none');
f=@(z)sum( z.^2.*(z<0)+z.*(z>=0));
[x,fval,eflag,stats]=fmincon(@(x) f(x-[1,2,3])+3,rand(1,3)-0.5,[],[],[],[],[],[],[],opts)
end
x = 1×3
1.0000 2.0000 3.0000
fval = 3.0000
eflag = 2
stats = struct with fields:
iterations: 38
funcCount: 244
constrviolation: 0
stepsize: 1.6412e-12
algorithm: 'interior-point'
firstorderopt: 0.7603
cgiterations: 47
message: 'Local minimum possible. Constraints satisfied.↵↵fmincon stopped because the size of the current step is less than↵the value of the step size tolerance and constraints are ↵satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative changes in all elements of x are↵less than options.StepTolerance = 1.000000e-12, and the relative maximum constraint↵violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.'
bestfeasible: [1×1 struct]
x = 1×3
1.0000 2.0000 3.0000
fval = 3.0000
eflag = 2
stats = struct with fields:
iterations: 23
funcCount: 147
constrviolation: 0
stepsize: 1.0967e-06
algorithm: 'interior-point'
firstorderopt: 8.0766e-05
cgiterations: 33
message: 'Local minimum possible. Constraints satisfied.↵↵fmincon stopped because the size of the current step is less than↵the value of the step size tolerance and constraints are ↵satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization stopped because the relative changes in all elements of x are↵less than options.StepTolerance = 1.000000e-06, and the relative maximum constraint↵violation, 0.000000e+00, is less than options.ConstraintTolerance = 1.000000e-06.'
bestfeasible: [1×1 struct]
Matt J
27 Nov 2023
How would you proceed to figure this out?
You would compute the Hessian and see if it has any large entries or large eigenvalues.
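A minimal sketch of one way to do that, assuming the analytical gradient is returned as the second output of myFun and xopt is the solution (central differences on the gradient; the step size h is illustrative):
% Finite-difference the supplied gradient to approximate the Hessian,
% then inspect its largest entry and its eigenvalues.
n = numel(xopt);
h = 1e-6;                              % illustrative step size
H = zeros(n);
for i = 1:n
    e = zeros(n,1); e(i) = h;
    [~, gp] = myFun(xopt(:) + e);      % gradient at forward point
    [~, gm] = myFun(xopt(:) - e);      % gradient at backward point
    H(:,i) = (gp(:) - gm(:)) / (2*h);  % central difference, column i
end
H = (H + H')/2;                        % symmetrize
largestEntry = max(abs(H(:)))
eigenvalues = eig(H)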
SA-W
27 Nov 2023
You can see the first order optimalities are again very different without changing the solution very much.
Why is this the case here? The function is very smooth and the first derivative is continuous, right?
SA-W
27 Nov 2023
The discontinuity did not cause any problems in the optimization (same result). So maybe it is not a problem in practice? I mean, as long as the converged solution is the same, I do not have to worry about the discontinuity.
Matt J
27 Nov 2023
The interior-point algorithm theoretically assumes a twice continuously differentiable objective and constraints, but much of the time it does work when the smoothness is only piecewise.
SA-W
27 Nov 2023
You would compute the Hessian and see if it has any large entries or large eigenvalues.
hessian =
1.0e+07 *
0.0546 0.0014 -0.0286 -0.0145 -0.0104 -0.0072 -0.0181 0.0244 -0.0016
0.0014 0.0042 0.0019 0.0014 0.0009 0.0018 0.0041 -0.0150 -0.0007
-0.0286 0.0019 0.0166 0.0086 0.0063 0.0046 0.0138 -0.0253 0.0021
-0.0145 0.0014 0.0086 0.0052 0.0016 0.0106 -0.0200 0.0355 -0.0284
-0.0104 0.0009 0.0063 0.0016 0.0064 -0.0186 0.0739 -0.1343 0.0742
-0.0072 0.0018 0.0046 0.0106 -0.0186 0.1034 -0.3381 0.6104 -0.3670
-0.0181 0.0041 0.0138 -0.0200 0.0739 -0.3381 1.1674 -2.1213 1.2382
0.0244 -0.0150 -0.0253 0.0355 -0.1343 0.6104 -2.1213 3.8685 -2.2429
-0.0016 -0.0007 0.0021 -0.0284 0.0742 -0.3670 1.2382 -2.2429 1.3263
This is the Hessian at the solution I mentioned in my question. The entries of the Hessian differ only by one order of magnitude.
The smallest eigenvalue is 0.0939 and the largest 6.5e+7. This makes the condition number of the Hessian large, but that probably has to do with the units of my parameters.
The function would seem to have very high curvatures there,
Do you think this could be the case here? Admittedly, I have never looked at the Hessian before, but compared to the parameters (between 0 and 1), the second derivatives are very large.
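One way to check whether the large condition number is just a units artifact is to rescale the parameters and look at the Hessian in the scaled variables. A minimal sketch, with s holding assumed characteristic magnitudes of the nine parameters (hypothetical values):
% With the substitution x = D*y (D diagonal), the Hessian with respect
% to y is D*H*D, so the conditioning can change purely through units.
s = ones(9,1);          % replace with the actual characteristic scales
D = diag(s);
Hscaled = D * H * D;    % Hessian with respect to the scaled parameters y
cond_original = cond(H)
cond_scaled   = cond(Hscaled)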
Matt J
27 Nov 2023
Edited: Matt J
27 Nov 2023
The curvatures don't seem large enough to account for the difference we see. However, looking at the Hessian won't detect a discontinuous first derivative. As illustrated below, the Hessian explodes only if you evaluate it right at the point of discontinuity.
% Numerically differentiate f twice on a fine grid to approximate the Hessian
f = @(z) z.^2.*(z<0) + z.*(z>=0);
x = linspace(-2,2,1e6);
dx = x(2)-x(1);
Hess = gradient(gradient(f(x),dx),dx);
plot(x,Hess,'.-'); xlabel x; ylabel('\nabla^2 f(x)')
SA-W
27 Nov 2023
The curvatures don't seem large enough to account for the difference we see.
They are on the order of 1e7. Is that not big? Or what was your criterion for coming to that statement?
Matt J
27 Nov 2023
Edited: Matt J
27 Nov 2023
I take it back. It probably is big enough. The changes in x you are seeing are ~1e-6, so with 2nd derivatives that are ~1e7, then you can see changes in the first derivatives of ~1e-6*1e7 = ~1e-1, which is largely what you are seeing, assuming the first order optimality numbers are comparable to the gradient magnitudes at the solution.
SA-W
27 Nov 2023
The changes in x you are seeing are ~1e-6, so with 2nd derivatives that are ~1e7, then you can see changes in the first derivatives of ~1e-6*1e7 = ~1e-1, which is largely what you are seeing,
Maybe I am missing something. The change in the first-order optimality is at least a factor of ~1e5 (it went from roughly 1e-5 to 1e0). So what did you mean by
~1e-1, which is largely what you are seeing
?
Matt J
27 Nov 2023
Edited: Matt J
27 Nov 2023
it went from roughly 1e-5 to 1e0
To me, it looks like the average first-order optimality measure across the different multistart runs is initially about 1e-6 and then, after relaxing the tolerances, about 1e-1. Therefore, the change is
change = 1e-1 - 1e-6
change = 0.1000
which is still ~1e-1.
SA-W
27 Nov 2023
The changes in x you are seeing are ~1e-6, so with 2nd derivatives that are ~1e7, then you can see changes in the first derivatives of ~1e-6*1e7 = ~1e-1, which is largely what you are seeing,
Did you mean ~1e-6*1e7 = ~1e+1? Of course, exact numbers are nonsense here, but just to make sure you did not refer to something else with ~1e-1.
More important from a practical viewpoint: If I know my objective has large curvatures, is it reasonable to choose a smaller step size? The only benefit I see from this is that the first-order optimality is closer to zero, which makes the results more trustworthy, but the effect on the solution is negligible.
Matt J
28 Nov 2023
Edited: Matt J
28 Nov 2023
Did you mean ~1e-6*1e7 = ~1e+1?
Yes, that's what I really meant.
More important from a practical viewpoint: If I know my objective has large curvatures, is it reasonable to choose a smaller step size?
You mean a smaller StepTolerance? It can definitely make a difference. Here's an example where we make the curvatures both small (p=2) and large (p=7); you can see that, for the latter, the tolerance makes a big difference in how close you come to the true solution at x=0:
x0=[787.5362 617.2179 596.9010 784.4406 5.9046];
for p=[2,7]
p,
for tol=[1e-6,1e-12];
opts = optimoptions('fmincon', ...
'StepTolerance', tol, ...
'FunctionTolerance', tol, ...
'OptimalityTolerance', tol, ...
'MaxIterations', inf,'MaxFunctionEvaluations', inf,...
'SpecifyObjectiveGradient', false, ...
'CheckGradients', false,'Display','none');
[x,fval,eflag,stats]=fmincon(@(x) norm(x+5,p)^p,x0,...
[],[],[],[],[0,0,0,0,0],[],[],opts);
distance_to_true_solution = norm(x)
end
end
p = 2
distance_to_true_solution = 4.4737e-07
distance_to_true_solution = 4.4723e-09
p = 7
distance_to_true_solution = 0.5628
distance_to_true_solution = 8.1778e-14
SA-W
28 Nov 2023
Edited: SA-W
28 Nov 2023
Here's an example where we make the curvatures both small (p=2) and large (p=7); you can see that, for the latter, the tolerance makes a big difference in how close you come to the true solution at x=0:
What we see here is what I expected for small curvatures (flat objectives), too. There, the gradients are small, and the optimizer may think it has arrived at a minimum and stop too early. Tightening the step tolerance may help to approach the minimum more closely.
From that, can we say that a small step tolerance makes sense if I have no idea about the magnitude of the second derivatives? A small step tolerance leads to more iterations, but let's forget about that.
Matt J
28 Nov 2023
Edited: Matt J
28 Nov 2023
There, the gradients are small, and the optimizer may think it has arrived at a minimum and stop too early. Tightening the step tolerance may help to approach the minimum more closely.
No, I don't think so. In that situation, you would want to tighten the optimality tolerance and maybe also the function tolerance.
From that, can we say that a small step tolerance makes sense if I have no idea about the magnitude of the second derivatives?
Yes, but it may not be enough. If you have no idea about the second derivatives and they happen to be very low, you may need to make the optimality tolerance extra-small as well (see above). Generally speaking, tightening all of the tolerances always reduces the risk of premature stopping.
SA-W
28 Nov 2023
Yes, but it may not be enough. If you have very low curvatures, you may need to make the optimality tolerance extra-small as well (see above). Generally speaking, tightening all of the tolerances always reduces the risk of premature stopping.
That said, would you say setting all tolerances to 1e-12 is reasonable? My parameters are all greater than zero and less than 5, and I expect the sum of squares to be between 1 and 100.
Matt J
28 Nov 2023
Edited: Matt J
28 Nov 2023
1e-12 is what I usually use. But you really can't look to the stopping tolerances and exit statistics alone to gain confidence in an optimization's performance. You should be testing on several simulated versions of the problem where you know the desired answer, and seeing how close the optimizer gets.
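A minimal sketch of such a simulated test, with hypothetical names: solvePde stands in for the forward PDE solver, and trueParams is a known ground truth that should satisfy the bounds and linear constraints:
% Build a synthetic problem with a known answer and measure how close
% the optimizer gets to it.
trueParams = 4*rand(9,1);                         % hypothetical ground truth inside the bounds
dataSim    = solvePde(trueParams);                % simulated "measurements"
objSim     = @(x) sum((solvePde(x) - dataSim).^2);
optsSim    = optimoptions(opts, 'SpecifyObjectiveGradient', false);
[xRec, fRec] = fmincon(objSim, startVec, Aineq, bineq, [], [], lb, ub, [], optsSim);
recoveryError = norm(xRec - trueParams)           % small value builds confidence in the setup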
SA-W
28 Nov 2023
I always perform numerical experiments first, where I know the exact solution, and run a multistart simulation with several start points. If this works, I move on to the real experiment and again perform a multistart run.
My sanity check is to verify that all local solver runs converged to the same solution. Since I do not know the exact solution, I cannot do much more. Can you recommend some nice checks here? I mean, if you get a solution from an optimization where you do not know the exact solution, how do you convince yourself that the solution is correct if the exit flags are not trustworthy enough?
Matt J
28 Nov 2023
I don't do anything. I rely on experience with the numerical tests for my confidence.
SA-W
28 Nov 2023
Edited: SA-W
28 Nov 2023
Do you know of tools (File Exchange, toolbox functions, ...) to postprocess optimization results?
Often, people calculate confidence intervals, R coefficients, correlation matrices, and so on. Of course, one could implement this oneself, but if there are already packaged implementations, why not use them?
Matt J
28 Nov 2023
Edited: Matt J
28 Nov 2023
There is this thread,
As you will see in the comments though, it will probably require you to do your own Hessian calculation. It must be the Hessian of the Lagrangian, not the objective function, although I guess if you only have linear constraints, they will be the same thing.
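As a rough sketch of the kind of calculation involved: for a sum-of-squares objective, one common approximation takes the parameter covariance as the residual variance times the inverse Hessian. Names here are assumed from this thread (H: Hessian at the solution, fval: sum of squares, xopt: solution); m is a hypothetical number of data points, and this simple recipe ignores active constraints, which matters here since all linear constraints are active at the solution:
% For f(x) = sum(r.^2), the Hessian is approximately 2*J'*J, so the
% classical covariance sigma2*inv(J'*J) becomes 2*sigma2*inv(H).
m      = 100;                    % hypothetical number of data points
n      = 9;                      % number of parameters
sigma2 = fval / (m - n);         % estimated residual variance
C      = 2 * sigma2 * inv(H);    % approximate parameter covariance
stdErr = sqrt(diag(C));          % standard errors
corrMat = C ./ (stdErr * stdErr')                    % correlation matrix
ci = [xopt(:) - 1.96*stdErr, xopt(:) + 1.96*stdErr]  % ~95% confidence intervals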
SA-W
28 Nov 2023
It must be the Hessian of the Lagrangian, not the objective function, although I guess if you only have linear constraints, they will be the same thing.
Yes, I think so too.
Do you think it makes sense to calculate correlations, etc., with a Hessian that has a condition number of ~1e7?
More Answers (0)