Alternatives to gradient-based optimization algorithms like lsqnonlin that handle at least O(10) parameters?
SA-W
24 February 2023
I am fitting parameters of a PDE to experimental data utilizing the function lsqnonlin from the Optimization Toolbox.
I observed that I can successfully optimize 10 to 20 parameters, but not more. My objective is, however, to optimize 50 to 100 parameters.
Based on my experience with lsqnonlin (and fmincon and friends), it seems that this class of optimization algorithms can handle a small number of parameters (10 to 20) well, but is no longer appropriate if there are many more parameters.
Neither fast convergence nor computation time is important to me.
I come from an engineering background and do not have deep knowledge of other classes of optimization algorithms. That said, I would appreciate it if someone could give me some keywords or recommendations for alternative optimization algorithms that are designed to handle a larger number of parameters.
Thank you!
18 Comments
Torsten
24 February 2023
Edited: Torsten, 24 February 2023
I observed that I can successfully optimize 10 to 20 parameters, but not more.
How did you observe this? The optimizers are not limited in the number of fitting parameters they can manage. Whether fitting models with up to 100 parameters makes sense depends on the number and quality of your input data.
SA-W
24 February 2023
How did you observe this?
My experimental data are numerically generated with a given set of parameters. My parameters are the values of a function at given interpolation points. If I distribute, say, 10 parameters with equally spaced interpolation points over the interval [a,b], lsqnonlin perfectly reidentifies the 10 parameters. However, if I distribute 20 parameters over the same interval [a,b], lsqnonlin reidentifies some of the parameters wrong. That is why I think that lsqnonlin is no longer appropriate for my problem.
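(A minimal sketch of this kind of reidentification experiment, with a cheap toy forward model, an interpolant evaluated on a fine grid, standing in for the actual PDE solve; all names and values here are illustrative, not the poster's setup.)
a = 0; b = 1;
n = 20;                                  % number of parameters (nodal values)
xi = linspace(a, b, n);                  % equally spaced interpolation points
x  = linspace(a, b, 1500).';             % fine "measurement" grid (1500 entries)
fwd = @(p) interp1(xi, p, x, 'pchip');   % toy forward model standing in for the PDE solve
p_true = (xi - 0.3).^2 + 1;              % nodal values of a convex function
u_exp  = fwd(p_true);                    % synthetic "experimental" data
res = @(p) fwd(p) - u_exp;               % residual vector for lsqnonlin
p0  = ones(1, n);                        % initial guess
p_fit = lsqnonlin(res, p0);
plot(xi, p_true, 'o', xi, p_fit, 'x'); legend('true', 'recovered')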
Matt J
24 February 2023
Edited: Matt J, 24 February 2023
However, if I distribute 20 parameters over the same interval [a,b], lsqnonlin reidentifies some of the parameters wrong.
How have you assessed whether they are "wrong"? What is the final objective function value, and how does it compare to the objective value of the "right" parameters?
If both solutions have the same objective value (or nearly so), then they must both be considered correct.
SA-W
24 February 2023
How have you assessed whether they are "wrong"? What is the final objective function value, and how does it compare to the objective value of the "right" parameters?
The objective function value is indeed on the order of the objective value corresponding to the right parameters; however, the parameters are clearly wrong, which I observed by plotting the function.
The parameters that I am optimizing (the function values) have to represent a convex function in 1D. However, the 20 parameters found by lsqnonlin do not define a convex function, but rather a function that is even decreasing in certain ranges, which, for whatever reason, is also a minimum of the objective function.
Let me ask two questions:
(i) Is there a recommended ratio (just roughly) between the number of parameters and the number of equations? Currently, my data vectors have 1500 entries and I optimize 20 parameters.
(ii) I am considering enforcing convexity through the following constraints on the parameters p(i):
p(i-1) - 2p(i) + p(i+1) > 0 for all parameters
Is there a way to translate these constraints for lsqnonlin, for instance by adding a penalty term to
f = ||r||^2 + penalty term?
Torsten
24 February 2023
However, the 20 parameters found by lsqnonlin do not define a convex function, but rather a function that is even decreasing in certain ranges, which, for whatever reason, is also a minimum of the objective function.
Then you must somehow impose convexity of the function as a constraint.
Matt J
24 February 2023
Is there a way to translate these constraints for lsqnonlin, for instance by adding a penalty term to
If your penalty is quadratic, then it can be implemented just by expanding the residual vector:
r=[weight1*r; weight2*constraint_violation];
f = ||r||^2
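(A concrete sketch of this expanded residual, reusing the hypothetical res and p0 from the toy example earlier in the thread; the second differences encode the convexity constraint discussed above, and since lsqnonlin squares the stacked residual, the penalty is quadratic in the violation. See the later comments for a smoother variant.)
d = @(p) p(1:end-2) - 2*p(2:end-1) + p(3:end);   % second differences of the nodal values
cviol = @(p) min(0, d(p(:)));                    % nonzero entries mark convexity violations
weight1 = 1; weight2 = 1e2;                      % relative weights (problem-dependent)
raug = @(p) [weight1*res(p); weight2*cviol(p)];  % lsqnonlin minimizes ||raug||^2
p_fit = lsqnonlin(raug, p0);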
SA-W
24 February 2023
If your penalty is quadratic, then it can be implemented just by expanding the residual vector
c(i) := [p(i-1) - 2p(i) + p(i+1)]/h^2 > 0 at the interpolation point corresponding to p(i)
This inequality constraint is based on the central FD approximation of the second derivative of a function. Since my parameters are function values, I thought it would be a good idea to implement these constraints in the following way:
if p(i-1) - 2*p(i) + p(i+1) >= 0
    c(i) = 0;
end
if p(i-1) - 2*p(i) + p(i+1) < 0
    c(i) = p(i-1) - 2*p(i) + p(i+1);
end
Do you think it is reasonable to enforce convexity in such a way or can you recommend a better solution?
I know that this question is not directly related to MATLAB, but I really appreciate your help!
Matt J
24 February 2023
Edited: Matt J, 24 February 2023
Do you think it is reasonable to enforce convexity in such a way or can you recommend a better solution?
Seems reasonable to me, but I would probably use fmincon instead of lsqnonlin, since then I can directly implement
p(i-1) - 2p(i) + p(i+1) >= 0
as a linear inequality constraint.
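(A sketch of those linear constraints for fmincon, again with the hypothetical res and p0 from the toy example; row i of A encodes -p(i) + 2p(i+1) - p(i+2) <= 0, which is the convexity condition above with the sign flipped into the A*x <= b form.)
n = numel(p0);
A = zeros(n-2, n);
for i = 1:n-2
    A(i, i:i+2) = [-1 2 -1];    % -p(i) + 2*p(i+1) - p(i+2) <= 0
end
b = zeros(n-2, 1);
obj = @(p) sum(res(p).^2);      % scalar sum-of-squares objective for fmincon
p_fit = fmincon(obj, p0(:), A, b);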
Matt J
24 February 2023
If using lsqnonlin, I might instead have
if p(i-1) - 2*p(i) + p(i+1) >= 0
    c(i) = 0;
end
if p(i-1) - 2*p(i) + p(i+1) < 0
    c(i) = abs(p(i-1) - 2*p(i) + p(i+1))^1.5;
end
since then c will be a differentiable function of p. lsqnonlin assumes c to be differentiable.
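(A vectorized form of this smoother penalty, written as anonymous functions over the whole parameter vector; max(0, -d) equals abs(d) exactly on the violated constraints and is zero elsewhere, so this matches the piecewise definition above.)
d = @(p) p(1:end-2) - 2*p(2:end-1) + p(3:end);   % second differences
c = @(p) max(0, -d(p(:))).^1.5;                  % zero where convex, smooth where violated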
SA-W
25 February 2023
if p(i-1)-2p(i)+p(i+1) >= 0, c(i) = 0; end; if p(i-1)-2p(i)+p(i+1) < 0, c(i) = p(i-1)-2p(i)+p(i+1); end
Here, c would be a non-differentiable function with respect to the parameters?
SA-W
25 February 2023
And what term is it that makes c(i) = abs(p(i-1)-2p(i)+p(i+1))^1.5
differentiable for all parameters? The exponent ^1.5 or the abs?
Matt J
25 February 2023
Edited: Matt J, 25 February 2023
The abs in |t|^1.5 doesn't really matter if t^1.5 is only ever evaluated for positive t. As seen below, they are the same function with or without the abs. The differentiability follows from the fact that both the left- and right-hand derivatives of c(t) = max(0, |t|^1.5) are zero at t = 0. Conversely, c(t) = max(0, t) has mismatched right and left derivatives at t = 0.
syms t real
c1 = piecewise(t<=0, 0, t>0, abs(t)^1.5);
c2 = piecewise(t<=0, 0, t>0, t^1.5);
fplot(c1, [-1,1]); hold on
fplot(c2, [-1,1], 'x'); hold off
(The two plotted curves coincide.)
Matt J
26 February 2023
Edited: Matt J, 26 February 2023
You could do that, or even have c(i) = (p(i-1)-2p(i)+p(i+1))^200. However, with larger exponents, the gradient of the cost function becomes small over a wider neighborhood of c = 0. Smaller gradients mean slower convergence, and also a chance that the OptimalityTolerance triggers prematurely. Conversely, with smaller exponents (but still greater than 1), the gradient becomes less smooth. So there is a trade-off to be reckoned with.
SA-W
6 March 2023
Edited: SA-W, 6 March 2023
I implemented
r=[weight1*r; weight2*constraint_violation];
f = ||r||^2
with lsqnonlin and found that
weight_2 = 1e4   % approximately
is necessary for the inequality constraints to be considered at all at intermediate iterations; a weight_2 much below 1e4 does not include the constraint_violation sufficiently, and one much above lets it dominate the residual r.
If I implement the above inequality constraints in fmincon and multiply A*x <= b by 1e4 (this scaling is necessary for fmincon to pay attention to the constraints), fmincon gives a better solution than lsqnonlin. By better, I mean that the solution is nearly perfectly equal to the exact solution. The lsqnonlin solution fulfills the constraints in A, but is not as accurate.
Anyway, fmincon requires two to three times as many iterations as lsqnonlin to find the solution. That said, I would like to stick with lsqnonlin, because the evaluation of the objective function is quite expensive in my case.
I read the documentation on the interior-point algorithm. The way fmincon incorporates the constraints is based on a merit function, which is, of course, different from just expanding the residual vector as I did.
Admittedly, I do not have enough background in optimization. Can you give me another recommendation (closer to the way fmincon incorporates the constraints) for implementing my constraints in lsqnonlin?
Thank you!
Accepted Answer
Matt J
24 February 2023
Edited: Matt J, 24 February 2023
it seems that this class of optimization algorithms can handle a small number of parameters (10 to 20) well, but is no longer appropriate if there are many more parameters.
I am currently using lsqnonlin in optimization problems with 11,796,480 variables. It converges fine, though in my case the line-search operations really do make it much slower than I would like. In any case, 20 variables is definitely not the upper limit.
I am fitting parameters of a PDE to experimental data utilizing the function lsqnonlin from the Optimization Toolbox.
Make sure you are following the guidelines here,
9 Comments
SA-W
24 February 2023
Thank you for the reference.
Currently, if my PDE solver cannot solve the PDE for parameters p, I have programmed a return statement. But based on this article, returning NaN is the right way to go, is it not?
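(A minimal sketch of that NaN convention, with a hypothetical solve_pde wrapper; u_exp denotes the measured data and is passed in explicitly here.)
function r = pde_residual(p, u_exp)
    % Return NaN residuals instead of erroring out when the PDE solver fails,
    % so that lsqnonlin can back off and try a different step.
    [u_sim, converged] = solve_pde(p);   % hypothetical PDE solver
    if converged
        r = u_sim - u_exp;               % usual data-misfit residual
    else
        r = NaN(size(u_exp));            % signals failure to lsqnonlin
    end
end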
SA-W
24 February 2023
One question regarding the treatment of NaN.
As I said, so far I have used a return statement when my PDE solver failed to converge for given parameters.
If I use NaN instead, lsqnonlin tries a different parameter set. But if this returns NaN again, is another set of parameters tried? In other words, how is an endless loop of function evaluations prevented?
Matt J
24 February 2023
Of that I am not certain. I see nothing about it in the documentation.
I'm not sure it matters for you, though, since as discussed above, you have complicated constraints that are better handled with fmincon.
SA-W
24 February 2023
In the link you gave me, it is said that the solvers in lsqnonlin can recover from NaN.
SA-W
1 March 2023
If someone comes across our discussion here: the algorithms in lsqnonlin can also recover from NaN.
More Answers (1)
Matt J
26 February 2023
Edited: Matt J, 26 February 2023
The parameters that I am optimizing (the function values) have to represent a convex function in 1D.
Another strategy that I would suggest is to first fit a constrained curve to your function samples using this FEX download,
This tool will allow you both to constrain the fit to be convex (using the 'ConcaveUp' option) and to compute the derivatives of the fit. Once you have the curve fit and its derivatives, you can substitute them into your ODE to obtain conventional equations in the original set of unknown parameters. You can then solve these equations using lsqnonlin, or even a linear solver, depending on the form of your equations.
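(If the linked download is the SLM (Shape Language Modeling) toolbox, an assumption here since the link itself is not preserved, a convexity-constrained fit and its derivative might look as follows, with x, y denoting the function samples and xq the query points.)
slm = slmengine(x, y, 'concaveup', 'on', 'knots', 10);  % fit constrained to be convex
yq  = slmeval(xq, slm, 0);   % fitted values at xq
dyq = slmeval(xq, slm, 1);   % first derivative of the fit at xq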
7 Comments
SA-W
26 February 2023
This approach is not really clear to me yet.
My objective function is the difference between a measured displacement field and a simulated displacement field. The latter depends somehow on the convex 1D function. The nodal values of that function are my parameters to lsqnonlin.
That said, do you think I can use your suggested tool?
Matt J
27 February 2023
I don't know. We haven't seen a full problem description. What is the role of the ODE? Does it describe the 1D convex function or does it describe the displacement field? My ultimate point is that it would be wise to avoid making the objective function depend on repeated calls to an ODE solver, if possible.
SA-W
27 February 2023
Edited: SA-W, 27 February 2023
The solution of the ODE is the displacement field (computed via the Finite Element Method), which depends on the 1D convex function. And the nodal values of that convex function are the parameters of the optimization, i.e., the parameters should be changed such that measured displacements and simulated displacements match in a least-squares sense. So the start values passed to lsqnonlin represent the initial shape of the convex function. In other words, I do not have samples of that convex function, as my goal is to identify this function.
In order to use the linked FEX download, I need to know samples of the convex function, right?
Matt J
27 February 2023
My suggestion is only relevant if you can do a spline fit to the thing your ODE is taking derivatives of.
SA-W
27 February 2023
Would be nice to try that, but let me clarify one point:
Say the 1d convex function (=parameters) is called f, and displacements are denoted as u. Then the objective function is given by
obj = ||u^sim(f) - u^exp||^2
The Jacobian is calculated according to
J = du^sim(f) / df
My suggestion is only relevant if you can do a spline fit to the thing your ODE is taking derivatives of.
Do you mean the Jacobian here? I could do a spline fit to f, but not to the displacements. The displacement is a finite element field.
Matt J
6 March 2023
I don't know enough about finite element analysis to know why you cannot fit a spline to a finite element field.
SA-W
6 March 2023
Would you support me further regarding the issue in my new comment (in the comments under the question)?