Solve Nonlinear Problem with Many Variables
This example shows how to handle a large number of variables in a nonlinear problem. In general, for this type of problem:
- Use the low-memory BFGS (LBFGS) Hessian approximation. This option is available in the default fminunc and fmincon algorithms.
- If you have an explicit gradient, you can also use a finite-difference Hessian and the 'cg' subproblem algorithm.
- If you have an explicit Hessian, formulate the Hessian as sparse.
Although not part of this example, if you have a structured problem and can evaluate the product of the Hessian with an arbitrary vector, try using a Hessian multiply function. See Minimization with Dense Structured Hessian, Linear Equalities.
The example uses the hfminunc0obj helper function, shown at the end of this example, for the general nonlinear solvers fminunc and fmincon. This function is an N-dimensional generalization of Rosenbrock's function, a difficult function to minimize numerically. The minimum value of 0 is attained at the unique point x = ones(N,1).
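Written out, the generalized Rosenbrock objective implemented by the hfminunc0obj helper function is:

```latex
f(x) = \sum_{k=1}^{N-1} \Bigl( 100\,\bigl(x_k^2 - x_{k+1}\bigr)^2 + \bigl(x_k - 1\bigr)^2 \Bigr)
```

Each summand vanishes only when x(k+1) = x(k)^2 and x(k) = 1, which forces every component to 1 and gives the minimum value 0 at x = ones(N,1).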
The function is an explicit sum of squares. Therefore, the example also shows the efficiency of using a least-squares solver. For the least-squares solver lsqnonlin, the example uses the hlsqnonlin0obj helper function, shown at the end of this example, as a vector objective function that is equivalent to the hfminunc0obj function.
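Concretely, the hlsqnonlin0obj helper returns a residual vector of length 2(N-1) that interleaves the terms 10(x_k^2 - x_{k+1}) and (x_k - 1). Because lsqnonlin minimizes the sum of squared residuals, the two formulations solve the same problem:

```latex
F(x) = \begin{pmatrix} 10\,(x_1^2 - x_2) \\ x_1 - 1 \\ \vdots \\ 10\,(x_{N-1}^2 - x_N) \\ x_{N-1} - 1 \end{pmatrix},
\qquad \sum_{i=1}^{2(N-1)} F_i(x)^2 = f(x),
```

where f(x) is the scalar objective computed by hfminunc0obj.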
Problem Setup
Set the problem to use the hfminunc0obj objective function for 1000 variables. Set the initial point x0 to -2 for each variable.
fun = @hfminunc0obj;
N = 1e3;
x0 = -2*ones(N,1);
For the initial options, specify no display and no limit on the number of function evaluations or iterations.
options = optimoptions("fminunc",Display="none",...
    MaxFunctionEvaluations=Inf,MaxIterations=Inf);
Set up a table to hold the data. Specify labels for the eight solver runs, and define places to collect the run time, returned function value, exit flag, number of iterations, and time per iteration.
ExperimentLabels = ["BFGS_NoGrad", "LBFGS_NoGrad",...
    "BFGS_Grad", "LBFGS_Grad", "Analytic", "fin-diff-grads",...
    "LSQ_NoJacob", "LSQ_Jacob"];
timetable = table('Size',[8 5],...
    'VariableTypes',["double" "double" "double" "double" "double"],...
    'VariableNames',["Time" "Fval" "Eflag" "Iters" "TimePerIter"],...
    'RowNames',ExperimentLabels);
The following code sections create the appropriate options for each solver run, and collect the output directly into the table whenever possible.
BFGS Hessian Approximation, No Gradient
opts = options;
opts.HessianApproximation = 'bfgs';
opts.SpecifyObjectiveGradient = false;
overallTime = tic;
[~,timetable.Fval("BFGS_NoGrad"),timetable.Eflag("BFGS_NoGrad"),output] =...
    fminunc(fun, x0, opts);
timetable.Time("BFGS_NoGrad") = toc(overallTime);
timetable.Iters("BFGS_NoGrad") = output.iterations;
timetable.TimePerIter("BFGS_NoGrad") =...
    timetable.Time("BFGS_NoGrad")/timetable.Iters("BFGS_NoGrad");
LBFGS Hessian Approximation, No Gradient
opts = options;
opts.HessianApproximation = 'lbfgs';
opts.SpecifyObjectiveGradient = false;
overallTime = tic;
[~,timetable.Fval("LBFGS_NoGrad"),timetable.Eflag("LBFGS_NoGrad"),output] =...
    fminunc(fun, x0, opts);
timetable.Time("LBFGS_NoGrad") = toc(overallTime);
timetable.Iters("LBFGS_NoGrad") = output.iterations;
timetable.TimePerIter("LBFGS_NoGrad") =...
    timetable.Time("LBFGS_NoGrad")/timetable.Iters("LBFGS_NoGrad");
BFGS with Gradient
opts = options;
opts.HessianApproximation = 'bfgs';
opts.SpecifyObjectiveGradient = true;
overallTime = tic;
[~,timetable.Fval("BFGS_Grad"),timetable.Eflag("BFGS_Grad"),output] =...
    fminunc(fun, x0, opts);
timetable.Time("BFGS_Grad") = toc(overallTime);
timetable.Iters("BFGS_Grad") = output.iterations;
timetable.TimePerIter("BFGS_Grad") =...
    timetable.Time("BFGS_Grad")/timetable.Iters("BFGS_Grad");
LBFGS with Gradient
opts = options;
opts.HessianApproximation = 'lbfgs';
opts.SpecifyObjectiveGradient = true;
overallTime = tic;
[~,timetable.Fval("LBFGS_Grad"),timetable.Eflag("LBFGS_Grad"),output] =...
    fminunc(fun, x0, opts);
timetable.Time("LBFGS_Grad") = toc(overallTime);
timetable.Iters("LBFGS_Grad") = output.iterations;
timetable.TimePerIter("LBFGS_Grad") =...
    timetable.Time("LBFGS_Grad")/timetable.Iters("LBFGS_Grad");
Analytic Hessian, 'trust-region' Algorithm
opts = options;
opts.Algorithm = 'trust-region';
opts.SpecifyObjectiveGradient = true;
opts.HessianFcn = "objective";
overallTime = tic;
[~,timetable.Fval("Analytic"),timetable.Eflag("Analytic"),output] =...
    fminunc(fun, x0, opts);
timetable.Time("Analytic") = toc(overallTime);
timetable.Iters("Analytic") = output.iterations;
timetable.TimePerIter("Analytic") =...
    timetable.Time("Analytic")/timetable.Iters("Analytic");
Finite-Difference Hessian with Gradient, fmincon Solver
opts = optimoptions("fmincon","SpecifyObjectiveGradient",true,...
    "Display","none","HessianApproximation","finite-difference",...
    SubproblemAlgorithm="cg",MaxFunctionEvaluations=Inf,MaxIterations=Inf);
overallTime = tic;
[~,timetable.Fval("fin-diff-grads"),timetable.Eflag("fin-diff-grads"),output] =...
    fmincon(fun, x0, [],[],[],[],[],[],[],opts);
timetable.Time("fin-diff-grads") = toc(overallTime);
timetable.Iters("fin-diff-grads") = output.iterations;
timetable.TimePerIter("fin-diff-grads") =...
    timetable.Time("fin-diff-grads")/timetable.Iters("fin-diff-grads");
Least Squares, No Jacobian
lsqopts = optimoptions("lsqnonlin","Display","none",...
    "MaxFunctionEvaluations",Inf,"MaxIterations",Inf);
fun = @hlsqnonlin0obj;
overallTime = tic;
[~,timetable.Fval("LSQ_NoJacob"),~,timetable.Eflag("LSQ_NoJacob"),output] =...
    lsqnonlin(fun, x0, [],[],lsqopts);
timetable.Time("LSQ_NoJacob") = toc(overallTime);
timetable.Iters("LSQ_NoJacob") = output.iterations;
timetable.TimePerIter("LSQ_NoJacob") =...
    timetable.Time("LSQ_NoJacob")/timetable.Iters("LSQ_NoJacob");
Least Squares with Jacobian
lsqopts.SpecifyObjectiveGradient = true;
overallTime = tic;
[~,timetable.Fval("LSQ_Jacob"),~,timetable.Eflag("LSQ_Jacob"),output] =...
    lsqnonlin(fun, x0, [],[],lsqopts);
timetable.Time("LSQ_Jacob") = toc(overallTime);
timetable.Iters("LSQ_Jacob") = output.iterations;
timetable.TimePerIter("LSQ_Jacob") =...
    timetable.Time("LSQ_Jacob")/timetable.Iters("LSQ_Jacob");
Examine Results
disp(timetable)
                       Time       Fval       Eflag    Iters    TimePerIter
                      ______    __________   _____    _____    ___________

    BFGS_NoGrad       110.44    5.0083e-08     1      7137       0.015475
    LBFGS_NoGrad      53.143     2.476e-07     1      4902       0.010841
    BFGS_Grad         35.491    2.9865e-08     1      7105      0.0049952
    LBFGS_Grad        1.2056    9.7505e-08     1      4907      0.0002457
    Analytic          7.0991     1.671e-10     3      2301      0.0030852
    fin-diff-grads     5.217    1.1422e-15     1      1382       0.003775
    LSQ_NoJacob       94.708    3.7969e-25     1      1664       0.056916
    LSQ_Jacob         6.5225    3.0056e-25     1      1664      0.0039197
The timing results show the following:
- For this problem, the LBFGS Hessian approximation with gradients is the fastest by far.
- The next fastest solver runs are fmincon with a Hessian formed from finite differences of the gradient, trust-region fminunc with analytic gradient and Hessian, and lsqnonlin with analytic Jacobian.
- The fminunc BFGS algorithm without gradient has similar speed to the lsqnonlin solver without Jacobian. Note that lsqnonlin requires many fewer iterations than fminunc for this problem, but each iteration takes much longer.
- Derivatives make a large difference in speed for all solvers.
Helper Functions
The following code creates the hfminunc0obj helper function.
function [f,G,H] = hfminunc0obj(x)
% Rosenbrock function in N dimensions
N = numel(x);
xx = x(1:N-1);
xx_plus = x(2:N);
f_vec = 100*(xx.^2 - xx_plus).^2 + (xx - 1).^2;
f = sum(f_vec);
if (nargout >= 2) % Gradient
    G = zeros(N,1);
    for k = 1:N
        if (k == 1)
            G(k) = 2*(x(k)-1) + 400*x(k)*(x(k)^2-x(k+1));
        elseif (k == N)
            G(k) = 200*x(k) - 200*x(k-1)^2;
        else
            G(k) = 202*x(k) - 200*x(k-1)^2 - 400*x(k)*(x(k+1) - x(k)^2) - 2;
        end
    end
    if nargout >= 3 % Hessian
        H = spalloc(N,N,3*N);
        for i = 2:(N-1)
            H(i,i) = 202 + 1200*x(i)^2 - 400*x(i+1);
            H(i,i-1) = -400*x(i-1);
            H(i,i+1) = -400*x(i);
        end
        H(1,1) = 2 + 1200*x(1)^2 - 400*x(2);
        H(1,2) = -400*x(1);
        H(N,N) = 200;
        H(N,N-1) = -400*x(N-1);
    end
end
end
The following code creates the hlsqnonlin0obj helper function.
function [f,G] = hlsqnonlin0obj(x)
% Rosenbrock function in N dimensions
N = numel(x);
xx = x(1:N-1);
xx_plus = x(2:N);
f_vec = [10*(xx.^2 - xx_plus), (xx - 1)];
f = reshape(f_vec',[],1); % Vector of length 2*(N-1)
% Jacobian
if (nargout >= 2)
    G = spalloc(2*(N-1),N,3*N); % Jacobian size 2*(N-1)-by-N with 3*N nonzeros
    for k = 1:(N-1)
        G(2*k-1,k) = 10*2*x(k);
        G(2*k-1,k+1) = -10;
        G(2*k,k) = 1;
    end
end
end