lsqnonlin optimization: large condition number of Jacobian matrix at all iterations, but full rank


SA-W on 2 Jun 2023

https://kr.mathworks.com/matlabcentral/answers/1977249-lsqnonlin-optimization-large-condition-number-of-jacobian-matrix-at-all-iterations-but-full-rank

Edited: Matt J on 7 Jun 2023

I use lsqnonlin to solve a non-linear data-fitting problem (fitting parameters of a partial differential equation). The minimization problem is

f = ||g^sim(params) - g^exp ||^2

Currently, I optimize 18 parameters and the vectors g^sim and g^exp have 1470 entries, which is a collection of vectors with 147 entries at 10 different times.

The exact parameters (re-identification) are given by

Here is my code:

opts = optimoptions('lsqnonlin', ...
                            'StepTolerance', 1e-9, ...
                            'FunctionTolerance', 1e-9, ...
                            'OptimalityTolerance', 1e-9, ...
                            'MaxIterations', 250,...
                            'SpecifyObjectiveGradient', true, ...
                            'CheckGradients', false);
sol = lsqnonlin(@(params)objFun(params, g_exp), E0, lb, ub, opts);
function [f,J] = objFun(params, g_exp)
    g_sim = ...; %call pde solver
    J = ...; %call pde solver
    
    f = g_sim - g_exp; 
    
    %scaling of f and J is explained later...
    
end

Starting from a given start vector, lsqnonlin returned exitflag=3 after 25 iterations and 26 function calls. The sum of squares is 4.3470e-16 and firstorderopt is 3.2449e-08. The reference parameters from above are re-identified to six digits after the decimal point, indicating a perfect fit.

However, what I observed is that the Jacobian matrix

J = d g^sim(params) / d params

at all 25 iterations has condition numbers

cond(J) \approx 1e11
rank(J) = 18

This is inconvenient since I would like to compute some quality measures like the correlation matrix, which does not really make sense for such high condition numbers (the product J'*J consequently has an even greater condition number).
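A minimal sketch of how a correlation matrix is commonly formed from a Jacobian in nonlinear least squares, assuming the usual linearized covariance estimate proportional to inv(J'*J). The matrix J below is a small made-up example, not the Jacobian from the question. Working from the QR factor of J avoids forming J'*J explicitly, whose 2-norm condition number is the square of cond(J).

```matlab
% Hypothetical 5x2 Jacobian, for illustration only
J = [1 1; 1 2; 1 3; 1 4; 1 5];

% In the 2-norm, cond(J'*J) equals cond(J)^2
condJ   = cond(J);
condJtJ = cond(J'*J);

% Economy-size QR: J = Q*R, so J'*J = R'*R and inv(J'*J) = inv(R)*inv(R)'
[~, R] = qr(J, 0);
Rinv = R \ eye(size(R));
C = Rinv * Rinv';            % unscaled covariance estimate, inv(J'*J)

% Correlation matrix: divide by the product of the standard deviations
s = sqrt(diag(C));
Corr = C ./ (s * s');
```

An off-diagonal entry of Corr close to +1 or -1 signals that the two parameters are hard to distinguish from the data.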

Also, the optimization fails for some other start vectors which indicates that there might be a problem with my Jacobian or the parameters.

Based on my knowledge, such high condition numbers can be traced back to a bad scaling. Currently, I scale like

w=1/abs(max(g_exp(g_exp~=0)));
W=w*ones(length(g_exp),1);
%scale residual and jacobian
f=W(:).*r(:);
J = J.*W(:);

I also tried different scaling approaches which are very common in my field. So I think the issue is not related to scaling the optimization problem.

As you can see, rank(J)=18 at all iterations, which (if I am not mistaken) indicates that the parameters are not linearly dependent on each other.

Having said all that, what might be the reasons for such high condition numbers, and what could I try to reduce them? Is my data vector g_exp perhaps not appropriate?

I am also wondering why the optimization works very well under these circumstances.

Accepted Answer

Matt J on 2 Jun 2023

Edited: Matt J on 2 Jun 2023

Your scaling isn't really doing anything meaningful, since all residuals are weighted equally. Really, you are just multiplying your Jacobian by a scalar, which cannot change its cond() number, e.g.,

J=rand(3,2);
cond(J)
ans = 8.0366
cond(rand*J)
ans = 8.0366

Another scaling you need to consider, maybe even more important than the scaling of the residual, is the choice of units for your parameters. Rescaling the parameters introduces weights on the columns of your Jacobian, not just the rows.
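The difference between a scalar factor and a change of parameter units can be sketched as follows. The Jacobian and the scale vector d are made-up values chosen by hand for illustration; they are not taken from the question.

```matlab
% Hypothetical Jacobian whose second parameter is expressed in badly chosen units
J = [1 2e-6; 1 3e-6; 1 5e-6; 1 7e-6];

c0 = cond(J);                % large, dominated by the unit mismatch
c1 = cond(10*J);             % scalar scaling: condition number unchanged

% Change of variables x = d.*z, so dg/dz = J*diag(d), i.e. J.*d for a row vector d
d  = [1 1e6];
Jz = J .* d;
c2 = cond(Jz);               % much smaller than c0
```

Here the rescaled Jacobian is well conditioned because the two columns only differed in magnitude. If the columns were nearly parallel, no choice of d would help.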

Another possibility is simply that your problem is over-parametrized, creating a continuum of non-unique solutions. That's not always a big deal, but may account for why the optimization seems to "fail" (in your words) at different initial points. You haven't described what you're interpreting as a failure, but I'm assuming it means you get unexpected results for the parameters.

"As you can see, rank(J)=18 at all iterations, which (if I am not mistaken) indicates that the parameters are not linearly dependent on each other."

Right, but that's often not helpful. You are using the rank() command's default tolerance setting, which may or may not be appropriate for your problem. It's very easy to construct matrices which pass rank()'s default criteria, but shouldn't be considered full rank, e.g.,

A=diag([1e-14,1]); 
cond(A)
ans = 1.0000e+14
rank(A)
ans = 2
rank(A,1e-10)
ans = 1
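To see what rank() is deciding, it can help to look at the singular values directly. The sketch below reuses the matrix A from the example and counts singular values above a tolerance; the default tolerance written here is the one documented for rank(), and the variable names are only illustrative.

```matlab
A = diag([1e-14, 1]);

s = svd(A);                                  % singular values, largest first
tolDefault = max(size(A)) * eps(norm(A));    % default tolerance documented for rank()

rankDefault = sum(s > tolDefault);           % same count as rank(A)
rankLoose   = sum(s > 1e-10);                % same count as rank(A, 1e-10)
relGap      = s(end) / s(1);                 % equals 1/cond(A)
```

Inspecting s (for example with semilogy) shows whether there is a clear gap that suggests a sensible problem-specific tolerance.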

Comments: 39

SA-W on 2 Jun 2023

params.png

"What you want in a qualitative sense is for g^sim to be 'comparably sensitive' to changes in each of the parameters. You don't want x(i)+1 to cause a change of 1 while x(j)+1 causes a change of 1e10, for example."

I think exactly this is the problem in my application. Let me try to explain it in more detail:

My parameters are values of a function at given support points. In the attachment, you see a plot of this function where the x-axis has nine support points and the parameters are the y-values at these points. Linear interpolation is applied between the points. This function is evaluated in my finite element program (pde solver) which returns g^sim and J.

As I said, the data vectors g^sim, g^exp are a collection of 10 different times which are vertically appended in a vector:

g^sim = [g^sim_t1; ...; g^sim_t10]

g^exp = [g^exp_t1; ...; g^exp_t10]

Similarly, the Jacobians at the different times are vertically appended:

J = [J_t1; ...; J_t10];

The problem I have is the following: At time t1, the above function is evaluated mainly in the interval [1.0, 1.2]. This means that only the three parameters defined at the points 1.0, 1.1, 1.2 are actually used (activated) when calculating g^sim_t1 and J_t1. (Ideally, J_t1 would be zero in the six columns associated with the non-activated parameters.) With increasing time, this interval becomes broader. At time t10, the function is evaluated in the entire interval [0.8, 1.6], so all parameters are activated when calculating g^sim_t10 and J_t10. In other words, g^sim_t1 is sensitive almost exclusively to changes in three of the nine parameters, while g^sim_t10 is sensitive to changes in all parameters. This is due to the physics behind my problem, and there is no way for me to circumvent this unequal activation of parameters at different times. I also observed that I need to collect the vectors at different times to make sure that the re-identification is successful for some start values. Using only g^sim_t10, for instance, proved to be less beneficial.
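One way to make this activation pattern visible is to compute the column norms of each time block of the stacked Jacobian. This is only a sketch with made-up numbers: the block sizes, the entries of J and the names nT, m, n and blockNorms are assumptions for illustration, not the data from my problem.

```matlab
% Hypothetical stacked Jacobian: nT time blocks of m rows each, n parameters
nT = 3; m = 4; n = 3;
J = zeros(nT*m, n);
J(1:4,  1)   = 1;            % t1: only parameter 1 is active
J(5:8,  1:2) = 1;            % t2: parameters 1 and 2 are active
J(9:12, 1:3) = 1;            % t3: all parameters are active

% blockNorms(k,j) = 2-norm of column j restricted to time block k
blockNorms = zeros(nT, n);
for k = 1:nT
    rows = (k-1)*m + (1:m);
    blockNorms(k, :) = vecnorm(J(rows, :), 2, 1);
end

totalNorms = vecnorm(J, 2, 1);   % overall column norms of the stacked Jacobian
```

A zero or very small entry blockNorms(k,j) means that parameter j has almost no influence on the data at time k.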

Anyway, I think this suggests a scaling of the Jacobian. IMHO, the "less activated" parameters should be weighted higher, right? Do you have any idea or proposal for what such a scaling could look like?

What I could do, for instance, is to count how often the function is evaluated in each interval at every time. Something like: at t1, the function is evaluated 56 times in [1.0 1.2], 87 times in [1.2 1.4], and so forth. But I was not able to derive a scaling from this information.

But, based on what you have said so far, this would most likely improve the condition number of the Jacobian.

SA-W on 5 Jun 2023


params.png

I see what you mean. The scaling boils down to a variable transformation, and the objective function must be evaluated at the new variable z. Let me try to explain why this is difficult to realize in my case.

[~,J0] = gsim(x0);
cond(J0)                 % 1.95e11
w = 1./vecnorm(J0,1,1)   % 0.0004  0.0008  0.0006  0.0017  0.0050  0.0042  0.0065  0.0375
cond(J0.*w)              % 4.33e+10

Here, x0 is the reference solution that I want to re-identify. As I said, the (here 9) parameters are the interpolation values of a 1d function at given support points (see the attachment). The vector w clearly illustrates that the last parameter (w(9)=0.0375) is less activated in the calculation of gsim(x0) than, for instance, the first parameter (w(1)=0.0004). This is inherent to the physics that I am working on and probably explains why I see large condition numbers.

Based on my understanding, what the scaling z=x.*w does is to increase the values of the less activated parameters such that greater values are assembled in the associated Jacobian columns. However, to make sure that gsim(x) can be evaluated, the function must be convex and the parameters must be of the same order of magnitude. The vector w above would imply that x(9) is at least two orders of magnitude bigger than x(1), x(2), and so on. The above w also violates the convexity constraint.

Based on this requirement, do you have any remedy/idea as to how the vector w (z=x.*w) could be constructed? Ideally, I would like to introduce weights on some columns of the Jacobian, but without having to do a variable transformation. But this is not possible, right?

SA-W on 5 Jun 2023

Edited: SA-W on 5 Jun 2023

"It looks moot, since the normalization isn't lowering the condition number much at all. Although, you might want to try vecnorm(J0,1,2) or else cond(J0.*w,1) so that the norm of the condition number matches the norm used to weight J0."

[~,J0] = gsim(x0);
cond(J0, 2)                  % 3.75e+10 (the 1-norm does not work for a rectangular matrix)
w1 = 1./vecnorm(J0,2,1);     % column-wise
cond(J0.*w1, 2)              % 1.63e+10
w2 = 1./vecnorm(J0,2,2);     % row-wise
cond(J0.*w2, 2)              % 3.27e+11

%your program
[~,idx] = licols(J0);        % second column of J0 is removed
cond(J0(:,idx))              % 1.58e+10
[~,idx] = licols(J0, 1e-9);  % second and last column of J0 are removed
cond(J0(:,idx))              % 37.0

The row-wise scaling reduces the condition number even less than column-wise. This makes sense to me since the main problem here is insufficient activation of parameters, which is more visible to column-wise scaling (correct me if I am wrong).

I applied your FEX tool licols, which gives a reasonable condition number if the second and last columns of J0 are removed. However, I cannot do the optimization without x(2) and x(9), as the values of the convex function at the corresponding two support points would be undefined. Would such a strategy make sense at all in your opinion?

"No, it shouldn't. You are initializing at x0./w in the scaled problem, which means gsim is still being evaluated at the same initial x0 as before."

Yes, but the problem then occurs at intermediate iterations. Most likely, my PDE solver cannot recover from a point z where z(9) is two or more orders of magnitude larger than z(1), z(2).

SA-W on 6 Jun 2023

"I would also mention that the scaling by w is not something you've explored exhaustively. Your bounds and linear constraints were never adjusted for w, so that approach was never implemented properly. If you have an optimization problem f(x) s.t. A*x<=b and you make scaling or any other linear transform x=D*z then the reformulated problem has to transform the constraints as well, so that it becomes f(D*z) s.t. (A*D)*z<=b. I believe you neglected to transform A, lb, and ub, appropriately."

That's true. I should definitely look deeper into this, but I doubt that it works. A simple shift z = x + 1 causes my PDE solver to return NaN because of the different order of magnitude. I think that whatever transformation I use will lead to a similar situation.
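For reference, the constraint transformation from the quoted remark can be sketched as follows for a diagonal scaling x = D*z with positive scale factors. All numbers (d, A, b, lb, ub, x) are hypothetical and only serve to check that a feasible point stays feasible after the change of variables.

```matlab
% Hypothetical problem data in the original variable x
d  = [1; 10; 100];           % assumed positive scale factors, x = d.*z
A  = [1 -2 1];  b = 0;       % one linear inequality A*x <= b
lb = [0; 0; 0];
ub = [5; 50; 500];
x  = [1; 20; 30];            % a point that is feasible in x

% Transformed quantities for the variable z
D   = diag(d);
z   = x ./ d;                % the same point expressed in z
Az  = A * D;                 % constraint becomes (A*D)*z <= b
lbz = lb ./ d;               % valid because every d(i) > 0
ubz = ub ./ d;

% Feasibility is preserved by the change of variables
feasX = all(A*x <= b)  && all(x >= lb)  && all(x <= ub);
feasZ = all(Az*z <= b) && all(z >= lbz) && all(z <= ubz);
```

By the chain rule, the objective passed to the solver would then evaluate gsim at d.*z and return the Jacobian J*D.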

"You could add rows to J, or in other words lengthen g^sim if there are additional residual equations you can come up with."

Based on

w=1./vecnorm(J0,1,1) =

0.0004 0.0008 0.0006 0.0017 0.0050 0.0042 0.0065 0.0375 0.1435

I would expand gsim(x) by

gsim*(x) = [gsim(x); (w(9) - w(1))^2]

but I guess the derivative

d[w(9) - w(1)]/dx

can not be obtained easily, right?

SA-W on 7 Jun 2023

What I wanted to say is "to end up with a SET OF linear systems" to successively solve the pde with a Newton-Raphson scheme.

Anyway, do you have a reference where the procedure of implementing a pde as nonlinear constraint is described? Maybe your own work in case you have done something similar already.

This approach does not seem to be common in the literature of my field.

Matt J on 7 Jun 2023

Edited: Matt J on 7 Jun 2023

No, I don't have a reference, but I think it should be obvious. A PDE, like any other equation, is of the form ceq(x)=0 which is the form that the Optimization Toolbox solvers require for nonlinear equality constraints.
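A toy sketch of that idea, under loud assumptions: the "PDE" is replaced by a small linear system K(p)*u = f with K(p) = p*T, the decision vector stacks the parameter and the state as x = [p; u], and all names (T, f, pTrue, uExp) are made up for illustration. The only toolbox call used is fmincon with its nonlcon argument, which must return [c, ceq].

```matlab
% Toy stand-in for a discretised PDE: K(p)*u = f with K(p) = p*T (hypothetical)
T = [2 -1 0; -1 2 -1; 0 -1 2];
f = [1; 1; 1];
pTrue = 2;
uExp  = (pTrue*T) \ f;               % synthetic "measurement"

% Decision vector x = [p; u]: the state u is an unknown alongside p
objective = @(x) sum((x(2:end) - uExp).^2);
% nonlcon must return [c, ceq]; deal() lets an anonymous function do that
nonlcon   = @(x) deal([], x(1)*T*x(2:end) - f);

% The discretised equation holds at the true parameter and state
[cTrue, ceqTrue] = nonlcon([pTrue; uExp]);

% Requires Optimization Toolbox; guarded so the sketch runs without it
if exist('fmincon', 'file')
    x0  = [1; zeros(3,1)];
    sol = fmincon(objective, x0, [], [], [], [], [], [], nonlcon);
end
```

In this formulation the solver never calls a PDE solver; the equation is enforced only at convergence, at the cost of a much larger set of unknowns.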

More Answers (1)

John D'Errico on 2 Jun 2023

Edited: John D'Errico on 2 Jun 2023

@SA-W - A full rank does NOT mean the parameters are not linearly dependent. With that high of a condition number, it often does mean they are VERY nearly dependent, just not exactly so. However, poor choices of units can often cause high condition numbers, and Matt has attempted to tell you exactly that. Listen to what Matt is telling you.

Consider these two matrices:

format long g
small = 1e-11;
A = [1 1+small;1 1]
A = 2×2
                         1             1.00000000001
                         1                         1
B = [1 small/4;1 -small/4]
B = 2×2
                         1                   2.5e-12
                         1                  -2.5e-12
cond(A)
ans = 
          400003843933.334
cond(B)
ans = 
              400000000000

Both matrices have almost exactly the same (and very large) condition number. However, the A matrix cannot be simply repaired, because its columns are virtually linearly dependent. The B matrix has a problem essentially because of a poor choice of units. The two columns of B are very different and, in fact, orthogonal to each other. However, the linear algebra will have difficulties with both cases, because the condition number of B is as large as that of A.
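A short sketch of that difference, using the same A and B: each column is scaled to unit 2-norm, which corresponds to a change of units for the parameters. The names An, Bn, condAn and condBn are only illustrative.

```matlab
small = 1e-11;
A = [1 1+small; 1 1];
B = [1 small/4; 1 -small/4];

% Scale every column to unit 2-norm (equivalent to a change of parameter units)
An = A ./ vecnorm(A);
Bn = B ./ vecnorm(B);

condAn = cond(An);           % still huge: the columns remain nearly parallel
condBn = cond(Bn);           % 1 up to rounding: the columns are orthonormal
```

Column scaling repairs B completely but leaves A essentially as ill-conditioned as before.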
