" Q1. Although the weight for w1 is greatly increased,
there is a very small tendency to minimize the function for J1.
What could be the reason for this? "
Suppose that J1 is constant, or very nearly so, at least in the vicinity of the start point. If the optimizer sees essentially no gain in the global composite objective coming from J1, then it makes sense for it to move in a direction that minimizes the other two sub-objectives instead, no matter how large you make w1.
That COULD be the reason. You could look carefully at each of your sub-objectives. At the start point, what are the corresponding norms of their gradients? If the gradient is zero, then how much change from J1 will you get for any movement from x0?
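Here is a minimal sketch of that check in Python, using central finite differences. The three sub-objectives and the start point x0 are made up purely for illustration (I don't know your actual functions), with J1 chosen to be nearly flat near x0. Substitute your own functions.

```python
import math

def fd_gradient(f, x, h=1e-6):
    """Central finite-difference estimate of the gradient of f at x."""
    g = []
    for i in range(len(x)):
        xp = list(x)
        xm = list(x)
        xp[i] += h
        xm[i] -= h
        g.append((f(xp) - f(xm)) / (2.0 * h))
    return g

def norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(c * c for c in v))

# Hypothetical sub-objectives, for illustration only.
def J1(x):
    # Nearly constant in the vicinity of x0.
    return 5.0 + 1e-8 * sum(c * c for c in x)

def J2(x):
    return sum((c - 1.0) ** 2 for c in x)

def J3(x):
    return sum((c + 2.0) ** 2 for c in x)

x0 = [0.5, -0.5]  # assumed start point

grad_norms = {
    name: norm(fd_gradient(f, x0))
    for name, f in (("J1", J1), ("J2", J2), ("J3", J3))
}

for name, value in grad_norms.items():
    print(f"{name}: |grad| at x0 = {value:.3e}")
```

If one gradient norm comes out many orders of magnitude smaller than the others, as J1 does in this made-up case, that sub-objective is contributing almost nothing to the search direction.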
Essentially, you want to look at the start point for each sub-objective. Then, try optimizing each of them independently. (Or, if this is a low dimensional problem, just plot them all.) If they were independent of each other, where would the optimizer want to go for each sub-objective?
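For a one-dimensional problem, the "look at each one separately" idea can be sketched with a simple grid scan. Everything here is an assumption for illustration: the three scalar sub-objectives, the weight w1 = 100, and the grid itself. J1 has its minimum at x = 3 but is nearly flat, while J2 and J3 pull toward x = 1 and x = -1.

```python
# Hypothetical scalar sub-objectives, for illustration only.
def J1(x):
    return 1e-6 * (x - 3.0) ** 2   # minimum at 3, but almost flat

def J2(x):
    return (x - 1.0) ** 2          # minimum at 1

def J3(x):
    return (x + 1.0) ** 2          # minimum at -1

w1 = 100.0  # assumed "greatly increased" weight on J1

def composite(x):
    return w1 * J1(x) + J2(x) + J3(x)

# Grid on [-5, 5] with spacing 0.01.
xs = [i / 100.0 - 5.0 for i in range(1001)]

# Where would each function want to go if it were optimized on its own?
best = {
    name: min(xs, key=f)
    for name, f in (("J1", J1), ("J2", J2), ("J3", J3), ("composite", composite))
}

for name, x_best in best.items():
    print(f"{name}: grid minimizer at x = {x_best:.2f}")
```

In this made-up case the composite minimizer sits essentially midway between the minimizers of J2 and J3, nowhere near where J1 alone would go, even with the large weight. In higher dimensions you would replace the grid scan with a separate run of your optimizer on each sub-objective.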
Even though the objectives are apparently "normalized", what probably really matters is whether the functions are scaled so that their gradients all have similar norms. Because you could add some HUGE constant to any one of them, and its gradient would not change.
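A small sketch of both points, again in Python with made-up scalar functions. The first part checks that a large constant offset leaves the derivative alone. The second part shows one possible heuristic, which is my assumption and not something from your problem statement: divide each sub-objective by the magnitude of its derivative at x0, so that all of them start with comparable slopes.

```python
def derivative(f, x, h=1e-3):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x0 = 3.0  # assumed start point

# Part 1: a huge constant offset does not change the slope.
def J(x):
    return (x - 1.0) ** 2

def J_shifted(x):
    return J(x) + 1.0e6

d_plain = derivative(J, x0)
d_shifted = derivative(J_shifted, x0)
print(f"dJ/dx = {d_plain:.6f}, d(J + 1e6)/dx = {d_shifted:.6f}")

# Part 2: hypothetical sub-objectives with wildly different slopes at x0.
def Ja(x):
    return 1000.0 * (x - 1.0) ** 2

def Jb(x):
    return 0.001 * (x + 2.0) ** 2

raw_slopes = {"Ja": derivative(Ja, x0), "Jb": derivative(Jb, x0)}

# Scale factor = 1 / |slope at x0|. This breaks down if the slope is zero,
# so a real implementation would need a guard for that case.
scales = {name: 1.0 / abs(s) for name, s in raw_slopes.items()}
scaled_slopes = {name: scales[name] * raw_slopes[name] for name in raw_slopes}

for name in raw_slopes:
    print(f"{name}: raw slope = {raw_slopes[name]:.4g}, "
          f"scaled slope = {scaled_slopes[name]:.4g}")
```

Note that this only equalizes the slopes at the start point. They can drift apart again as the optimizer moves, so treat it as a starting scale, not a guarantee.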
" Q2. And is it a good idea to normalize constraints as well? "
It can't hurt. If they are wildly different in magnitude, then expect numerical problems.