if ydata is m-by-n, then sum-of-squares should be m-by-1.
The objective function minimized by lsqcurvefit is always just,
sum-of-squares = sum((fun_ND(x,xdata)-ydata).^2 ,'all')
In other words, the shape of your data matrices is irrelevant to the calculation of the objective function. However, in your case, because each row of your matrices happens to have its own independent set of parameters (a,b), the minimization is separable, i.e., the optimum (a,b) only depends on the sum-of-square terms from its corresponding row. But this has nothing to do with lsqcurvefit. It is just a feature of the parametric structure of your specific problem. If you had organized your data and parametric model column-wise instead of row-wise, however, you would get the exact same result.
Another important thing to understand with lsqcurvefit is that it does not work with your xdata, ydata, or fun_ND output in 1001x8 matrix form. It always does a preliminary reshape of all those matrices into 8008x1 column vectors (see also here). Similarly, when you supply your own sparse Jacobian calculation (which you must do to get the advantages of this technique) lsqcurvefit will expect to receive an 8008x2002 matrix.
As one consequence of this, I think it will be better for you to organize your xdata and ydata as 8x1001 matrices instead of 1001x8 (and change fun_ND accordingly). That way, your Jacobians will have a simple, sparse, block diagonal form, where each of the diagonal blocks is 8x2.