Asked by moeJ
on 8 Apr 2019

I tried using the curve fitting tool, but I got an error saying 'Data sizes are incompatible.'

dataset = xlsread('ML.xlsx','sheet2')
a = dataset(:,1)
S = dataset(:,3:9)
D = repelem(a(1:end, :), 1, 7)
cftool
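A hedged sketch of one way around the size error, assuming the intent was a single 2nd-degree fit with column 1 as x pooled against all seven B columns as y. The fitting functions pair x and y point-for-point, and fit() in the Curve Fitting Toolbox expects one x column and one y column of the same length, so flattening both matrices with (:) gives two vectors that line up. The random dataset below is placeholder data standing in for the spreadsheet, and base-MATLAB polyfit stands in for cftool's quadratic (fit(x,y,'poly2') should be the toolbox equivalent).

```matlab
% Placeholder data: stands in for dataset = xlsread('ML.xlsx','sheet2');
% 9 rows x 9 columns of random values, only so the snippet runs.
dataset = rand(9, 9);

a = dataset(:, 1);          % x values (column 1)
S = dataset(:, 3:9);        % seven y columns
D = repelem(a, 1, 7);       % x repeated once per y column (9x7, same size as S)

% Flatten both matrices into column vectors of equal length so every
% (x, y) pair lines up: both are 63x1 here.
x = D(:);
y = S(:);

% Pooled 2nd-degree polynomial: y ~ p(1)*x.^2 + p(2)*x + p(3)
p = polyfit(x, y, 2);
```

If separate curves per column are wanted instead, the per-column fitlm loop in the 9 Apr answer is the better route.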

Answer by dpb
on 14 Apr 2019

Edited by dpb
on 14 Apr 2019

Accepted Answer

Higher-order polynomials aren't going to help here... the following code generated the figure:

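% SB: the data array; column 1 is salinity, columns 3:9 are B1..B7
for i=3:9
  subplot(4,2,i-2);
  [~,ix]=sort(SB(:,i));      % order the points by the B value so the line runs left to right
  plot(SB(ix,i),SB(ix,1))    % salinity vs. B(i-2)
  xlabel(sprintf('B%d',i-2))
end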

There's virtually no correlation between salinity and most of the B variables; what little there is shows up only over part of a variable's range and breaks down elsewhere, for every variable.
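To put a number on "virtually no correlation", here is a minimal sketch using base-MATLAB corrcoef, assuming SB is laid out as in the plot loop (column 1 salinity, columns 3:9 B1..B7). The random SB is placeholder data only. Note that Pearson's r measures linear association only, so it complements rather than replaces the plots.

```matlab
% Placeholder data: stands in for SB (9 observations, as in the fits below).
% Column 1 = salinity, columns 3:9 = B1..B7.
SB = rand(9, 9);

% corrcoef treats each column as a variable; row 1 of R holds the
% correlation of salinity with each of the other selected columns.
R = corrcoef(SB(:, [1 3:9]));
r = R(1, 2:end);                 % r(k) = corr(salinity, Bk)

for k = 1:numel(r)
    fprintf('B%d: r = %+.2f\n', k, r(k));
end
```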

Just taking a stab: I ran one fit that uses all the variables and another with only the two that were statistically significant. Those results are:

>> fitlm(SB(:,3:end),SB(:,1))

ans =

Linear regression model:
    y ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7

Estimated Coefficients:
                 Estimate      SE     tStat    pValue
                 ________    _____    _____    ______

    (Intercept)     63.33    13.62     4.65      0.13
    x1             -29.51    18.92    -1.56      0.36
    x2              20.66    15.52     1.33      0.41
    x3               4.74    14.23     0.33      0.80
    x4              -4.83    15.69    -0.31      0.81
    x5              20.96    33.11     0.63      0.64
    x6              -0.02    21.87    -0.00      1.00
    x7               1.42    16.83     0.08      0.95

Number of observations: 9, Error degrees of freedom: 1
Root Mean Squared Error: 2.49
R-squared: 0.934, Adjusted R-Squared 0.474
F-statistic vs. constant model: 2.03, p-value = 0.495

>> figure
>> LMA=ans;
>> plot(LMA)
>> title('Salinity ~ 1 + B1 +B2 + B3 +B4 + B5 +B6 + B7')

>> LM12=fitlm(SB(:,3:4),SB(:,1))

LM12 =

Linear regression model:
    y ~ 1 + x1 + x2

Estimated Coefficients:
                 Estimate     SE     tStat    pValue
                 ________    ____    _____    ______

    (Intercept)     67.74    2.13    31.82      0.00
    x1             -20.86    5.90    -3.53      0.01
    x2              21.88    3.21     6.82      0.00

Number of observations: 9, Error degrees of freedom: 6
Root Mean Squared Error: 1.33
R-squared: 0.887, Adjusted R-Squared 0.85
F-statistic vs. constant model: 23.6, p-value = 0.00144

>> figure
>> plot(LM12)
>> title('Salinity ~ 1 + B1 +B2')
>> ylim([40 100])

The last command puts both plots on the same scale; notice the intervals are much tighter with only two predictors. Whether this would be worth a hoot for future predictions is pretty much pure luck, I'd guess...
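The big gap between R-squared and adjusted R-squared in the seven-predictor fit comes from the usual penalty for the number of predictors p relative to the number of observations n. A small sketch plugging the displayed values into that formula:

```matlab
% Adjusted R-squared penalizes R-squared for p predictors fitted to n points:
%   adjR2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)
% where n - p - 1 is the error degrees of freedom fitlm reports.
adjR2 = @(R2, n, p) 1 - (1 - R2) .* (n - 1) ./ (n - p - 1);

n = 9;                            % observations in both fits
allSeven = adjR2(0.934, n, 7)     % error dof = 1, penalty factor 8/1
twoOnly  = adjR2(0.887, n, 2)     % error dof = 6, penalty factor 8/6
```

This gives about 0.472 and 0.849, matching the reported 0.474 and 0.85 to within the rounding of the displayed R-squared values. On a fitted model the same quantity is available directly as LM12.Rsquared.Adjusted.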

moeJ
on 15 Apr 2019

I understand, so basically the issue is in the data itself.

Thank you so so so much for your great effort and help! I truly appreciate it.

dpb
on 15 Apr 2019

"... the issue is in the data itself."

Well, yes and no...the specific dataset certainly hasn't much apparent correlation with any simple combination of the B vectors, true. You also don't have much data to go on unless this is just a tiny subset of the whole data set?

In the larger picture, it does appear from the tables that those folks did have a large-enough dataset that they could split between a fitting set and a testing set to check on the model to some extent, anyway. You couldn't do that with any confidence at all here simply for lack of enough data to do so.
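With only 9 observations a separate test set isn't realistic, but leave-one-out cross-validation (a substitute technique, not something done in this thread) at least gives an out-of-sample error estimate: refit with each point held out and predict that point. A sketch assuming the LM12 setup (B1 and B2 as predictors); SB here is random placeholder data, and with this little data even this estimate will be noisy.

```matlab
% Placeholder data: stands in for SB (column 1 = salinity, columns 3:4 = B1, B2).
SB = rand(9, 9);
X = SB(:, 3:4);        % the two predictors kept in LM12
y = SB(:, 1);
n = numel(y);

% Leave-one-out: refit without observation k, then predict observation k.
yhat = zeros(n, 1);
for k = 1:n
    keep = true(n, 1);
    keep(k) = false;
    mdlK = fitlm(X(keep, :), y(keep));
    yhat(k) = predict(mdlK, X(k, :));
end

% Out-of-sample RMSE; compare with the in-sample RMSE that fitlm reports.
rmseLOO = sqrt(mean((y - yhat).^2));
```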

However, while it's not possible to say for certain without seeing the whole rationale behind the fitting process, it still looks to me like the modelling was just "throwing darts": trying one ad hoc combination after another until something happened to fit. That is fraught with danger: while it may work for a given data set, without some rationale behind it, future data may not fit at all. That they did some verification at least guards against that somewhat, but it's still not very satisfying that there's no rationale for choosing the model other than chance correlation.

moeJ
on 16 Apr 2019


Answer by dpb
on 9 Apr 2019

Taking a shot that the earlier presumption is the correct one:

xy=xlsread('ML.xlsx','sheet2');   % read the data into an array
N=size(xy,2)-1;                   % there is one fewer y vector than columns in the array
mdl=cell(N,1);                    % create an empty cell array to hold the fit results
for i=1:N                         % for each "y" column
  mdl{i}=fitlm(xy(:,1),xy(:,i+1),'purequadratic');  % fit the quadratic, store in the cell array
end

This will result in an Nx1 cell array holding the N LinearModel objects. To see each one, just dereference the cell content with the curlies (braces). I just ran one with a set of randn() values, so the coefficients are near zero, but you get the following output by default. See the doc for fitlm and follow the link to the LinearModel properties to see all about it...

>> mdl{1}

ans =

Linear regression model:
    y ~ 1 + x1 + x1^2

Estimated Coefficients:
                 Estimate     SE     tStat    pValue
                 ________    ____    _____    ______

    (Intercept)     -0.05    0.71    -0.07      0.95
    x1               0.03    0.16     0.22      0.83
    x1^2            -0.00    0.01    -0.66      0.52

Number of observations: 20, Error degrees of freedom: 17
Root Mean Squared Error: 0.95
R-squared: 0.17, Adjusted R-Squared 0.0722
F-statistic vs. constant model: 1.74, p-value = 0.206
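Once the models are in the cell array, their pieces can be pulled out into ordinary arrays for side-by-side comparison. A sketch using each LinearModel's Coefficients table and Rsquared property; the 20x4 randn() xy is placeholder data mimicking the example output, not the spreadsheet.

```matlab
% Placeholder data: stands in for xy = xlsread('ML.xlsx','sheet2');
xy = randn(20, 4);
N = size(xy, 2) - 1;
mdl = cell(N, 1);
for i = 1:N
    mdl{i} = fitlm(xy(:, 1), xy(:, i+1), 'purequadratic');
end

% One row per y column: [Intercept, x1, x1^2] estimates.
coefs = zeros(N, 3);
for i = 1:N
    coefs(i, :) = mdl{i}.Coefficients.Estimate.';
end

% Ordinary R-squared of each fit, as an Nx1 vector.
r2 = cellfun(@(m) m.Rsquared.Ordinary, mdl);
```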

dpb
on 13 Apr 2019

Would need to see some of the typical data and know more about what you're really trying to fit and what the variables are (some things that one can do just don't necessarily make any sense to do)...

Obviously, one should always start off on a fitting expedition by first visualizing the data...
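For this data, one quick first look is a scatter-plot matrix of salinity against every B column. A sketch with base-MATLAB plotmatrix, assuming the SB layout used elsewhere in the thread (column 1 salinity, columns 3:9 B1..B7); the random SB is placeholder data. Unlike a line plot, a scatter plot needs no sorting.

```matlab
% Placeholder data: stands in for SB (column 1 = salinity, columns 3:9 = B1..B7).
SB = rand(9, 9);

% plotmatrix(X, Y) scatters columns of X (horizontal) against columns of Y;
% with seven X columns and one Y column this gives one panel per B.
figure
[~, ax] = plotmatrix(SB(:, 3:9), SB(:, 1));
for k = 1:numel(ax)
    xlabel(ax(k), sprintf('B%d', k))
    ylabel(ax(k), 'Salinity')
end
```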

Image Analyst
on 13 Apr 2019

