I tried using the Curve Fitting Tool; however, I got an error saying 'Data sizes are incompatible.'
dataset= xlsread('ML.xlsx','sheet2')
a=dataset(:,1)
S=dataset(:,3:9)
D= repelem(a(1:end, :), 1, 7)
cftool

4 comments

dpb on 8 Apr 2019
What are the other columns? Seven different sets of observations to fit to the same X vector, or what?
Not enough information to know what is wanted/expected to be fit to what, specifically.
Image Analyst on 8 Apr 2019
You forgot to attach 'ML.xlsx'.
You can only fit one set of y to a set of x. You can't fit 7 y simultaneously to one x.
Why do you think you should get 7 coefficients for a quadratic? You should get only 3 for each set of y values, so 21 total if you have 7 sets of y.
dpb on 12 Apr 2019
I've no idea what you think
SalMax=repelem(Salinity(1:end, :), 1, 7)
is for or doing but to fit a higher order polynomial with polyfit you just set the desired order; it does everything else automagically.
Attach the data set; the first step in any fitting problem is to visualize the data...we can't do anything with only the response variable.
dpb on 13 Apr 2019
I'll try to look at the data some this evening; that will help some...meanwhile you've not really yet given any meaningful context to what the other variables are and I have no idea what " a sailinty column and different bands paremeters ( 7 columns), from which i need to generate a predicted salinity and eventually the equation that i would use in GIS" actually means or how that bears upon the problem.
What are "different band parameters"? Without any idea at all of what the data are, it's tough to have any clue as to what makes any physical sense at all...just because one could find a set of variables and a polyfit of given degree doesn't mean one should.


Accepted Answer

dpb on 14 Apr 2019
Edited: dpb on 14 Apr 2019


Higher order polynomials aren't going to help here...the following code generated the figure:
for i=3:9
subplot(4,2,i-2);
[~,ix]=sort(SB(:,i));
plot(SB(ix,i),SB(ix,1))
xlabel(sprintf('B%d',i-2))
end
There's virtually no correlation with most of the predictor variables; what correlation there is in some ranges breaks down in other ranges of every variable.
Just taking a stab; ran one that uses all variables; another with the only two that were statistically significant -- those results are
>> fitlm(SB(:,3:end),SB(:,1))
ans =
Linear regression model:
y ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7
Estimated Coefficients:
                   Estimate     SE      tStat    pValue
                   ________    _____    _____    ______
    (Intercept)      63.33     13.62     4.65     0.13
    x1              -29.51     18.92    -1.56     0.36
    x2               20.66     15.52     1.33     0.41
    x3                4.74     14.23     0.33     0.80
    x4               -4.83     15.69    -0.31     0.81
    x5               20.96     33.11     0.63     0.64
    x6               -0.02     21.87    -0.00     1.00
    x7                1.42     16.83     0.08     0.95
Number of observations: 9, Error degrees of freedom: 1
Root Mean Squared Error: 2.49
R-squared: 0.934, Adjusted R-Squared 0.474
F-statistic vs. constant model: 2.03, p-value = 0.495
>> figure
>> LMA=ans;
>> plot(LMA)
>> title('Salinity ~ 1 + B1 +B2 + B3 +B4 + B5 +B6 + B7')
>> LM12=fitlm(SB(:,3:4),SB(:,1))
LM12 =
Linear regression model:
y ~ 1 + x1 + x2
Estimated Coefficients:
                   Estimate     SE     tStat    pValue
                   ________    ____    _____    ______
    (Intercept)      67.74     2.13    31.82     0.00
    x1              -20.86     5.90    -3.53     0.01
    x2               21.88     3.21     6.82     0.00
Number of observations: 9, Error degrees of freedom: 6
Root Mean Squared Error: 1.33
R-squared: 0.887, Adjusted R-Squared 0.85
F-statistic vs. constant model: 23.6, p-value = 0.00144
>> figure
>> plot(LM12)
>> title('Salinity ~ 1 + B1 +B2')
>> ylim([40 100])
The last puts the plots on the same scale; notice the intervals are much tighter with only two predictors. Whether this would be worth a hoot for future predictions is pretty much pure luck I'd guess...
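For getting from a fitted model to an equation usable elsewhere (e.g. in GIS), a minimal sketch follows. It assumes the Statistics and Machine Learning Toolbox and uses made-up data; the names `B`, `sal` and `mdl` are illustrative only, not the poster's variables.

```matlab
% Synthetic stand-in data (NOT the poster's spreadsheet):
% two hypothetical band columns and a noisy linear response.
rng(1);
B   = rand(9, 2);
sal = 60 - 20*B(:,1) + 20*B(:,2) + randn(9, 1);

mdl = fitlm(B, sal);                 % y ~ 1 + x1 + x2

% Coefficients in model order: intercept, x1, x2
b = mdl.Coefficients.Estimate;
fprintf('Salinity = %.4g + (%.4g)*B1 + (%.4g)*B2\n', b(1), b(2), b(3));

% Predicted values and goodness-of-fit measures
salHat = predict(mdl, B);
fprintf('R^2 = %.3f, RMSE = %.3f\n', mdl.Rsquared.Ordinary, mdl.RMSE);
```

The printed equation is just the intercept plus each coefficient times its predictor, so the same coefficients can be typed into a raster calculator by hand.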

7 comments

moeJ on 14 Apr 2019
Thank you so so much for your efforts! This is great! But what if we want to try a logarithmic or quadratic regression analysis as well? What would the code for a quadratic regression analysis look like? And for a logarithmic regression analysis? Please!
dpb on 14 Apr 2019
Just change the model(s) -- but how do you think that's going to help much? The data are pretty much randomly related, as the plots against variables show.
You could do residual plots and see whether there's any sign of a trend that a quadratic term might help with, but I doubt it will prove to be there.
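A residual plot of that kind could be sketched roughly as below, on synthetic data and assuming `fitlm` from the Statistics and Machine Learning Toolbox is available; `x`, `y` and `mdl` are placeholder names.

```matlab
% Synthetic stand-in data: a straight line plus noise
rng(3);
x = rand(15, 1);
y = 2 + x + 0.1*randn(15, 1);

mdl = fitlm(x, y);              % y ~ 1 + x
res = mdl.Residuals.Raw;        % observed minus fitted

% Residuals against fitted values; curvature here would hint
% that a higher-order term is missing from the model.
plot(mdl.Fitted, res, 'o');
hold on
plot(xlim, [0 0], 'k--');       % zero reference line
hold off
xlabel('Fitted value');
ylabel('Residual');
```

A featureless scatter around zero suggests a quadratic term has nothing to pick up; a clear bow shape suggests it might.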
Your predictors just aren't very good at their intended job, unfortunately, it seems.
If one has a knowledge of what the variables actually are and some physical reason why they should have a relationship to the salinity, then you might be able to build a functional form for a model that carried some of that knowledge along with it -- just blind curve fitting isn't likely to get much further.
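As a rough illustration of "changing the model", the sketch below fits linear, quadratic and logarithmic forms to the same single predictor with `fitlm`. The data are synthetic, and the logarithmic form is taken here to mean y ~ 1 + log(x), which is an assumption about what is wanted and requires strictly positive x.

```matlab
% Synthetic stand-in data with strictly positive x so log(x) is defined
rng(2);
x = 0.1 + rand(20, 1);
y = 5 + 3*log(x) + 0.2*randn(20, 1);

mdlLin  = fitlm(x, y);                     % y ~ 1 + x
mdlQuad = fitlm(x, y, 'purequadratic');    % y ~ 1 + x + x^2
mdlLog  = fitlm(log(x), y);                % y ~ 1 + log(x)

% Compare on adjusted R-squared and RMSE rather than raw R-squared,
% since adding terms never lowers the ordinary R-squared.
adjR2 = [mdlLin.Rsquared.Adjusted, mdlQuad.Rsquared.Adjusted, mdlLog.Rsquared.Adjusted];
rmse  = [mdlLin.RMSE, mdlQuad.RMSE, mdlLog.RMSE];
disp(table(adjR2.', rmse.', 'VariableNames', {'AdjR2', 'RMSE'}, ...
     'RowNames', {'linear', 'quadratic', 'logarithmic'}));
```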
moeJ on 14 Apr 2019
When I try quadratic regression, it shows me an error saying X and Y vectors must be the same size. How should I make them the same size?
dataset= xlsread('SB.xlsx','sheet1');
x=dataset(:,1);
y=dataset(:,3:9);
coeff = polyfit(y,x,2)% finds coefficients for a 2nd degree line
Y = polyval(coeff, y)
We have a satellite image on which my colleague performed a correction (a geometric correction) of the pixel size; I then received the resulting data (attached in the file earlier). Now I need to predict the value of one dependent variable (salinity) from the values of seven independent variables (B1 to B7). So I need to perform a regression analysis on this data set to find the predicted salinity values and the correlation coefficients, so that I can derive the regression equation from the data.
I'm hoping I will end up with something similar to the attached picture. This picture contains two tables from a previous research paper, where they tried more than one type of regression analysis on the data that they had and eventually concluded that the quadratic regression gave them the best fit, with the highest R-squared.
Can you help me, please?
Regression Analysis.png
Thank you so much again.
dpb on 14 Apr 2019
y=dataset(:,3:9);
coeff = polyfit(y,x,2)% finds coefficients for a 2nd degree line
Per the documentation, polyfit can only operate on one y vector at a time...
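One way around that is to loop over the columns and call polyfit once per predictor, with the predictor as the first argument and the response as the second. The sketch below uses a random stand-in matrix laid out like the one described (response in column 1, seven predictors in columns 3:9); it is not the attached data.

```matlab
% Synthetic stand-in for the spreadsheet: 9 rows, 9 columns
rng(0);
data  = rand(9, 9);
sal   = data(:, 1);        % response (one column)
bands = data(:, 3:9);      % seven candidate predictors

nB    = size(bands, 2);
coeff = zeros(nB, 3);      % one row of [a2 a1 a0] per band
r2    = zeros(nB, 1);

for k = 1:nB
    % polyfit(x, y, n): x and y must be vectors of equal length
    coeff(k, :) = polyfit(bands(:, k), sal, 2);
    fitted      = polyval(coeff(k, :), bands(:, k));
    % R-squared computed by hand from residual and total sums of squares
    r2(k) = 1 - sum((sal - fitted).^2) / sum((sal - mean(sal)).^2);
end

disp(table((1:nB).', coeff, r2, 'VariableNames', {'Band', 'Coefficients', 'R2'}));
```

Each row then gives a separate quadratic in one band; this produces seven independent fits, not one model in seven variables.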
As far as the tables; if there is some reason for thinking combinations of the various bands have some meaning as predictors, then by all means feel free to use them.
You would simply have to transform the independent variable and fit--we can just try their model (presuming same bands/same number)--
>> z1=SB(:,7)./SB(:,5).*SB(:,9);
>> [~,iz]=sort(z1);
>> plot(z1(iz),SB(iz,1))
Doesn't look like it'll do much to me...it has same kinds of issues in combination as do the individual measurements...no great correlation with the desired predicted variable.
But, since we're just throwing darts, anyway, it appears...
>> LMZ1=fitlm(z1,SB(:,1),'purequadratic')
LMZ1 =
Linear regression model:
y ~ 1 + x1 + x1^2
Estimated Coefficients:
                   Estimate     SE      tStat    pValue
                   ________    _____    _____    ______
    (Intercept)      60.54      8.94     6.77     0.00
    x1               45.18     55.70     0.81     0.45
    x1^2            -45.37     66.92    -0.68     0.52
Number of observations: 9, Error degrees of freedom: 6
Root Mean Squared Error: 3.59
R-squared: 0.181, Adjusted R-Squared -0.0915
F-statistic vs. constant model: 0.665, p-value = 0.549
>> plot(LMZ1)
>> title('Salinity ~ 1 + Z1 + Z1^2; Z=B5/B3*B7')
>>
Neither coefficient is statistically significant; the R-squared values show it, as does the overall model F-statistic. It's a wild-goose chase you're after here. There's nothing in these data that gives any hint of there being a model that could account for the observed salinity.
It looks to me like the paper was done purely by picking combinations until something happened to be somewhat better for that particular set of observations. That's the thing about statistics: with enough random chances, eventually you can show there being a correlation even though that's all it is--chance. This isn't science, it's alchemy at best; more like voodoo in reality.
untitled.jpg
moeJ on 15 Apr 2019
I understand, so basically the issue is in the data itself..
Thank you so so so much for your great effort and help! I truly appreciate it.
dpb on 15 Apr 2019
Edited: dpb on 15 Apr 2019
"... the issue is in the data itself."
Well, yes and no...the specific dataset certainly hasn't much apparent correlation with any simple combination of the B vectors, true. You also don't have much data to go on unless this is just a tiny subset of the whole data set?
In the larger picture, it does appear from the tables that those folks did have a large-enough dataset that they could split between a fitting set and a testing set to check on the model to some extent, anyway. You couldn't do that with any confidence at all here simply for lack of enough data to do so.
However, while it's not possible to say for certain without seeing the whole rationale behind the fitting process undertaken, it still looks to me like the modelling was just "throwing darts", trying ad hoc combinations until they happened to find something. That is fraught with danger in that while it may work for a given data set, without some rationale behind it, future data may not fit at all. That they did have some verification effort at least guards against that to some degree, but it's still not very satisfying that there's no rationale for choosing the model other than chance correlation.
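For reference, a fit/test split of the sort described might look like the sketch below. It uses 100 synthetic observations (far more than the nine available here), a 70/30 split chosen arbitrarily, and placeholder names throughout; how the paper actually split its data is not known from this thread.

```matlab
% Synthetic stand-in data: 100 observations, two predictors
rng(4);
n = 100;
X = rand(n, 2);
y = 60 - 20*X(:,1) + 20*X(:,2) + randn(n, 1);

% Random 70/30 split into fitting and testing sets
idx    = randperm(n);
nTrain = round(0.7*n);
tr     = idx(1:nTrain);
te     = idx(nTrain+1:end);

mdl  = fitlm(X(tr, :), y(tr));        % fit on the training rows only
yHat = predict(mdl, X(te, :));        % predict the held-out rows

rmseTest = sqrt(mean((y(te) - yHat).^2));
fprintf('Training RMSE = %.3f, test RMSE = %.3f\n', mdl.RMSE, rmseTest);
```

A test RMSE much larger than the training RMSE is the usual sign that the model was tuned to chance features of the fitting set.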
moeJ on 16 Apr 2019
Edited: moeJ on 16 Apr 2019
I see I see. Thank you so so much for your immense help over the past few days. I'll have to check back with my faculty members regarding the figures they provided me with before I can proceed, hopefully they can provide me with a large-enough dataset or something to help. Again, I really appreciate your effort, you've been a great help!


More Answers (1)

dpb on 9 Apr 2019


Taking a shot that the presumption earlier is the correct one--
xy=xlsread('ML.xlsx','sheet2'); % read the data into array
N=size(xy,2)-1; % there are one fewer y vectors than columns in array
mdl=cell(N,1); % create an empty cell array to hold fit results
for i=1:N % for each "y" column
mdl(i)={fitlm(xy(:,1),xy(:,i+1),'purequadratic')}; % fit the quadratic, store in cell array
end
will result in an N-by-1 cell array holding the N LinearModel objects. To see each, just dereference the cell content with the curlies (braces). I just did one with a set of randn() values so the coefficients are near zero, but you get the following output by default. See the doc for fitlm and the link to the LinearModel properties to see all about it...
>> mdl{1}
ans =
Linear regression model:
y ~ 1 + x1 + x1^2
Estimated Coefficients:
                   Estimate     SE     tStat    pValue
                   ________    ____    _____    ______
    (Intercept)      -0.05     0.71    -0.07     0.95
    x1                0.03     0.16     0.22     0.83
    x1^2             -0.00     0.01    -0.66     0.52
Number of observations: 20, Error degrees of freedom: 17
Root Mean Squared Error: 0.95
R-squared: 0.17, Adjusted R-Squared 0.0722
F-statistic vs. constant model: 1.74, p-value = 0.206
>>
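To compare the N fits side by side rather than displaying them one at a time, something like the following sketch could be used. It rebuilds the cell array on randn() values as above; the summary variable names (`r2`, `rmse`, `coefs`) are made up for illustration.

```matlab
% Synthetic stand-in: column 1 is x, columns 2:4 are three y vectors
rng(5);
xy = randn(20, 4);
N  = size(xy, 2) - 1;

mdl = cell(N, 1);
for i = 1:N
    mdl{i} = fitlm(xy(:, 1), xy(:, i+1), 'purequadratic');
end

% Pull one number per model out of the cell array
r2   = cellfun(@(m) m.Rsquared.Ordinary, mdl);
rmse = cellfun(@(m) m.RMSE, mdl);

% One row of [intercept, x1, x1^2] coefficients per model
coefs = cell2mat(cellfun(@(m) m.Coefficients.Estimate.', mdl, ...
                         'UniformOutput', false));

disp(table((1:N).', r2, rmse, coefs, ...
     'VariableNames', {'yColumn', 'R2', 'RMSE', 'Coefficients'}));
```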

4 comments

moeJ on 12 Apr 2019
Thank you so much, sir, for answering my question. I tried the linear regression as you said, which is perfect, but I didn't get a high R^2.
I realized my question was not clear before, so I edited it; maybe you can take another look at it, please?
I need to generate a salinity prediction model developed from the field measurements, from band 1 to band 7 (or the strongest of those bands), where I need to find which bands reveal a strong relationship between the salinity levels and the reflectance, with a high R-squared (the higher the R^2 the better) and a low RMSE. I need to test multiple types of regression (I do not know which is the best one) to help me generate an equation that I can use in GIS.
Thank you so much again.
dpb on 13 Apr 2019
Would need to see some of the typical data and know more about what you're really trying to fit and what the variables are (some things that one can do just don't necessarily make any sense to do)...
Obviously, one should always start off on a fitting expedition by first visualizing the data...
Image Analyst on 13 Apr 2019
Edited: Image Analyst on 13 Apr 2019
Despite a strong hint from me and a direct request from dpb, you've still not attached your data, 'ML.xlsx'. Why not? Please do so if you want good answers from here on out.
moeJ on 13 Apr 2019
Kindly find attached the data set.
I need to do a quadratic regression and end up with a formula (to use in GIS), R^2, and predicted salinity values, please.
Thank you so so much for your help again.



Asked: 8 Apr 2019
Edited: 3 May 2019
