Removing NaN in Linear Regression Problem. Error in line 66.

Hello guys,
I am trying to fit a multivariable linear regression. The predictors (X) form a 52824x9 table.
The regress function removes NaN values with the following piece of code:
% Remove missing values, if any
wasnan = (isnan(y) | any(isnan(X),2)); %line 66
havenans = any(wasnan);
if havenans
y(wasnan) = []; %line 69
X(wasnan,:) = [];
n = length(y);
end
At first, I got an error stating:
Undefined function 'isnan' for input arguments of type 'table'.
Error in regress (line 66)
wasnan = (isnan(y) | any(isnan(X),2));
I searched for solutions and found one saying that the isnan function cannot access data stored in tables. The suggested fix was to change the line to the following:
wasnan = (isnan(y{:,:}) | any(isnan(X{:,:}),2));
Now I get an error in line 69 saying the following:
Subscripting a table using linear indexing (one subscript) or multidimensional indexing (three or more subscripts) is not
supported. Use a row subscript and a variable subscript.
If anyone knows how to solve the problem or can suggest another way to access the data with the isnan function, it would be very much appreciated. I have been trying to solve this problem for some days now.
Many thanks,
Natalia

Accepted Answer

dpb on March 18, 2020
Edited: dpb on March 20, 2020
You don't pass the table itself to regress but the variables to be used in the regression. Then you won't run into the issue inside regress.
And you DEFINITELY DO NOT WANT TO BE MUCKING INSIDE THE SUPPLIED REGRESS FUNCTION!!!!
We don't know the function you're trying to fit nor the variable names in your table, but assuming
Y ~ 1 + AX1 + BX2 + ...
for variables X and Y in the table and a linear model plus intercept, the syntax for regress (response first, then the predictor matrix) would be
b=regress(t.Y,[ones(height(t),1) t.X]);
where the table variable is t. Use your table variable name and variable names within the table, of course.
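A minimal sketch of that call with more than one predictor, assuming the Statistics and Machine Learning Toolbox is installed. The table t, its variable names X1, X2, Y, and the data are hypothetical and only illustrate the pattern. Since regress removes rows containing NaN itself, no manual cleanup is needed once numeric arrays are passed in.

```matlab
% Hypothetical data: two predictors and a response, one missing value.
X1 = [1; 2; 3; 4; 5; 6];
X2 = [2; 1; 4; 3; 6; NaN];      % NaN in the last row
Y  = 1 + 2*X1 + 3*X2;           % exact linear relation; Y(6) is NaN as well
t  = table(X1, X2, Y);

% Dot indexing returns plain numeric columns, which is what regress expects.
% The column of ones supplies the intercept term.
Xmat = [ones(height(t),1) t.X1 t.X2];

% Response first, predictor matrix second. Rows with NaN are ignored.
b = regress(t.Y, Xmat);
```

With this made-up data, b should come out close to [1; 2; 3], in the order intercept, X1, X2.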
If you have the Curve Fitting Toolbox in addition to the Statistics Toolbox, I would suggest its fit function, which is a little more user-friendly than the core regress function. Lacking it, see the Alternative Functionality section of the regress documentation, which suggests using LinearModel instead for similar reasons.
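As a rough illustration of the LinearModel route, the sketch below uses fitlm from the Statistics and Machine Learning Toolbox, which returns a LinearModel object and accepts a table directly. The variable names X1, X2, Y and the data are hypothetical; substitute the names from your own table.

```matlab
% Hypothetical data with one missing predictor value.
X1 = [1; 2; 3; 4; 5; 6];
X2 = [2; 1; 4; 3; 6; NaN];
noise = [0.1; -0.1; 0.1; -0.1; 0.1; 0];
Y  = 1 + 2*X1 + 3*X2 + noise;   % Y(6) is NaN because X2(6) is NaN
t  = table(X1, X2, Y);

% The formula names table variables directly; the intercept is included
% by default, and rows containing NaN are excluded from the fit.
mdl = fitlm(t, 'Y ~ X1 + X2');

coeffs = mdl.Coefficients.Estimate;   % intercept, X1, X2
nUsed  = mdl.NumObservations;         % rows actually used in the fit
```

Here the table is passed as is, so none of the indexing issues inside regress arise.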
Read the section of the table documentation on how to access data in a table. It details which forms of indexing return the variables as their native type and which return tables. In particular, note that indexing a table with parentheses returns another table containing the addressed rows and columns, which is probably the root cause of your troubles.
x=t(:,1); % returns x as a table: all rows of table t, column 1
while
x=t.X; % returns x as an array, presuming X is the first column in table t
% or
x=t{:,1}; % returns x as an array -- NB: the "curlies" {} instead of ()
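The difference between the three forms can be checked on a small table. This is only a sketch with made-up variable names and data; it also shows why the one-subscript deletion on line 69 of regress fails for a table, while a row-and-variable subscript works.

```matlab
% Hypothetical 3-row table with one NaN.
X = [1; NaN; 3];
Y = [2; 4; 6];
t = table(X, Y);

a = t(:,1);    % parentheses: still a table (3x1)
b = t.X;       % dot indexing: numeric column vector
c = t{:,1};    % brace indexing: numeric column vector

% isnan works on the extracted arrays. In the release discussed here it is
% not defined for tables, which is the first error in the question.
wasnan = isnan(t.Y) | any(isnan(t{:,1}), 2);

% Deleting rows of a table needs both a row and a variable subscript.
% t(wasnan) = [] would raise the "linear indexing" error from the question.
t(wasnan, :) = [];
```

After the last line, t holds the two rows without missing values.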

Comments: 1

Many thanks, your insight was very helpful. I was able to solve it using LinearModel, following your advice.
Best,
Natalia


More Answers (1)

Cris LaPierre on March 18, 2020

Try using rmmissing. It accepts vectors, matrices, cell arrays, tables, and timetables as input.
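A short sketch of that cleanup step on a table, done before any call to regress. The variable names and values are hypothetical. The second output of rmmissing is a logical vector marking the rows that were removed.

```matlab
% Hypothetical table with missing values in two different rows.
X1 = [1; NaN; 3; 4];
X2 = [5; 6; 7; 8];
Y  = [2; 4; NaN; 8];
t  = table(X1, X2, Y);

% Remove every row that contains at least one missing value.
[tClean, removed] = rmmissing(t);

% Extract numeric arrays from the cleaned table for later use.
y    = tClean.Y;                    % response as a numeric column
Xmat = tClean{:, {'X1', 'X2'}};     % predictors as a numeric matrix
```

The arrays y and Xmat are plain numeric types, so they can be passed to functions that do not accept tables.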

Comments: 5

But the OP's problem is that they passed a table to regress instead of the variables from the table. It wouldn't hurt to do the cleanup externally first, but they will still run into a problem later on when regress tries to access x,y...
I'm addressing the code the OP shared: removing NaNs from a table.
Hmmm... that would work outside regress, if the OP did extract that code from the regress function and is trying it elsewhere. It looked to me like they were trying to patch regress instead.
But even so, unless they change the way they call regress, the input will still be a table and the call will fail again at the input check inside regress.
Ah, I didn't realize that code snippet was from the regress function. Yes, don't go changing code inside the function. Use this to clean up your table before passing it to regress.
And yes, regress does not support tables as inputs. Use the dot notation to pass in variables.
Many thanks for your answer as well, I will keep this in mind from now on.
Best,
Natalia


Release: R2019a

Asked: March 18, 2020

Edited: dpb on March 20, 2020
