How can I remove outliers in my data using Cook's Distance?

Fatemah Ebrahim on 29 June 2020 (edited 29 June 2020)
I have a large dataset: six .xlsx files with roughly 400,000 rows each. I want to use Cook's distance to identify outliers in the fourth column of each file and then delete the corresponding rows. How would I do that?
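A minimal sketch of one way to do this, under assumptions: regress the fourth column on a single numeric predictor x (for example, time converted to a number), compute Cook's distance for every row, and delete the rows whose distance exceeds a cutoff. The block computes Cook's distance directly from its definition, D_i = e_i^2 * h_ii / (p * s^2 * (1 - h_ii)^2), where e_i is the residual, h_ii the leverage, p the number of coefficients and s^2 the mean squared error. It uses only base MATLAB and never forms the n-by-n hat matrix, which may matter at ~400,000 rows (I have not measured how fitlm's Diagnostics property scales at that size). The data are synthetic, the variable names are hypothetical, and the 3*mean(D) cutoff is a rule of thumb, not a formal test.

```matlab
% Synthetic stand-in for one file; with real data, something like
% T = readtable('file1.xlsx') (hypothetical file name) would replace this.
rng(0);                                   % reproducible synthetic data
n  = 200;
x  = (1:n)';                              % numeric predictor (assumed)
y  = 2 + 0.5*x + randn(n,1);              % response, i.e. the "fourth column"
y(50) = y(50) + 40;                       % plant one gross outlier
T  = table(x, y);

Xd = [ones(n,1), T.x];                    % design matrix with intercept
p  = size(Xd, 2);                         % number of coefficients
[Q, R] = qr(Xd, 0);                       % economy-size QR decomposition
b  = R \ (Q' * T.y);                      % least-squares coefficients
e  = T.y - Xd*b;                          % residuals
s2 = sum(e.^2) / (n - p);                 % mean squared error
h  = sum(Q.^2, 2);                        % leverage = diagonal of hat matrix

% Cook's distance: D_i = e_i^2/(p*s2) * h_i/(1 - h_i)^2
D = (e.^2 ./ (p*s2)) .* h ./ (1 - h).^2;

% Rule-of-thumb cutoff: 3 times the mean distance
isOutlier = D > 3*mean(D);
T(isOutlier, :) = [];                     % delete the corresponding rows
```

For the six files, the same steps could run once per file, with T loaded by readtable and x and y taken from the relevant columns.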
Fatemah Ebrahim on 29 June 2020 (edited 29 June 2020)
Hi! I'm running the code they used on one of the .xlsx files, as follows:
X = A_t; % where this is a datetime value
Y = Adata(:,4); % where we are pulling the fourth column of the table
mdl = fitlm(X,Y);
plotDiagnostics(mdl,'cookd')
find((mdl.Diagnostics.CooksDistance)>3*mean(mdl.Diagnostics.CooksDistance))
And I am getting this error:
Error using classreg.regr.TermsRegression/handleDataArgs (line 550)
Predictor variables must be numeric vectors, numeric matrices, or categorical vectors.
Error in LinearModel.fit (line 1184)
[X,y,haveDataset,otherArgs] = LinearModel.handleDataArgs(X,varargin{:});
Error in fitlm (line 121)
model = LinearModel.fit(X,varargin{:});
Please let me know if you have any idea how to fix this error; there does not seem to be much information about it. Thanks!
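The error says fitlm accepts only numeric or categorical predictors, and A_t is a datetime, so a likely fix is to convert the timestamps to numeric elapsed time before fitting. There is a second, separate issue: if Adata is a table, Adata(:,4) returns a one-column table, whereas Adata{:,4} returns the numeric values. The sketch below assumes A_t is a datetime column vector with one entry per row of Adata; the synthetic data and the choice of hours since the first sample are illustrative. The unit and origin of the converted time should not change which rows are flagged, because shifting or rescaling the predictor leaves the fitted values and leverages, and hence Cook's distance, unchanged. fitlm requires the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic stand-ins (hypothetical) for the timestamps A_t and the table Adata
rng(1);
n     = 100;
A_t   = datetime(2020,6,1) + hours(0:n-1)';            % datetime column vector
vals  = 10 + 0.2*(0:n-1)' + randn(n,1);
vals(30) = vals(30) + 25;                               % one planted outlier
Adata = table(rand(n,1), rand(n,1), rand(n,1), vals);   % values of interest in column 4

% fitlm rejects datetime predictors, so convert to elapsed hours (numeric)
X = hours(A_t - A_t(1));
% Brace indexing returns the numeric contents; Adata(:,4) would return a table
Y = Adata{:,4};

mdl   = fitlm(X, Y);
cookD = mdl.Diagnostics.CooksDistance;

% Rule-of-thumb cutoff; 'omitnan' guards against NaN distances from missing rows
isOutlier = cookD > 3*mean(cookD, 'omitnan');

% Delete the flagged rows from both the timestamps and the table
A_t(isOutlier)      = [];
Adata(isOutlier, :) = [];
```

Deleting from A_t and Adata with the same logical index keeps the timestamps aligned with the remaining rows. If the goal is only to refit without the flagged points rather than to delete them from the data, fitlm also accepts an 'Exclude' name-value argument.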
