Finding outliers in a dataset

Salma fathi on 2 Aug 2022
Answered: Cris LaPierre on 2 Aug 2022
Hello, the image shows plots of my dataset. I am trying to remove outliers from the dataset so that I can later use it to train a machine learning model.
However, rmoutliers apparently treats a lot of important data points as outliers. Is there another approach I could follow to get rid of the outliers?
The top plot shows the whole dataset, and the bottom plot shows the data after removing the outliers with the following lines:
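% Remove rows with a value more than three standard deviations from its column mean
nonOutliers = rmoutliers(Matrix3, 'mean');
figure(3); tiledlayout(2,1);
nexttile;   % top tile: whole dataset
scatter(Matrix3(:,1), Matrix3(:,2), 1);
nexttile;   % bottom tile: data after outlier removal
scatter(nonOutliers(:,1), nonOutliers(:,2), 1)
ylim([0 1e13])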
1 Comment
Monica Roberts on 2 Aug 2022
One thing to consider: what do you consider to be outliers when you look at the graph? Right now, MATLAB doesn't seem to be taking the x-values into account when detecting outliers. You may want to split your data into chunks and pass each chunk to rmoutliers. I'd start where the data shoots up, group every ~200 values of x, pass those chunks to rmoutliers, and see what happens.
There are also other parameters you can pass to rmoutliers. For instance, 'mean' may not be the best method of detecting outliers for this dataset. Have you tried the others? The 'movmean' and 'movmedian' methods, which take a window length, might do the chunking I've described.
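As a rough illustration of the moving-window idea, here is a minimal sketch. It uses synthetic data as a stand-in for Matrix3 (column 1 as x, column 2 as y), and the window length of 201 samples is only an assumed starting point, not a recommendation for the real dataset. Only the y column is tested, so rows are dropped based on how far y sits from its neighbours rather than from the global mean.

```matlab
% Synthetic stand-in for Matrix3 (column 1 = x, column 2 = y); not the original data.
rng(0);                                        % reproducible example
x = (1:2000)';
y = 1e9 * exp(x / 250) .* (1 + 0.05 * randn(size(x)));   % steep trend plus noise
spikeIdx = [300 900 1500 1900];
y(spikeIdx) = 20 * y(spikeIdx);                % inject a few obvious outliers
Matrix3 = [x y];

% Sort rows by x so that each moving window covers neighbouring x-values.
sorted = sortrows(Matrix3, 1);

% Flag y-values more than three local scaled MADs from the local median.
% The window length (in samples) is an assumption; tune it for real data.
window = 201;
isOut = isoutlier(sorted(:,2), 'movmedian', window);

% Keep only the rows that were not flagged.
nonOutliers = sorted(~isOut, :);

figure; tiledlayout(2,1);
nexttile; scatter(sorted(:,1), sorted(:,2), 1);             % before
nexttile; scatter(nonOutliers(:,1), nonOutliers(:,2), 1);   % after
```

If the x-values are unique and sorted, the 'SamplePoints' option of isoutlier lets the window be expressed in units of x instead of a number of samples.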


Answers (1)

Cris LaPierre on 2 Aug 2022
If you process your data in a live script, consider interactively exploring different ways to detect and remove outliers using the Clean Outlier Data live task.
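For a scripted comparison outside the live task, a sketch like the following may help show how much the choice of detection method matters. The data here are synthetic (500 standard-normal points plus three injected extremes), so the counts say nothing about the dataset in the question; the method names are the fixed-threshold options accepted by isoutlier.

```matlab
% Synthetic data with three injected outliers at the end; not the original data.
rng(1);
y = [randn(500, 1); 8; -9; 12];

% Detection methods that need no window argument.
methods = {'median', 'mean', 'quartiles', 'grubbs', 'gesd'};
nFlagged = zeros(size(methods));

for k = 1:numel(methods)
    tf = isoutlier(y, methods{k});     % logical mask of flagged points
    nFlagged(k) = nnz(tf);
    fprintf('%-10s flags %d of %d points\n', methods{k}, nFlagged(k), numel(y));
end
```

The same method names can be passed to rmoutliers once a suitable one has been found.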


Release: R2022a