Outlier removal from a matrix
조회 수: 24 (최근 30일)
이전 댓글 표시
I removed the outliers from my dataset with rmoutliers(A,'mean') command. It should remove the data 3 standard deviations from the mean of each column. But when I print the histogram of each column, there are still some data as far as 6 standard deviations away. What do you suggest? Here is my code:
A = rmoutliers(table_data,'mean');
Zscores = zscore(A); %(A is a 50000*12 matrix)
figure
histogram(Zscores(:,2))
In the histogram, there are still some data as far as 6 standard deviations away.
댓글 수: 1
John D'Errico
2022년 10월 11일
help rmoutliers
I had to go to the doc to check your claim that rmoutliers with the 'mean' option does specifically use 3 standard deviations as the cutoff, away from the mean and then it removes the entire row containing that outlier. This is true. But rmoutliers is not a perfect tool, and any such tool can have problems if you dare to push its limits.
x = [ones(1,5),1 + eps,10]
xhat = rmoutliers(x)
xhat == 1
So rmoutliers first removed the 10 as being more than 3 sigma out, but then, since the standard deviation of the first 5 elements is exactly zero, 1+eps is ALSO more than 3 sigma out, and a clear outlier. The point is, if you try hard enough, you can always cause any such adaptive tool to exhibit strange behavior.
But if you want to know what happened, then you need to provide your data. Otherwise, anything is just a wild guess.
Attach it to a comment (not as an answer), in a .mat file.
답변 (0개)
참고 항목
카테고리
Help Center 및 File Exchange에서 Descriptive Statistics에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!