How to remove outliers in large data sets?

MUKESH KUMAR on 7 Jan 2022
Commented: Image Analyst on 17 Jan 2022
I am unable to open the outlier example code: neither openExample('matlab/RemoveOutliersInVectorExample') nor openExample('matlab/DetermineOutliersWithStandardDeviationExample') works.
I have a large dataset of power load covering three years at 30-minute intervals. I want to remove the outlier points, which are increasing my forecasting error.
Any help with removing the outliers from such a dataset would be appreciated.
Thanks
A reference image is attached. The upper part of the image shows the outliers in the dataset, and the lower part shows that the error is also high at those points.
Thanks again
2 Comments
Image Analyst on 7 Jan 2022
How large is the data set? How many gigabytes? Can you attach a smaller set (less than 5 MB) in a .zip file?
MUKESH KUMAR on 8 Jan 2022
It is not gigabytes in size, but it is five years of data with outliers that occur in patterns. I am attaching the data file here.


Accepted Answer

Image Analyst on 8 Jan 2022
Try this:
data = readmatrix('Copy of data.xlsx');
x = data(:, 1);
y = data(:, 2);
% Plot just the first cycle.
last = round(70000/3)
x = x(1:last);
y = y(1:last);
subplot(2, 1, 1);
plot(x, y, 'b-')
grid on;
title('Showing One Cycle Only')
% Smooth the data.
windowWidth = 2001; % Some large odd number.
smoothY = movmean(y, windowWidth);
hold on;
plot(x, smoothY, 'r-', 'LineWidth', 3)
% Compute difference between actual and smoothed.
diffy = y - smoothY;
subplot(2, 1, 2);
plot(x, diffy, 'b-');
grid on;
% Flag outliers as points whose absolute deviation from the moving mean exceeds 900.
outlierIndexes = abs(diffy) > 900;
% Plot outliers as red dots over the original data.
subplot(2, 1, 1);
hold on
plot(x(outlierIndexes), y(outlierIndexes), 'r.', 'MarkerSize', 7);
% Now remove outliers from x and y
x(outlierIndexes) = [];
y(outlierIndexes) = [];
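The threshold of 900 above was chosen for this particular data. As a hedged variation (not part of the code above), the threshold can instead be derived from the residuals themselves using a scaled median absolute deviation. The sketch below runs on synthetic data, so the signal shape, the spike positions in spikeIdx, the window width of 201, and the multiplier of 3 are all assumptions for illustration, not values taken from the attached file.

```matlab
% Synthetic stand-in for the load data (assumption: NOT the poster's file).
rng(0);                                   % Reproducible random numbers.
n = 5000;
x = (1:n)';
y = 3000 + 500*sin(2*pi*x/48) + 50*randn(n, 1);  % 48 samples = one day at 30 min.
spikeIdx = [400; 1500; 2600; 3700; 4800]; % Hypothetical outlier positions.
y(spikeIdx) = y(spikeIdx) + 2500;         % Inject large spikes.

% Residual from a wide moving mean, same idea as the code above.
windowWidth = 201;                        % Assumed odd window; tune for your data.
diffy = y - movmean(y, windowWidth);

% Scaled MAD of the residuals, computed without any toolbox function.
% 1.4826 makes the MAD comparable to a standard deviation for normal data.
scaledMad = 1.4826 * median(abs(diffy - median(diffy)));
threshold = 3 * scaledMad;                % Assumed multiplier; 3 is a common choice.

% Flag and remove points whose residual exceeds the data-driven threshold.
outlierIndexes = abs(diffy - median(diffy)) > threshold;
xClean = x(~outlierIndexes);
yClean = y(~outlierIndexes);
fprintf('Threshold = %.1f, removed %d points.\n', threshold, nnz(outlierIndexes));
```

Because the median and the MAD are barely affected by a handful of spikes, the threshold adapts to the normal spread of the residuals. It is still worth plotting diffy with the threshold drawn on it before trusting the result on real data.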
2 Comments
MUKESH KUMAR on 17 Jan 2022
This is very helpful. I understand the code and have applied it to my problem.
I also tried to fill the outliers as in openExample('matlab/DetectOutliersWithSlidingWindowRmoutliersExample'),
but the command does not work in version 2018a; it is only available from version 2018b.
The "rmoutliers" command is not working.
Is there any way to fill these outliers using movmean or movmedian?
Thanks
Image Analyst on 17 Jan 2022
You must have a version so old that rmoutliers was not in it yet. However, you can do it manually: smooth the curve, subtract it from your data, and threshold the difference like I did. I didn't use rmoutliers.
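A minimal sketch of the manual route for filling rather than deleting, assuming a short moving median is an acceptable replacement value. It uses only movmean and movmedian, so it does not need rmoutliers. The data here is synthetic, and the spike positions, the fill window fillWidth = 5, and the reuse of the 900 threshold are assumptions for illustration only.

```matlab
% Synthetic stand-in for the load data (assumption: NOT the attached file).
rng(1);                                   % Reproducible random numbers.
n = 2000;
x = (1:n)';
yTrue = 3000 + 500*sin(2*pi*x/48);        % 48 samples = one day at 30 min.
y = yTrue + 20*randn(n, 1);
spikeIdx = [300; 900; 1500];              % Hypothetical outlier positions.
y(spikeIdx) = y(spikeIdx) + 2500;         % Inject large spikes.

% Detect outliers exactly as in the accepted answer: smooth, subtract, threshold.
windowWidth = 201;                        % Assumed odd window; tune for your data.
diffy = y - movmean(y, windowWidth);
outlierIndexes = abs(diffy) > 900;        % Threshold reused for this synthetic data.

% Fill each flagged sample with the median of its close neighbours.
% A short window keeps the fill local; the median ignores the spike itself.
fillWidth = 5;                            % Assumed short odd window.
localMedian = movmedian(y, fillWidth);
yFilled = y;                              % Same length as y, so time spacing is kept.
yFilled(outlierIndexes) = localMedian(outlierIndexes);

plot(x, y, 'b-', x, yFilled, 'r-');
legend('Original', 'Outliers filled');
```

Filling keeps the series at a uniform 30-minute spacing, which most forecasting code expects, whereas deleting samples leaves gaps. If outliers come in runs longer than about half of fillWidth, the local median is itself contaminated, so a wider window or interpolation across the run would be needed. The filloutliers function may also be worth checking in your release's documentation, since it appears to predate rmoutliers.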
