Finding singular outliers in data that also contains steep but non-singular changes

I want to find outliers of a specific type: single values that stick out significantly from the surrounding data.
The complicating factor is that the data sometimes also contains steep changes. Those, however, are followed by many data points that continue the new trend, so they are not singular data points.
I tried
rmoutliers(Data,'movmedian',3)
but that throws out far too many points from the steep changes, not only the singular outliers.
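To see which samples a given setting would discard, isoutlier can be called with the same detection arguments as rmoutliers; it returns the logical mask rather than the shortened vector. The following is a minimal sketch on synthetic data (the signal y, the spike position and the step position are made up for illustration and are not the data from the question):

```matlab
% Synthetic signal (assumption): small ripple, one singular spike, one lasting step
k = (1:200)';
y = 0.05*sin(0.7*k);          % deterministic stand-in for measurement noise
y(120:end) = y(120:end) + 10; % steep change that persists (not an outlier)
y(60) = y(60) + 50;           % singular outlier

% Same detection settings as rmoutliers(y,'movmedian',3)
tf = isoutlier(y,'movmedian',3);
flagged = find(tf);           % indices that rmoutliers would remove
cleaned = y(~tf);             % what remains after removing the flagged samples

fprintf('%d of %d samples flagged\n', numel(flagged), numel(y));
```

Plotting y together with the flagged indices makes it easy to check whether samples on the step are being discarded along with the spike.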

Accepted Answer

Steven Lord on 6 Dec 2022
I think the outlier detection and removal functions in MATLAB are the right tools for you to use. Choosing the right parameters (detection method and thresholds) can be a challenge. That's one of the purposes for which the Clean Outlier Data task was created.
Open the Live Editor, read in your data, then open the task as described in the Open the Task section on that documentation page. Tell the task which data it should operate on, then experiment with the detection methods and their parameters until they detect the points you want treated as outliers without also catching points that look outlier-like but aren't. Once the parameters are set the way you want, you can look at the generated code and reuse it for a different but similar data set in the future.
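Outside the task, the same kind of experiment can also be scripted. As a rough sketch (the synthetic data, the window length of 11 samples and the candidate values in factors are assumptions chosen for illustration), one can sweep the ThresholdFactor of isoutlier and watch how many samples get flagged:

```matlab
% Synthetic signal (assumption): ripple, one singular spike, one lasting step
k = (1:200)';
y = 0.05*sin(0.7*k);
y(120:end) = y(120:end) + 10;
y(60) = y(60) + 50;

factors = [3 5 10 20];              % hypothetical candidates to try
counts = zeros(size(factors));
for n = 1:numel(factors)
    tf = isoutlier(y,'movmedian',11,'ThresholdFactor',factors(n));
    counts(n) = nnz(tf);            % number of flagged samples for this factor
    fprintf('ThresholdFactor %2d: %d flagged\n', factors(n), counts(n));
end
```

For a fixed method and window, raising the factor can only shrink the flagged set, so the counts show how quickly borderline points drop out while the clear spike stays flagged.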
hans on 6 Dec 2022
The pointer to the Live Editor and the Clean Outlier Data task is very helpful. I didn't know about that functionality. I can modify the parameters and see the effect at once. That's very good.
hans on 7 Dec 2022
With the help of the Live Editor and the Clean Outlier Data task I tuned the parameters and arrived at suitable code. I finally came up with
[cleanedData2,outlierIndices] = filloutliers(Pressure,"linear",...
"movmedian",minutes(10),"ThresholdFactor",20,"SamplePoints",Time);
Thank you.
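For readers without the original data, the call can be tried on stand-in variables. This is only a sketch: the Time and Pressure values below are synthetic assumptions (one sample per minute, one lasting step, one singular spike), not the measurements discussed in this thread.

```matlab
% Synthetic stand-ins (assumption) for the Time and Pressure variables
Time = datetime(2022,12,6) + minutes(0:299)';   % one sample per minute
Pressure = 1000 + 0.05*sin(0.7*(1:300)');       % baseline with small ripple
Pressure(200:end) = Pressure(200:end) + 10;     % lasting step (should be kept)
Pressure(100) = Pressure(100) + 50;             % singular spike (should be replaced)

% Same call as above: 10-minute moving median, threshold of 20 scaled MADs,
% flagged samples replaced by linear interpolation between their neighbors
[cleanedData2,outlierIndices] = filloutliers(Pressure,"linear", ...
    "movmedian",minutes(10),"ThresholdFactor",20,"SamplePoints",Time);

fprintf('%d samples replaced\n', nnz(outlierIndices));
```

Because the window is given as a duration together with "SamplePoints", it is measured in time rather than in number of samples, which matters when the sampling is not uniform.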


More Answers (1)

Mathieu NOE on 6 Dec 2022
Hello,
this is my result so far.
It does not look at the data in the first and last 10% of the time vector, so the focus is on the burst of peaks in the second half.
x = (1:numel(Pressure));
[~, ddy] = firstsecondderivatives(x,Pressure); % only the second derivative is used
% do not look at the first and last 10% (of total signal duration) of the samples
n_start = round(0.1*numel(Pressure));
n_end = round(0.1*numel(Pressure));
ddy(1:n_start) = 0;
ddy(end-n_end+1:end) = 0;
ddy = abs(ddy);
threshold = 1; % tune to the data
x_zc = round(find_zc(x,ddy,threshold));
% keep only the first and last crossing to get the start / stop index of the window,
% widen the window by 100 samples before and after,
% and clamp it to the valid index range
x_zc = [max(x_zc(1)-100,1) min(x_zc(end)+100,numel(Pressure))];
y_filtered = Pressure;
y_filtered(x_zc(1):x_zc(end)) = filloutliers(Pressure(x_zc(1):x_zc(end)),'linear','movmean',100);
figure(1); plot(Time,Pressure,'b',Time,y_filtered,'r');
function [Zx] = find_zc(x,y,threshold)
% positive-slope threshold crossing detection, using linear interpolation
y = y - threshold; % shift so that the threshold becomes the zero level
zci = @(data) find(diff(sign(data))>0); % returns indices just before upward zero crossings
ix = zci(y); % indices of upward zero crossings of the shifted y
ZeroX = @(x0,y0,x1,y1) x0 - (y0.*(x0 - x1))./(y0 - y1); % interpolated x value of the crossing
Zx = ZeroX(x(ix),y(ix),x(ix+1),y(ix+1));
end
function [dy, ddy] = firstsecondderivatives(x,y)
% The function calculates the first & second derivative of a function that is given by a set
% of points. The first derivatives at the first and last points are calculated by
% the 3 point forward and 3 point backward finite difference scheme respectively.
% The first derivatives at all the other points are calculated by the 2 point
% central approach.
% The second derivatives at the first and last points are calculated by
% the 4 point forward and 4 point backward finite difference scheme respectively.
% The second derivatives at all the other points are calculated by the 3 point
% central approach. The second-derivative formulas assume uniform spacing in x.
% Input variables:
% x: vector with the x data points.
% y: vector with the f(x) data points.
% Output variables:
% dy: row vector with the first derivative at each point.
% ddy: row vector with the second derivative at each point.
n = length(x);
dy = zeros(1,n); % preallocate as row vectors
ddy = zeros(1,n);
dy(1) = (-3*y(1) + 4*y(2) - y(3)) / (2*(x(2) - x(1))); % First derivative
ddy(1) = (2*y(1) - 5*y(2) + 4*y(3) - y(4)) / (x(2) - x(1))^2; % Second derivative
for i = 2:n-1
dy(i) = (y(i+1) - y(i-1)) / (x(i+1) - x(i-1));
ddy(i) = (y(i-1) - 2*y(i) + y(i+1)) / (x(i-1) - x(i))^2;
end
dy(n) = (y(n-2) - 4*y(n-1) + 3*y(n)) / (2*(x(n) - x(n-1)));
ddy(n) = (-y(n-3) + 4*y(n-2) - 5*y(n-1) + 2*y(n)) / (x(n) - x(n-1))^2;
end
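How abs(ddy) responds to a singular spike compared with a lasting step can be seen on a toy vector. A minimal sketch, using the built-in diff in place of firstsecondderivatives (unit sample spacing and the numbers in y are assumptions for illustration):

```matlab
% Toy signal (assumption): a singular spike of height 5, then a lasting step of height 3
y = [0 0 0 5 0 0 0 3 3 3];

% Second difference = 3-point central second derivative for unit spacing;
% d2(i) belongs to the interior sample y(i+1)
d2 = diff(y,2);
disp(d2)        % 0  5  -10  5  0  3  -3  0

% Largest magnitude relative to the jump height: 2 for the spike, 1 for the step
spikeRatio = max(abs(d2(1:4))) / 5;
stepRatio  = max(abs(d2(5:8))) / 3;
```

Both kinds of events exceed a small threshold, which fits how the code above uses only the first and last crossing to delimit the region that is handed to filloutliers, rather than classifying each point by its second derivative.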
hans on 7 Dec 2022
Hi Mathieu,
thank you for this very elaborate code. It works very well and can be adapted perfectly to the situation.
But it is long, so in the end the filloutliers command, with its parameters tuned interactively in the Live Editor, is a very comfortable way to use the preconfigured MATLAB command, and it also worked well for me.
Thank you again for looking so closely into my data!

Release

R2021b
