How to use isoutlier based on part of the data?

Mariana on 7 Mar 2023
Edited: Bruno Luong on 8 Mar 2023
Good morning everybody,
I have a data vector like this:
a = [0;0.0028;0.0002;0.0039;0.0061].
As you can see, from the 4th element onward the values keep growing until the end.
I was trying to determine a threshold that marks the 4th and 5th elements as outliers using MATLAB's 'isoutlier' function. I managed it, but I had to set a fixed 'ThresholdFactor' value with one of the methods the function offers.
I would like the 4th and 5th values to be identified as outliers, not relative to the whole vector, but because they are larger than the 1st, 2nd and 3rd elements. In other words, I would like to find the outliers based on the preceding data [0;0.0028;0.0002].
The vector I posted is only an example. The size must be generic.
Can you help me?
P.S. (Updated): As I said, depending on the input data, my vectors will have different sizes. But in all cases, the phenomenon they represent makes the values larger toward the end.
I can't find a way to define when the data become outliers, since the vector will not always be the same. I need to generalize. So what I really need is to identify where the values start growing until they reach the end. For my example, that happens from the 4th position.
I hope this explains it better.
  5 Comments
Mariana on 7 Mar 2023
Thank you, Antonios.
I'm going to try it and come back here to comment on how it goes.
Mariana on 7 Mar 2023
Edited: Mariana on 7 Mar 2023
The method does not work for the following vector:
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113].
But it helped me solve other problems.
Thanks a lot.

Accepted Answer

Mathieu NOE on 7 Mar 2023
Hello,
Why not use islocalmin? It seems to me that what you want is to keep the first 3 points (ending at a local minimum).
a =[0;0.0028;0.0002;0.0039;0.0061];
id = find(islocalmin(a), 1, 'last'); % index of the last local minimum, so id is always a scalar
a_keep = a(1:id)
a_keep = 3×1
         0
    0.0028
    0.0002
plot(a)
hold on
plot(a_keep,'dr')
  2 Comments
Mariana on 7 Mar 2023
Edited: Mariana on 7 Mar 2023
This worked.
I tried it with another vector:
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];
d=islocalmin(c);
resul=[c d] % Shows the local minima next to the data. I'm interested in the last local minimum.
d=find(d); % Positions of the local minima
d=d(end); % Position of the last local minimum
threshold=max(c(1:d)); % This is the threshold I was looking for, in a general way
Thank you all very much.
Mathieu NOE on 8 Mar 2023
My pleasure!
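The approach settled on in the comments above can also report where the final rise begins. This is only a sketch under assumptions: the names lastMin, threshold, isHigh and firstHigh are illustrative, and the fallback for a vector without any local minimum is a guess at sensible behaviour, not something discussed in the thread.

```matlab
c = [0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028; ...
     0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];

lastMin = find(islocalmin(c), 1, 'last');   % index of the last local minimum (empty if none)
if isempty(lastMin)
    lastMin = 1;                            % assumption: use only the first point as baseline
end

threshold = max(c(1:lastMin));              % largest value up to the last local minimum

% Only points after the last local minimum can belong to the final rise
isHigh = false(size(c));
isHigh(lastMin+1:end) = c(lastMin+1:end) > threshold;
firstHigh = find(isHigh, 1)                 % first index above the threshold (empty if none)
```

For the 5-element vector a from the question, the same steps give lastMin = 3 and firstHigh = 4, which matches the a_keep output above.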

More Answers (4)

Antonios Dougalis on 7 Mar 2023
Hi,
I am not sure I understood correctly. You can simply index into the region of your array 'a' that you are interested in when calling isoutlier:
A = 1:100; % make example array A
A(5) = 1000; % put the value 1000 at index 5
A(50) = 1000; % put the value 1000 at index 50
B = isoutlier(A); % logical array that flags both outliers, at positions 5 and 50
C = isoutlier(A(1:10)) % flags only the first outlier, at position 5
  1 Comment
Mariana on 7 Mar 2023
Hi Antonios,
Thank you for your answer.
I have just explained the problem better in the first comment above.


Fifteen12 on 7 Mar 2023
Your question is a little complex, because what counts as an outlier is not well defined. For instance, in your vector the second element a(2) is more than 10x larger than the following element a(3). Is it an outlier? Only you can really say. To tell whether any given element of a vector is an outlier, you need to establish a clear definition of what you consider an outlier. By default, isoutlier in MATLAB flags an element that is more than 3 scaled median absolute deviations (MAD) from the median of the set, but you can change this definition with the method argument.
It's relatively simple to reconstruct how isoutlier does this, which might help you customize your outlier approach.
a = [0;0.0028;0.0002;0.0039;0.0061]; % Sample vector
med = median(a); % Median of the data
scale = -1/(sqrt(2)*erfcinv(3/2)); % Scaling constant (about 1.4826) used for the scaled MAD
MAD = scale*median(abs(a - med)); % Scaled median absolute deviation: https://www.mathworks.com/help/matlab/ref/filloutliers.html#bvml247
dist = abs(a - med); % Distance of each element of a from the median
outliers = dist > 3*MAD; % Logical array: true where an element is more than 3 scaled MADs from the median
Using this method, none of the elements are outliers. But you can adjust the cutoff for an outlier to make it more sensitive. Hope this helps!
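To flag points relative to an earlier part of the data rather than the whole vector, the same kind of calculation can take its median and MAD from a baseline segment only. The sketch below assumes the first three elements form the baseline (nBase is a hypothetical parameter you would have to choose yourself), and it uses the scaling constant that the isoutlier documentation gives for the default 'median' method.

```matlab
a = [0; 0.0028; 0.0002; 0.0039; 0.0061];     % sample vector from the question
nBase = 3;                                    % assumed length of the baseline segment
base = a(1:nBase);

scale = -1/(sqrt(2)*erfcinv(3/2));            % scaling constant, about 1.4826
medBase = median(base);                       % median of the baseline only
madBase = scale*median(abs(base - medBase));  % scaled MAD of the baseline only

% Flag every element that lies more than 3 scaled MADs from the baseline median
flags = abs(a - medBase) > 3*madBase
```

With this baseline, a(2) is flagged along with a(4) and a(5), which illustrates the point made in this answer: the result depends entirely on how an outlier is defined.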
  1 Comment
Mariana on 7 Mar 2023
Jhon,
Thank you very much for your help. I had already read the definition of the 'isoutlier' function, and my problem is that my vector changes whenever I change the system I'm studying. But this generic vector a will always have increasing values at the end. So I can't define a single threshold; I need the threshold to change as the input data change.
I explained it in more detail in the first comment above.


Les Beckham on 7 Mar 2023
Edited: Les Beckham on 7 Mar 2023
It seems like what you want to do is chop off the "increase at the end".
Here is one way to do that, by searching backwards through a to find where it starts increasing.
a = [0; 0.0028; 0.0002; 0.0039; 0.0061; 0.0062]; % added an extra point to verify logic
last_index = 1 + numel(a) - find(diff(flip(a)) > 0) % find where a stops increasing at the end (working backwards)
last_index = 3
plot(a)
hold on
plot(a(1:last_index),'r*')
grid on
  2 Comments
Mariana on 7 Mar 2023
Hi Les,
I think this could solve my problem too. I'm going to try it and come back here!
Thank you so much.
Mariana on 7 Mar 2023
Les,
It does not work with the following vector:
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];
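For this vector, find returns several indices, because the data dip more than once before the final rise. A minimal variation, assuming only the last rise matters, is to ask find for the first hit of the backward search. The fallback for a vector that never decreases is an assumption, not part of the answer above.

```matlab
c = [0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028; ...
     0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];

k = find(diff(flip(c)) > 0, 1);      % first increase met when walking backwards from the end
if isempty(k)
    last_index = 1;                  % assumption: no dip anywhere, so keep only the first point
else
    last_index = 1 + numel(c) - k;   % convert the reversed position back to an index into c
end
last_index
```

From last_index to the end of the vector the values never decrease, so c(last_index+1:end) is the trailing rise.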


Bruno Luong on 8 Mar 2023
Edited: Bruno Luong on 8 Mar 2023
I'm not sure I follow; like most people, you have not described what you want very clearly. What you call an outlier seems to be a point that violates the increasing trend:
c=[0.0025;0.0025;0.0025;0.0024;0.0026;0.0025;0.0026;0.0027;0.0028;0.0026;0.0028;0.0047;0.0045;0.0055;0.0071;0.0082;0.0084;0.0113];
d=diff(c);
i=find(d<0);
close all
plot(c); hold on; plot(i,c(i),'or',i+1,c(i+1),'*r')
