Loop to replace outliers with NaN extremely slow

TL on 21 Jan 2022
Commented: Star Strider on 22 Jan 2022
I want to replace outliers with NaN in a large table (> 3 standard deviations from each column's mean) and my code works in principle but is incredibly slow, i.e. still not done after 10 minutes. The table size is about 2000x150. Is there a faster way, maybe without the loop, and could someone tell me what is wrong with my version?
% Version 1: loop through column names
var_list = mytable.Properties.VariableNames(4:140)
for i = 1:length(var_list)
    mytable.(var_list{i}) = filloutliers(mytable.(var_list{i}),nan,'mean','ThresholdFactor', 3)
end
% Version 2: loop through column indices
for i = 4:140
    mytable(:,i) = filloutliers(mytable(:,i),nan,'mean','ThresholdFactor', 3)
end
  2 Comments
Mathieu NOE on 21 Jan 2022
Hello Tanja,
Just a question: is removing the outliers the "real" need, or would a smoothing approach also fit your needs?
TL on 21 Jan 2022
Hi Mathieu, yes, I do need to replace them with NaN. Some values are real errors, so they can be 10 times higher than the mean and need to be filtered out.


Accepted Answer

Star Strider on 21 Jan 2022
The way the table addressing is coded is likely the problem.
I’m not certain; however, using parentheses () addresses the table (or variables as individual table arrays), while curly braces {} address the variable contents themselves.
So for example
mytable(:,i) =
addresses that part of ‘mytable’ as a table, while
mytable{:,i} =
addresses only the contents of the variable.
See the documentation section on Access Data in Tables for details.
Again, I’m not certain what the problem is; however, experimenting with changing the addressing method could provide a solution.
Also, I’m not certain if the loop is even necessary, since filloutliers appears to work on arrays as well as vectors, and operates on each column separately, according to the documentation.
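To illustrate the loop-free idea, here is a minimal sketch using curly-brace indexing. It assumes the selected columns are all numeric, so they can be extracted as a single matrix; the demo table, its variable names, and the planted errors are made up for illustration. Note also that none of the statements in the question end with a semicolon, so MATLAB echoes the result to the Command Window on every iteration. That may account for part of the slowdown, though this is an assumption and was not confirmed in the thread.

```matlab
% Small demo table standing in for 'mytable' (hypothetical data).
rng(0);                                  % reproducible random numbers
A = randn(200, 3);
A(5, 1)  = 50;                           % planted gross error
A(17, 3) = -40;                          % planted gross error
mytable = array2table(A, 'VariableNames', {'a', 'b', 'c'});
mytable.id = (1:200)';                   % column to leave untouched

cols = 1:3;                              % numeric columns to clean

% Curly braces extract the columns as one numeric matrix; filloutliers
% then treats each column separately. The trailing semicolon suppresses
% the Command Window echo of the result.
mytable{:, cols} = filloutliers(mytable{:, cols}, NaN, 'mean', 'ThresholdFactor', 3);

nFilled = sum(isnan(mytable{:, cols}));  % NaN count per cleaned column
```

For the table in the question, cols would be 4:140, provided those variables are all numeric.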
  4 Comments
TL on 22 Jan 2022
Perfect, now I could fix it, thanks so much! If anyone else has the same problem, this works without a loop and within seconds (outliers = more than 3 standard deviations from the mean; any threshold factor can be picked here):
k = mytable.Properties.VariableNames; % Then delete cells of k that should not be outlier corrected
mytable_filtered = filloutliers(mytable(:,k),nan,'mean','ThresholdFactor', 3); % semicolons suppress the echo of the result
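A possible variant of the same idea, sketched under the assumption that the 'DataVariables' option of filloutliers is available in your release: instead of subsetting the table first, pass the whole table and name the variables to clean. The table T and its contents below are hypothetical.

```matlab
% Hypothetical table: one text column plus two numeric columns.
x = [ones(30, 1); 100];                  % last value is a planted error
y = [(1:30)'; 15];                       % no gross outlier
T = table(repmat("s", 31, 1), x, y, 'VariableNames', {'label', 'x', 'y'});

% Clean only the named variables; 'label' is passed through untouched.
T_filtered = filloutliers(T, NaN, 'mean', 'ThresholdFactor', 3, ...
                          'DataVariables', {'x', 'y'});
```

Unlike mytable(:,k), which returns only the selected variables, this keeps every variable of the original table in the output.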
Star Strider on 22 Jan 2022
As always, my pleasure!


More Answers (0)

Release

R2021a