How to replace outliers with NaN
Hello,
I am trying to replace values above the 99th percentile (outliers) with NaN, separately for each group (group A and group B), in a table t.
group = repelem(['A' 'B'], 1000)';
val = repelem(1:1000, 2)';
t = table(group, val);
unique_gr = unique(t.group);
for g = 1:length(unique_gr)
sub = t(strcmp(t.group, unique_gr(g, 1)), :);
f = filloutliers(sub.val, 'NaN', 'percentiles', [0 99])
end
Any ideas? Please note that I do not have any toolboxes.
Comments: 2
Walter Roberson
22 Aug 2019
Use unique with three outputs and iterate through the group numbers:
[unique_gr, ~, groupnum] = unique(t.group);
for g = 1 : size(unique_gr,1)
mask = groupnum == g;
% Operate on the numeric variable val only, so the char variable group is left untouched.
t.val(mask) = filloutliers(t.val(mask), NaN, 'percentiles', [0 99]);
end
Answers (1)
Steven Lord
22 Aug 2019
You can use grouptransform with an anonymous function that calls filloutliers. Let's use your sample data.
group = repelem(['A' 'B'], 1000)';
val = repelem(1:1000, 2)';
t = table(group, val);
This grouptransform call uses the variable group from the table t as the grouping variable. The anonymous function makes the same filloutliers call that you and Walter each used in your for loops, except that, like Walter, I pass the double NaN as the fill value rather than the text 'NaN'.
t2 = grouptransform(t, 'group', ...
@(x) filloutliers(x, NaN, 'percentiles', [0 99]));
Let's see what values of val in t were replaced by NaN in t2.
t(isnan(t2.val), :)
Given the way you built t, those do look like the top 1% of values for each group.
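If you would rather check the result programmatically than by eye, here is a minimal sketch that counts the replaced values in each group and reports the smallest original value that was replaced. The names replaced, groupID, G, nReplaced, minReplaced and summary are introduced here for illustration only and are not part of the code above. It assumes, as the indexing above does, that grouptransform returns the rows in the same order as t.

```matlab
% Rebuild the sample data from the question.
group = repelem(['A' 'B'], 1000)';
val = repelem(1:1000, 2)';
t = table(group, val);

% Replace values above each group's 99th percentile with NaN.
t2 = grouptransform(t, 'group', ...
    @(x) filloutliers(x, NaN, 'percentiles', [0 99]));

% Logical mask of the rows whose val was replaced.
replaced = isnan(t2.val);

% Map every row to a group number (1 for 'A', 2 for 'B').
[groupID, ~, G] = unique(t.group);

% Number of replaced values per group.
nReplaced = accumarray(G, double(replaced));

% Smallest original value that was replaced in each group
% (accumarray leaves 0 for a group with no replaced values).
minReplaced = accumarray(G(replaced), t.val(replaced), ...
    [numel(groupID) 1], @min);

summary = table(groupID, nReplaced, minReplaced)
```

Every value of t.val at or above minReplaced within a group should show up as NaN in t2, and everything below it should be unchanged.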