Replacing certain values in table with NaN based on different conditions

조회 수: 103(최근 30일)
Liza M
Liza M 2021년 8월 16일
댓글: Liza M 2021년 8월 17일
Hello everyone,
I'm basically new to Matlab and seem to be at a loss with my current problem - so any help would be greatly appreciated.
I have some data stored in a table called T that is 724 rows x 365 variables in size.
Now for some variables, I need to exclude extreme values based on different conditions before continuing on with my statistical analysis. But since these don't warrant excluding the whole dataset (row), I would like to replace those values with NaN in my table T.
As of now, this concerns only the variable in column 129, were values should be replaced with NaN if they are equal to or less than 3.1006 as well as column 131 were values should be replaced with NaN if they are equal to or less than 4.8673.
I have tried to get this to work for hours with several variations of logical indexing and am now stuck at this:
extremevalues = (T{:,129} <= 3.1006) | (T{:,131} <= 4.8673);
T(extremevalues) = NaN;
This does not give me any errors - which is the farthest I've come so far - however, it does not replace those extreme value in the respective columns with NaNs (there are no NaNs to be seen in either one of the specified columns in T, extremevalues however gives me several "1"s when quickly scrolling through it).
I've also tried substituting T{:, variablenr} with T.variablename - that didn't change anything either though.
I feel like the solution should be pretty easy, but I can't seem to find it (or am simply not experienced enough to find it).
Thanks very much in advance!

답변(2개)

darova
darova 2021년 8월 16일
Maybe you need to convert your data into matrix first
A = randi(10,5,4)
A = 5×4 table
Var1 Var2 Var3 Var4 ____ ____ ____ ____ 7 8 8 2 1 2 1 5 3 7 1 7 1 5 5 7 2 10 8 4
ind = A(:,1) < 5; % have this error when using 'table'
Operator '<' is not supported for operands of type 'table'.
A(ind,:) = nan
  댓글 수: 1
Liza M
Liza M 2021년 8월 17일
I think that would've worked if my table weren't of a mixed data type (there are other variables besides numeric ones in there as well).
Anyways, I just discovered the rmoutliers function, which does the thing I need with the specified variables..in just a lot less complicated way than what I've tried.
Thanks again regardless for your quick help!

댓글을 달려면 로그인하십시오.


Ive J
Ive J 2021년 8월 17일
For table, you need to directly access the variable names/column indices. so, T(extremevalues) = NaN; won't work.
This may help:
t = splitvars(table(reshape(1:20, 5, 4)))
t = 5×4 table
Var1_1 Var1_2 Var1_3 Var1_4 ______ ______ ______ ______ 1 6 11 16 2 7 12 17 3 8 13 18 4 9 14 19 5 10 15 20
% set values below 8 in column 2 to nan
colN = 2;
cutoff = 8;
t.(colN)(t.(colN) < cutoff) = nan
t = 5×4 table
Var1_1 Var1_2 Var1_3 Var1_4 ______ ______ ______ ______ 1 NaN 11 16 2 NaN 12 17 3 8 13 18 4 9 14 19 5 10 15 20
  댓글 수: 1
Liza M
Liza M 2021년 8월 17일
Hi,
so rmoutliers solved my problem this time around, but this is a good tip for the future, thanks for that!

댓글을 달려면 로그인하십시오.

태그

제품


릴리스

R2021a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by