How to fill in NaNs or <undefined> in data with the mode of each column
I have converted a mixed table of categorical and double arrays into a table whose columns are all of type double, by mapping each category in the categorical arrays to a double.
I have a table of 40k rows and 40 columns. I want to fill in the NaNs by replacing each NaN value with the mode of its column.
I found a clear looping method in R via this link, but couldn't find a simple loop in MATLAB to do it. inpaint_nans seems to be more focused on interpolation of the data.
knnimpute() also fails because I can have swathes of up to 1000 rows that are all NaNs (so I need 1200+ neighbours), as well as 40+ columns, so the algorithm has to loop through 40! times, which is very slow.
Any ideas?
Answers (1)
jgg
22 Dec 2015
Edited: jgg
22 Dec 2015
Select the NaNs with a logical mask and set them to the mode of their column:
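A = [1 2 NaN 4 5; 1 2 3 NaN 5; 1 NaN NaN NaN 5];
m = mode(A,1);                 % mode of each column (mode ignores NaN values)
m = repmat(m,size(A,1), 1);    % repeat the row of modes once per row of A
A_f = A;
A_f(isnan(A)) = m(isnan(A));   % overwrite each NaN with the mode of its column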
Looping is not necessary if you use vectorized operations.
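As a possible shorter variant of the same vectorized idea, here is a sketch using fillmissing. It assumes a MATLAB release that ships fillmissing (R2016b or later, so newer than this answer) and relies on the 'constant' method accepting one fill value per column when given a vector. It is an alternative, not part of the original answer.

```matlab
% Example matrix from the answer; NaN marks the missing entries.
A = [1 2 NaN 4 5; 1 2 3 NaN 5; 1 NaN NaN NaN 5];

% mode(A,1) returns one mode per column and ignores NaN values.
colModes = mode(A, 1);

% With a vector of constants, each element fills the missing
% entries of the corresponding column of A.
A_f = fillmissing(A, 'constant', colModes);

disp(A_f)
```

Every row of A_f comes out as 1 2 3 4 5 for this input, which matches the repmat approach.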
Note: if your matrix is very large, the repmat step can be replaced with a for loop over the columns in order to use less memory, but 40k by 40 is not that large, so it should be fine.
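The column-wise loop mentioned in the note could look something like the sketch below. It avoids building the full repmat copy of the modes and works on one column at a time. The variable names col and missing are only illustrative.

```matlab
% Example matrix from the answer; NaN marks the missing entries.
A = [1 2 NaN 4 5; 1 2 3 NaN 5; 1 NaN NaN NaN 5];

A_f = A;
for c = 1:size(A, 2)
    col = A(:, c);            % current column
    missing = isnan(col);     % logical mask of the NaN rows
    % mode ignores NaN values; for an all-NaN column it returns NaN,
    % so such a column is left unchanged.
    col(missing) = mode(col);
    A_f(:, c) = col;
end

disp(A_f)
```

For this input every row of A_f is 1 2 3 4 5. Only one column and its mask are held in temporary memory at any time, at the cost of one loop iteration per column.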
jgg
22 Dec 2015
If you liked this answer, please accept it so other people can see it resolved your problem!