How to fill in missing data?

J K on 26 Nov 2016
Answered: Star Strider on 7 Oct 2024
Hello everybody,
I have a dataset (a txt file) that contains some missing values, represented by 0. I would like to replace all of these zeros with numbers. How can I do this?

Answers (2)

Ayush on 7 Oct 2024
Hi,
One way to replace missing values that are represented as 0 is interpolation. First identify the indices of the zeros in your dataset, then use the interp1 function to interpolate over them from the surrounding non-zero values. The code below illustrates the approach:
% Step 1: Load the data
data = readmatrix('your_dataset.txt');
% Step 2: Interpolate each column to replace zeros
x = (1:size(data, 1))';   % Row indices as a column vector, matching y
for col = 1:size(data, 2)
    y = data(:, col);     % Data values in this column
    % Logical masks for the non-zero (valid) and zero (missing) elements
    nonZeroIndices = y ~= 0;
    zeroIndices = ~nonZeroIndices;
    % interp1 needs at least two sample points, so skip sparser columns
    if nnz(nonZeroIndices) >= 2 && any(zeroIndices)
        % 'extrap' linearly extrapolates any leading or trailing zeros
        y(zeroIndices) = interp1(x(nonZeroIndices), y(nonZeroIndices), x(zeroIndices), 'linear', 'extrap');
    end
    % Update the column in the dataset
    data(:, col) = y;
end
% Step 3: Save the modified data back to a file (optional)
writematrix(data, 'modified_dataset.txt');
For more information on the interp1 function, see its page in the MATLAB documentation.
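As a quick check of the interpolation idea, here is a minimal sketch on a small made-up vector (the values are illustrative only and not taken from the original dataset). It fills interior zeros by linear interpolation and shows what 'extrap' does to zeros at the ends:

```matlab
% Illustrative data only: zeros mark the missing entries
y = [1; 0; 3; 0; 0; 6];
x = (1:numel(y))';
isZero = (y == 0);

% Interpolate the zero positions from the non-zero neighbours
yFilled = y;
yFilled(isZero) = interp1(x(~isZero), y(~isZero), x(isZero), 'linear');
disp(yFilled')    % 1 2 3 4 5 6

% Zeros at the ends have no neighbour on one side, so 'extrap' is needed;
% the known points (2,3) and (3,5) give a slope of 2 in both directions
yEnds = [0; 3; 5; 0];
xEnds = (1:numel(yEnds))';
endZero = (yEnds == 0);
yEndsFilled = yEnds;
yEndsFilled(endZero) = interp1(xEnds(~endZero), yEnds(~endZero), xEnds(endZero), 'linear', 'extrap');
disp(yEndsFilled')    % 1 3 5 7
```

Linear extrapolation can drift far from the data when many consecutive values are missing at an end, so it is worth inspecting the filled end values before saving.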

Star Strider on 7 Oct 2024
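If you have R2016b or later, use the fillmissing function (introduced in R2016b). Note that the Name=Value argument syntax used here requires R2021a or later; in earlier releases, pass the options as 'Name',Value pairs instead: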
T1 = array2table(randi([0 9],10,5)) % Original
T1 = 10x5 table
    Var1    Var2    Var3    Var4    Var5
    ____    ____    ____    ____    ____
     2       8       7       7       1
     8       7       5       6       7
     8       4       4       5       3
     4       3       3       1       3
     4       6       5       5       7
     0       4       7       0       4
     8       2       2       7       2
     9       7       3       0       9
     5       0       5       8       3
     2       2       6       3       4
loc = table2array(T1) == 0 % Logical Matrix Of ‘0’ Locations
loc = 10x5 logical array
   0   0   0   0   0
   0   0   0   0   0
   0   0   0   0   0
   0   0   0   0   0
   0   0   0   0   0
   1   0   0   1   0
   0   0   0   0   0
   0   0   0   1   0
   0   1   0   0   0
   0   0   0   0   0
T1 = fillmissing(T1, 'linear', MissingLocations=loc, EndValues='nearest') % 'linear' Interpolation Of Missing Values With 'nearest' For End Values
T1 = 10x5 table
    Var1    Var2    Var3    Var4    Var5
    ____    ____    ____    ____    ____
     2        8      7        7      1
     8        7      5        6      7
     8        4      4        5      3
     4        3      3        1      3
     4        6      5        5      7
     6        4      7        6      4
     8        2      2        7      2
     9        7      3      7.5      9
     5      4.5      5        8      3
     2        2      6        3      4
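The random table above happened to have no zeros in its first or last row, so the effect of the 'nearest' end values is not visible there. The following sketch uses a small made-up vector (illustrative values only) to compare the default end handling with 'nearest'. It assumes a release in which fillmissing accepts the MissingLocations option, and uses 'Name',Value pairs so that it does not depend on the newer Name=Value syntax:

```matlab
% Illustrative data only: zeros at both ends and one in the interior
v = [0; 5; 7; 0; 11; 0];
loc = (v == 0);    % logical mask of the entries to treat as missing

% Default end handling: the ends are extrapolated linearly
linFill = fillmissing(v, 'linear', 'MissingLocations', loc);
disp(linFill')     % 3 5 7 9 11 13

% 'nearest' end values: each end copies its closest valid neighbour
nearFill = fillmissing(v, 'linear', 'MissingLocations', loc, 'EndValues', 'nearest');
disp(nearFill')    % 5 5 7 9 11 11
```

The interior value is the same in both cases; only the first and last entries differ.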
There are several methods to fill (interpolate) the missing values. See the documentation for those and other options.
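To give a feel for how the fill methods differ, here is a minimal sketch on a made-up vector (illustrative values only). As an alternative to passing a location mask, it first converts the zeros to NaN with standardizeMissing, so that fillmissing recognises them as missing on its own:

```matlab
% Illustrative data only: zeros mark the missing entries
v = [4; 0; 8; 0; 0; 2];

% Turn the placeholder zeros into standard missing values (NaN)
vNaN = standardizeMissing(v, 0);

fillLinear   = fillmissing(vNaN, 'linear');        % straight line between neighbours
fillPrevious = fillmissing(vNaN, 'previous');      % carry the last valid value forward
fillNext     = fillmissing(vNaN, 'next');          % use the next valid value
fillConstant = fillmissing(vNaN, 'constant', -1);  % fixed replacement value

disp([fillLinear, fillPrevious, fillNext, fillConstant])
%   4   4   4   4
%   6   4   8  -1
%   8   8   8   8
%   6   8   2  -1
%   4   8   2  -1
%   2   2   2   2
```

Which method is appropriate depends on the data: 'linear' suits smoothly varying signals, while 'previous' or 'next' may be a better fit for step-like or categorical-style measurements.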