How to read and plot a CSV file and delete infinity values from the CSV file

Khalil on 20 Feb 2020
Edited: per isakson on 1 Mar 2020
Hi,
I want to read a CSV file and later plot it using a formula.
I tried csvread, but it doesn't work because the file contains text (column titles) and infinity values, which MATLAB displays as ∞.
readtable can read the file (c = readtable('sensor.csv');), but I still can't delete the infinity values. When I try c(isnan(c)) = 0; or c(~any(isinf(c))) = 0; I always get the error
"Undefined function 'isinf' or 'isnan' for input arguments of type 'table'"
Can anyone help me read a large CSV file (xlsread doesn't work; the file size is > 1 GB) and delete the infinity values, which MATLAB displays as ∞?
A small section of the large file is attached: sensor.csv
I am using MATLAB R2017b.

Accepted Answer

per isakson on 1 Mar 2020
Edited: per isakson on 1 Mar 2020
MATLAB provides many ways to read your CSV files. Here are two variants based on textscan (Read formatted data from text file or string).
sensor.csv contains "???", which I assume is your "infinity value". AFAIK, there is no way to make textscan() or any other reading function convert "???" directly to the numerical value Inf.
In script R1, textscan() converts "???" to NaN, and a second step replaces NaN with Inf (or, in the code as shown, with 0). That's fine only if the file has no missing values, since those are also read as NaN and would be replaced in the same way.
In script R2, the entire file is read into a character array, chr. Next, every '???' is replaced by 'Inf' (or, in the code as shown, by '0'), and finally textscan() parses chr. R2 requires more memory because it has to store chr.
%% R1
fid = fopen( 'sensor.csv', 'r' );
cac = textscan( fid, '%f%f%f%f%f' ...
, 'HeaderLines',3, 'CollectOutput',true ...
, 'Delimiter',',', 'TreatAsEmpty','???' );
[~] = fclose( fid );
num = cac{1};
% num(isnan(num)) = inf; % why not just keep the NaNs
num(isnan(num)) = 0; % or replace NaNs by zero
display( num )
%% R2
chr = fileread( 'sensor.csv' );
% chr = strrep( chr, '???', 'inf' );
chr = strrep( chr, '???', '0' ); % or replace '???' by '0'
cac = textscan( chr, '%f%f%f%f%f' ...
, 'HeaderLines',3, 'CollectOutput',true ...
, 'Delimiter',',' );
num = cac{1};
display( num )
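Both scripts produce the same output. The listing below was produced with the Inf replacement (the commented-out lines); with the code as shown, the Inf entries are 0 instead.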
num =
0 Inf 1.8947 0.19107 -0.015871
4e-06 1.9911 1.9105 0.19107 0
8e-06 1.9911 1.8947 0.19107 0
1.2e-05 1.9751 1.9105 0.19107 0
1.6e-05 1.9911 1.9421 0.17514 0
2e-05 1.9751 1.9262 0.17514 0
2.4e-05 1.9911 1.8947 0.17514 0
2.8e-05 1.9911 1.8947 0.17514 0
3.2e-05 Inf 1.9105 0.19107 0.015871
3.6e-05 1.9911 1.9262 0.19107 0
4e-05 1.9911 1.9105 0.19107 0
4.4e-05 1.9911 1.9262 0.19107 0
4.8e-05 1.9911 1.8947 0.19107 0
>>
That's before I realised that you don't want Inf in your matrix.
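For a file too large to hold comfortably in memory (the question mentions more than 1 GB), the R1 approach can be applied block by block, because textscan accepts a repeat count as its third argument. The following is a minimal sketch under assumptions, not something tested against the real file: the demo file name sensor_demo.csv, its contents, and blockSize are made up for illustration, while the three header lines and five numeric columns follow R1.

```matlab
% Write a small demo file (hypothetical contents: 3 header lines, 5 columns).
demoFile = fullfile(tempdir, 'sensor_demo.csv');
fid = fopen(demoFile, 'w');
fprintf(fid, 'Demo sensor log\n');
fprintf(fid, 'Time,Sensor A,Sensor B,sensor C,Sensor D\n');
fprintf(fid, 's,V,V,V,V\n');
fprintf(fid, '0,???,1.8947,0.19107,-0.015871\n');
fprintf(fid, '4e-06,1.9911,1.9105,0.19107,0\n');
fprintf(fid, '8e-06,1.9911,1.8947,0.19107,0\n');
fprintf(fid, '1.2e-05,???,1.9105,0.19107,0.015871\n');
fprintf(fid, '1.6e-05,1.9911,1.9421,0.17514,0\n');
fclose(fid);

blockSize = 2;   % rows per block (assumed; something like 1e6 for a real file)
blocks = {};     % cleaned blocks are collected here

fid = fopen(demoFile, 'r');
for k = 1:3
    fgetl(fid);  % skip the three header lines once, before the first block
end
while ~feof(fid)
    % Read at most blockSize rows; '???' is returned as NaN.
    cac = textscan(fid, '%f%f%f%f%f', blockSize, ...
        'Delimiter', ',', 'TreatAsEmpty', '???', 'CollectOutput', true);
    blk = cac{1};
    if isempty(blk)
        break    % nothing left but trailing whitespace
    end
    blk(isnan(blk)) = 0;   % same replacement as in R1
    blocks{end+1} = blk;   %#ok<AGROW>
end
fclose(fid);

num = vertcat(blocks{:});  % combine the blocks into one matrix
```

If the whole matrix is not needed at once, each blk can be processed or plotted inside the loop and the vertcat step dropped.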

More Answers (3)

Bhaskar R on 20 Feb 2020
opt = detectImportOptions('sensor.csv', 'MissingRule', 'fill', 'NumHeaderLines', 2);
T = readtable('sensor.csv', opt);
header = {'Time', 'Sensor_A', 'Sensor_B', 'sensor_C', 'Sensor_D'};
T.Properties.VariableNames = header;
1 Comment
Khalil on 20 Feb 2020
The first line gives me this error message:
"Error using detectImportOptions
'MissingRule' is not a recognized parameter. For a list of valid name-value pair arguments, see the documentation for
detectImportOptions."
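One way around this error, sketched here under assumptions, is to skip detectImportOptions and pass the long-standing text-file options of readtable directly: 'HeaderLines', 'ReadVariableNames', 'Format', and 'TreatAsEmpty'. The demo file and its contents are hypothetical, and the three header lines are assumed from the textscan call in the accepted answer. The last lines also show why isnan failed on the table in the question: it needs a numeric array, which T{:,:} provides.

```matlab
% Write a small demo file (hypothetical contents: 3 header lines, 5 columns).
demoFile = fullfile(tempdir, 'sensor_demo.csv');
fid = fopen(demoFile, 'w');
fprintf(fid, 'Demo sensor log\n');
fprintf(fid, 'Time,Sensor A,Sensor B,sensor C,Sensor D\n');
fprintf(fid, 's,V,V,V,V\n');
fprintf(fid, '0,???,1.8947,0.19107,-0.015871\n');
fprintf(fid, '4e-06,1.9911,1.9105,0.19107,0\n');
fprintf(fid, '8e-06,1.9911,1.8947,0.19107,0\n');
fprintf(fid, '1.2e-05,???,1.9105,0.19107,0.015871\n');
fprintf(fid, '1.6e-05,1.9911,1.9421,0.17514,0\n');
fclose(fid);

header = {'Time', 'Sensor_A', 'Sensor_B', 'sensor_C', 'Sensor_D'};

% Read five numeric columns; '???' in a numeric column becomes NaN.
T = readtable(demoFile, 'Delimiter', ',', 'HeaderLines', 3, ...
    'ReadVariableNames', false, 'Format', '%f%f%f%f%f', ...
    'TreatAsEmpty', '???');
T.Properties.VariableNames = header;

% isnan is not defined for tables, so work on the numeric contents.
num = T{:,:};            % extract the data as a numeric matrix
num(isnan(num)) = 0;     % replace NaN by zero
T{:,:} = num;            % write the cleaned values back into the table
```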

the cyclist on 20 Feb 2020
Edited: the cyclist on 20 Feb 2020
I find that sometimes with these finicky imports it can be helpful to use the Import Data Tool.
I used that as a basis to make the following import script:
% If dataLines is not specified, define defaults
if nargin < 2
    dataLines = [4, Inf];
end
% Set up the Import Options and import the data
opts = delimitedTextImportOptions("NumVariables", 5);
% Specify range and delimiter
opts.DataLines = dataLines;
opts.Delimiter = ",";
% Specify column names and types
opts.PreserveVariableNames = true;
opts.VariableNames = ["Time", "Sensor A", "Sensor B", "sensor C", "Sensor D"];
opts.VariableTypes = ["double", "double", "double", "double", "double"];
% Specify file level properties
opts.ExtraColumnsRule = "ignore";
opts.EmptyLineRule = "read";
% Import the data
sensorDataTable = readtable(filename, opts);
% Convert to numeric
sensorDataArray = table2array(sensorDataTable);
2 Comments
Walter Roberson on 25 Feb 2020
Putting an explicit function header on it:
function sensorDataArray = ReadSensorTable(filename, dataLines)
    % If dataLines is not specified, define defaults
    if nargin < 2
        dataLines = [4, Inf];
    end
    % Set up the Import Options and import the data
    opts = delimitedTextImportOptions("NumVariables", 5);
    % Specify range and delimiter
    opts.DataLines = dataLines;
    opts.Delimiter = ",";
    % Specify column names and types
    opts.PreserveVariableNames = true;
    opts.VariableNames = ["Time", "Sensor A", "Sensor B", "sensor C", "Sensor D"];
    opts.VariableTypes = ["double", "double", "double", "double", "double"];
    % Specify file level properties
    opts.ExtraColumnsRule = "ignore";
    opts.EmptyLineRule = "read";
    % Import the data
    sensorDataTable = readtable(filename, opts);
    % Convert to numeric
    sensorDataArray = table2array(sensorDataTable);
end

Walter Roberson on 21 Feb 2020
rmmissing (https://www.mathworks.com/help/matlab/ref/rmmissing.html) has been available since R2016b. That is, you use readtable() and let the ??? be replaced by NaN; after that, rmmissing will remove the rows that contain any NaN, which seems to be what you are asking for.
There is also fillmissing(), which tries to deduce what reasonable values might be in place of each NaN.
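A minimal sketch of both functions on a small table, assuming the ??? entries have already been read as NaN. The table contents and variable names here are made up for illustration and are not taken from the real file.

```matlab
% Hypothetical data: NaN stands in for the ??? entries of the file.
Time     = (0:4)' * 4e-6;
Sensor_A = [NaN; 1.9911; 1.9911; NaN; 1.9911];
Sensor_B = [1.8947; 1.9105; 1.8947; 1.9105; 1.9421];
T = table(Time, Sensor_A, Sensor_B);

% Remove every row that contains at least one NaN (rows 1 and 4 here).
Tclean = rmmissing(T);

% Keep all rows and replace each NaN by a constant instead.
Tzero = fillmissing(T, 'constant', 0);

% Or fill each NaN by linear interpolation within its column.
Tlin = fillmissing(T, 'linear');
```

rmmissing keeps the time axis consistent with the remaining samples, while fillmissing keeps the row count unchanged; which one fits depends on the formula applied afterwards.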
4 Comments
Walter Roberson on 25 Feb 2020
I posted a version of his code, above, made into a function.
