use of readtable to read unstructured data

조회 수: 3(최근 30일)
Marzuki Marzuki
Marzuki Marzuki 2021년 9월 21일
댓글: Mathieu NOE 2021년 9월 22일
I want to read the data in the attached file, using readtbale given below:
fname1='sample.txt';
opt = detectImportOptions(fname1, 'headerlines', 0);
data = readtable(fname1, opt);
However, the result is not correct because there is a shift of data structure (see below)
I plan get following structure:
Is there any best way to do it? thank you for help

채택된 답변

Mathieu NOE
Mathieu NOE 2021년 9월 21일
why not this ? simply using the import tool
you can ignaore the second variable if it's not meaningfull
result :
code :
out = importfile('sample.txt', 1, 16);
function sample = importfile(filename, startRow, endRow)
%IMPORTFILE Import numeric data from a text file as a matrix.
% SAMPLE = IMPORTFILE(FILENAME)
% Reads data from text file FILENAME for the default selection.
%
% SAMPLE = IMPORTFILE(FILENAME, STARTROW, ENDROW)
% Reads data from rows STARTROW through ENDROW of text file FILENAME.
%
% Example:
% sample = importfile('sample.txt', 1, 16);
%
% See also TEXTSCAN.
% Auto-generated by MATLAB on 2021/09/21 18:32:46
%% Initialize variables.
if nargin<=2
startRow = 1;
endRow = inf;
end
%% Read columns of data as text:
% For more information, see the TEXTSCAN documentation.
formatSpec = '%8s%3s%5s%8s%5s%5s%5s%5s%s%[^\n\r]';
%% Open the text file.
fileID = fopen(filename,'r');
%% Read columns of data according to the format.
% This call is based on the structure of the file used to generate this code. If an error occurs for a different file, try regenerating the code from the Import Tool.
dataArray = textscan(fileID, formatSpec, endRow(1)-startRow(1)+1, 'Delimiter', '', 'WhiteSpace', '', 'TextType', 'string', 'HeaderLines', startRow(1)-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
for block=2:length(startRow)
frewind(fileID);
dataArrayBlock = textscan(fileID, formatSpec, endRow(block)-startRow(block)+1, 'Delimiter', '', 'WhiteSpace', '', 'TextType', 'string', 'HeaderLines', startRow(block)-1, 'ReturnOnError', false, 'EndOfLine', '\r\n');
for col=1:length(dataArray)
dataArray{col} = [dataArray{col};dataArrayBlock{col}];
end
end
%% Close the text file.
fclose(fileID);
%% Convert the contents of columns containing numeric text to numbers.
% Replace non-numeric text with NaN.
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for col=1:length(dataArray)-1
raw(1:length(dataArray{col}),col) = mat2cell(dataArray{col}, ones(length(dataArray{col}), 1));
end
numericData = NaN(size(dataArray{1},1),size(dataArray,2));
for col=[3,4,5,6,7,8,9]
% Converts text in the input cell array to numbers. Replaced non-numeric text with NaN.
rawData = dataArray{col};
for row=1:size(rawData, 1)
% Create a regular expression to detect and remove non-numeric prefixes and suffixes.
regexstr = '(?<prefix>.*?)(?<numbers>([-]*(\d+[\,]*)+[\.]{0,1}\d*[eEdD]{0,1}[-+]*\d*[i]{0,1})|([-]*(\d+[\,]*)*[\.]{1,1}\d+[eEdD]{0,1}[-+]*\d*[i]{0,1}))(?<suffix>.*)';
try
result = regexp(rawData(row), regexstr, 'names');
numbers = result.numbers;
% Detected commas in non-thousand locations.
invalidThousandsSeparator = false;
if numbers.contains(',')
thousandsRegExp = '^[-/+]*\d+?(\,\d{3})*\.{0,1}\d*$';
if isempty(regexp(numbers, thousandsRegExp, 'once'))
numbers = NaN;
invalidThousandsSeparator = true;
end
end
% Convert numeric text to numbers.
if ~invalidThousandsSeparator
numbers = textscan(char(strrep(numbers, ',', '')), '%f');
numericData(row, col) = numbers{1};
raw{row, col} = numbers{1};
end
catch
raw{row, col} = rawData{row};
end
end
end
% Convert the contents of columns with dates to MATLAB datetimes using the specified date format.
try
dates{1} = datetime(dataArray{1}, 'Format', 'HH:mm:ss', 'InputFormat', 'HH:mm:ss');
catch
try
% Handle dates surrounded by quotes
dataArray{1} = cellfun(@(x) x(2:end-1), dataArray{1}, 'UniformOutput', false);
dates{1} = datetime(dataArray{1}, 'Format', 'HH:mm:ss', 'InputFormat', 'HH:mm:ss');
catch
dates{1} = repmat(datetime([NaN NaN NaN]), size(dataArray{1}));
end
end
dates = dates(:,1);
%% Split data into numeric and string columns.
rawNumericColumns = raw(:, [3,4,5,6,7,8,9]);
rawStringColumns = string(raw(:, 2));
%% Make sure any text containing <undefined> is properly converted to an <undefined> categorical
idx = (rawStringColumns(:, 1) == "<undefined>");
rawStringColumns(idx, 1) = "";
%% Create output variable
sample = table;
sample.VarName1 = dates{:, 1};
sample.VarName2 = categorical(rawStringColumns(:, 1));
sample.VarName3 = cell2mat(rawNumericColumns(:, 1));
sample.VarName4 = cell2mat(rawNumericColumns(:, 2));
sample.VarName5 = cell2mat(rawNumericColumns(:, 3));
sample.VarName6 = cell2mat(rawNumericColumns(:, 4));
sample.VarName7 = cell2mat(rawNumericColumns(:, 5));
sample.VarName8 = cell2mat(rawNumericColumns(:, 6));
sample.VarName9 = cell2mat(rawNumericColumns(:, 7));
% For code requiring serial dates (datenum) instead of datetime, uncomment the following line(s) below to return the imported dates as datenum(s).
% sample.VarName1=datenum(sample.VarName1);
end
  댓글 수: 2
Mathieu NOE
Mathieu NOE 2021년 9월 22일
my pleasure

댓글을 달려면 로그인하십시오.

추가 답변(1개)

Jeremy Hughes
Jeremy Hughes 2021년 9월 21일
편집: Jeremy Hughes 2021년 9월 21일
Based on the file attached, I suggest treating this as a fixed width file.
opts = detectImportOptions("sample.txt","FileType","fixedwidth")
opts =
FixedWidthImportOptions with properties: Format Properties: Whitespace: '\b\t ' LineEnding: {'\n' '\r' '\r\n'} CommentStyle: {} EmptyLineRule: 'skip' Encoding: 'UTF-8' Replacement Properties: MissingRule: 'fill' ImportErrorRule: 'fill' ExtraColumnsRule: 'addvars' PartialFieldRule: 'keep' Variable Import Properties: Set types by name using setvartype VariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more} VariableTypes: {'duration', 'char', 'double' ... and 6 more} VariableWidths: [8 3 5 8 5 5 5 5 4] SelectedVariableNames: {'Var1', 'Var2', 'Var3' ... and 6 more} VariableOptions: Show all 9 VariableOptions Access VariableOptions sub-properties using setvaropts/getvaropts VariableNamingRule: 'modify' Location Properties: DataLines: [1 Inf] VariableNamesLine: 0 RowNamesColumn: 0 VariableUnitsLine: 0 VariableDescriptionsLine: 0 To display a preview of the table, use preview
readtable("sample.txt",opts)
ans = 16×9 table
Var1 Var2 Var3 Var4 Var5 Var6 Var7 Var8 Var9 ________ __________ _____ _____ ________ ____ ____ ____ ____ 14:57:00 {0×0 char} 0 0 {'01**'} 4843 63 60 24 14:58:00 {0×0 char} 0 0 {'01**'} 4843 59 60 24 14:59:00 {0×0 char} 0 0 {'01**'} 4843 61 60 24 15:00:00 {0×0 char} 0 0 {'01**'} 4843 63 60 24 15:01:00 {0×0 char} 0 0 {'01**'} 4843 63 60 24 15:02:00 {0×0 char} 0 0 {'01**'} 4843 56 60 24 15:03:00 {0×0 char} 0 0 {'01**'} 4843 56 60 24 15:04:00 {0×0 char} 0 0 {'01**'} 4843 63 60 24 15:05:00 {0×0 char} 0 0 {'01**'} 4843 67 60 24 15:06:00 {'R-' } 0.246 0.004 {'01**'} 4843 123 61 24 15:07:00 {'R-' } 0.433 0.011 {'01**'} 4845 132 61 24 15:08:00 {'R-' } 0.231 0.015 {'01**'} 4845 104 61 24 15:09:00 {'R-' } 0.168 0.017 {'01**'} 4843 94 61 24 15:10:00 {'R-' } 0.168 0.02 {'01**'} 4843 94 61 24 15:11:00 {'R-' } 0.007 0.02 {'01**'} 4843 76 61 24 15:12:00 {0×0 char} 0 0.02 {'01**'} 4843 76 61 24
  댓글 수: 1
Marzuki Marzuki
Marzuki Marzuki 2021년 9월 22일
Thank you Jeremy Hughes

댓글을 달려면 로그인하십시오.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by