How to turn .txt file into a useful table.

John Jacoby on 27 Aug 2017
Commented: Jeremy Hughes on 31 Aug 2017
This seems like it should be exceedingly simple, but I haven't found anything on here or anywhere else that addresses it. I have a text file delimited by periods that should be very easy to import using the readtable function, but readtable seems to import every column as character arrays. I've tried using format strings, but I get errors. I would include my code, but it's simply one line, one function: readtable(filepath).
Trying to include a format string gets me:
"Unable to read the entire file. You may need to specify
a different format, delimiter, or number of header
lines.
Note: readtable detected the following parameters:
'HeaderLines', 0, 'ReadVariableNames', true
Error in redditAnalysis (line 4)
allData = readtable('C:\Users\John\Desktop\ChildrensNeurobio\MATLABproject\redditPractice\all.txt', 'Delimiter', '.', 'Format', '%f%f%f%f%f%s');
"
Any idea how to get the columns I need into a useful numeric vector format?
EDIT: the first few lines of the file:
rank.page.upvotes.comments.age.subreddit
1.1.40400.1283.3.OldSchoolCool
2.1.19200.906.4.funny
3.1.31800.1709.5.politics
4.1.40300.780.5.bestof
5.1.5844.1277.3.soccer
6.1.30200.256.5.aww

Answers (2)

Sailesh Sidhwani on 30 Aug 2017
To achieve your workflow, you should also pass "File Import Options" to the readtable() function along with the file. These options define how the file will be read into MATLAB. You can also set the variable names, variable types and delimiter in these import options. To learn more about import options, see the MATLAB documentation for detectImportOptions.
See the following steps to achieve your workflow. "abc.txt" is the subset of your file from your question.
opts = detectImportOptions('abc.txt')
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {','}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'windows-1252'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'rank_page_upvotes_comments_age_subreddit'}
VariableTypes: {'char'}
SelectedVariableNames: {'rank_page_upvotes_comments_age_subreddit'}
VariableOptions: Show all 1 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Now change the delimiter and variable names as required. The variable types can be changed in the same way using setvartype; they are left as 'char' in this example.
opts.Delimiter = {'.'};
opts.VariableNames= {'rank','page','upvotes','comments','age','subreddit'}
opts =
DelimitedTextImportOptions with properties:
Format Properties:
Delimiter: {'.'}
Whitespace: '\b\t '
LineEnding: {'\n' '\r' '\r\n'}
CommentStyle: {}
ConsecutiveDelimitersRule: 'split'
LeadingDelimitersRule: 'keep'
EmptyLineRule: 'skip'
Encoding: 'windows-1252'
Replacement Properties:
MissingRule: 'fill'
ImportErrorRule: 'fill'
ExtraColumnsRule: 'addvars'
Variable Import Properties: Set types by name using setvartype
VariableNames: {'rank', 'page', 'upvotes' ... and 3 more}
VariableTypes: {'char', 'char', 'char' ... and 3 more}
SelectedVariableNames: {'rank', 'page', 'upvotes' ... and 3 more}
VariableOptions: Show all 6 VariableOptions
Access VariableOptions sub-properties using setvaropts/getvaropts
Now pass "opts" to readtable as the File Import Options:
readtable('abc.txt',opts)
ans =
  6×6 table
    rank    page    upvotes    comments    age    subreddit
    ____    ____    _______    ________    ___    _______________
    '1'     '1'     '40400'    '1283'      '3'    'OldSchoolCool'
    '2'     '1'     '19200'    '906'       '4'    'funny'
    '3'     '1'     '31800'    '1709'      '5'    'politics'
    '4'     '1'     '40300'    '780'       '5'    'bestof'
    '5'     '1'     '5844'     '1277'      '3'    'soccer'
    '6'     '1'     '30200'    '256'       '5'    'aww'
  Comments: 1
Jeremy Hughes on 31 Aug 2017
Edited: Jeremy Hughes on 31 Aug 2017
You can also set the types with:
>> opts = setvartype(opts,1:5,'double');
See my full answer for a slightly better approach.
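Putting the import options and setvartype together, a minimal sketch might look like the following. It assumes a reasonably recent MATLAB release; the temporary file name and the variable `upvotes` are made up for illustration, and the sample rows are the ones posted in the question.

```matlab
% Write the sample rows from the question to a temporary file
% (hypothetical location, used only to keep the sketch self-contained).
fname = [tempname '.txt'];
rows = {
    'rank.page.upvotes.comments.age.subreddit'
    '1.1.40400.1283.3.OldSchoolCool'
    '2.1.19200.906.4.funny'
    '3.1.31800.1709.5.politics'
    '4.1.40300.780.5.bestof'
    '5.1.5844.1277.3.soccer'
    '6.1.30200.256.5.aww'
    };
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', rows{:});
fclose(fid);

% Tell the detection step that '.' is the delimiter, not a decimal point.
opts = detectImportOptions(fname, 'Delimiter', '.');
opts.VariableNames = {'rank','page','upvotes','comments','age','subreddit'};

% Import the first five columns as numbers and the last one as text.
opts = setvartype(opts, 1:5, 'double');
opts = setvartype(opts, 'subreddit', 'char');

t = readtable(fname, opts);

% Each numeric column is now an ordinary column vector.
upvotes = t.upvotes;
disp(class(upvotes));

delete(fname);   % clean up the temporary file
```

After this, columns such as t.upvotes or t.comments can be used directly in numeric code (for example mean(t.upvotes)).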



Jeremy Hughes on 31 Aug 2017
Edited: Jeremy Hughes on 31 Aug 2017
Hi,
This is actually pretty simple:
>> opts = detectImportOptions('abc.txt','Delimiter','.')
>> opts.VariableNames= {'rank','page','upvotes','comments','age','subreddit'}
>> t = readtable('abc.txt',opts);
Without import options, readtable uses a slightly different reading method that scans for numbers, so it treats the '.' as a decimal point and pulls it into the number. Without the 'Delimiter' parameter, detectImportOptions will not choose '.' as the delimiter, since it assumes the character is acting as a decimal separator.
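The ambiguity can be seen on a single line of the file. This is only an illustration using sscanf, strsplit and str2double, not a description of how readtable works internally; the variable names are made up.

```matlab
% One data row from the question.
sampleLine = '1.1.40400.1283.3.OldSchoolCool';

% Scanning for a number first: the leading "1.1" is read as one decimal
% value, so the first two fields are fused together.
firstNumber = sscanf(sampleLine, '%f', 1);

% Splitting on '.' first and converting afterwards keeps the fields apart.
fields = strsplit(sampleLine, '.');
nums = str2double(fields(1:5));

disp(firstNumber);
disp(nums);
disp(fields{6});
```

Here firstNumber comes out as 1.1, while splitting first gives six separate fields, which is the interpretation the 'Delimiter' parameter asks for.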
Hope this helps,
Jeremy
  Comments: 1
Jeremy Hughes on 31 Aug 2017
And if the variable names are already there in the file, you might not need that second line.
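To check whether the header line was picked up, the detected names and types can be inspected before reading. A rough sketch, assuming a recent MATLAB release and using a made-up temporary file that holds the sample rows from the question; what detection reports may differ between releases, so it is printed rather than assumed.

```matlab
% Recreate the sample file, header line included (hypothetical temp path).
fname = [tempname '.txt'];
rows = {
    'rank.page.upvotes.comments.age.subreddit'
    '1.1.40400.1283.3.OldSchoolCool'
    '2.1.19200.906.4.funny'
    '3.1.31800.1709.5.politics'
    '4.1.40300.780.5.bestof'
    '5.1.5844.1277.3.soccer'
    '6.1.30200.256.5.aww'
    };
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', rows{:});
fclose(fid);

% Only the delimiter is supplied; names and types are left to detection.
opts = detectImportOptions(fname, 'Delimiter', '.');
disp(opts.VariableNames);   % names read from the header line, if detected
disp(opts.VariableTypes);   % types chosen for each column

t = readtable(fname, opts);

% List the class of every column actually imported.
colClasses = varfun(@class, t, 'OutputFormat', 'cell');
disp(colClasses);

delete(fname);   % clean up the temporary file
```

If the printed names or types are not what you want, they can still be overridden with opts.VariableNames and setvartype before calling readtable.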
