필터 지우기
필터 지우기

Importing line-by-line from a text file

조회 수: 6 (최근 30일)
adrooney
adrooney 2015년 4월 2일
편집: per isakson 2015년 4월 4일
Hi I have a problem choosing what kind of tool to import data from a text file. The data looks something like this although the lines are like 4000 and
35, 0401-BA1,1,6,8,00000,0,01,053,50530,'Channel ',,
35, 0401-CB1,1,6,8,00000,0,01,051,50510,'Spillway ',,
35, 0401-EB1,1,3,2,00052,0,01,051,50510,'RioGrande ',,
35, 0401-FS1,1,6,8,00203,0,01,011,27340,'Spillway ',,
35, 0401-MB1,1,6,8,00000,0,01,037,0 ,'Canal ',,
I used the fscanf , textscan but I don't understand how to tackle the spaces. for the second space is it like %15c ?? Also How do I switch to another line should I use /n? Please help me here guys. Thank you
  댓글 수: 1
Star Strider
Star Strider 2015년 4월 3일
Attaching the text file (or a representative part of it) would be nice. (Use the ‘paperclip’ icon.)

댓글을 달려면 로그인하십시오.

채택된 답변

per isakson
per isakson 2015년 4월 3일
편집: per isakson 2015년 4월 4일
Hint
str = '35, 0401-BA1,1,6,8,00000,0,01,053,50530,''Channel '',,';
fmt = '%d%s%d%d%d%d%d%d%d%d%s%s';
cac = textscan( str, fmt ...
, 'Delimiter' , ',' ...
, 'CollectOutput', true );
  • " how to tackle the spaces" &nbsp depends on how you want the spaces in the resulting string variable values
  • "line-by-line" &nbsp what exactly do you mean?
&nbsp
Addendum I
This illustrates an approach. However, due to various mismatches between the header row and the data rows the result is not useful.
sas = cssm()
returns
1x4000 struct array with fields:
STATE_CODE_001
STRUCTURE_NUMBER_008
RECORD_TYPE_005A
ROUTE_PREFIX_005B
SERVICE_LEVEL_005C
ROUTE_NUMBER_005D
DIRECTION_005E
HIGHWAY_DISTRICT_002
COUNTY_CODE_003
PLACE_CODE_004
....
where
function sas = cssm
ffs = 'c:\m\cssm\NM14.txt';
fid = fopen( ffs );
str = fgetl( fid );
col_head_list = parse_header( str );
sas = repmat( cell2struct( repmat({''},1,112), col_head_list(1:112), 2 ), 1,4000 );
jj = 0;
while not( feof(fid) )
str = fgetl( fid );
buf = parse_data_row( str );
jj = jj + 1;
sas(1,jj) = cell2struct( buf(1:112), col_head_list(1:112), 2 );
end
fclose( fid );
end
function cac = parse_header( str )
cac = strsplit( str, ',' );
cac = strtrim( cac );
end
function cac = parse_data_row( str )
cac = strsplit( str, ',' );
cac = strtrim( cac );
end
&nbsp
Addendum II
I was fooled by the function, strsplit, which as default treats multiple delimiters as one. In my world that only applies to space. Anyhow, now I have fixed the function cssm.
Read the data to a structure
>> sas=cssm;
find all bridges, which have STATUS_WITH_10YR_RULE equal to '2'.
>> is_sw10yrEq2 = strcmp('2',{sas.STATUS_WITH_10YR_RULE});
>> sum(is_sw10yrEq2)
ans =
324
there are 324 of them. List their building year
>> {sas(is_sw10yrEq2).YEAR_BUILT_027}
ans =
Columns 1 through 9
'1915' '1936' '1928' '1931' '1920' '1951' '1933' '1939' '1939'
Columns 10 through 18
'1924' '1931' '1931' '1931' '1932' '1932' '1936' '1937' '1936'
....
where
function sas = cssm
ffs = 'c:\m\cssm\NM14.txt';
fid = fopen( ffs );
str = fgetl( fid );
colhead_list = parse_header( str );
n_colhead = length( colhead_list );
sas = repmat( cell2struct ...
( repmat({''},1,n_colhead), colhead_list, 2 ) ...
, 1,4000 );
jj = 0;
while not( feof(fid) )
str = fgetl( fid );
buf = parse_data_row( str );
jj = jj + 1;
sas(1,jj) = cell2struct( buf, colhead_list, 2 );
end
fclose( fid );
sas(1,jj+1:end) = [];
end
function cac = parse_header( str )
cac = strsplit( str, ',', 'CollapseDelimiters',false );
cac = strtrim( cac );
end
function cac = parse_data_row( str )
cac = strsplit( str, ',', 'CollapseDelimiters',false );
cac = strtrim( cac );
end
&nbsp
Addendum III
Caveat: I hardly used the new (couple of years old), data type table. Nevertheless, table is targeted at problems like yours and I think you should
  • use Import Data interactively to read from a file of format B (one header line, comma separated, varying width) to a table. Make sure to uncheck Multiple Delimiters as One and select table. I've tested; it worked fine.
  • spend time with the documentation of table instead of fighting file formats.
&nbsp
Addendum IV
It is straightforward to read the file of format B (one header line, comma separated, varying width) with textscan
>> cac = cssm
cac =
{3951x134 cell}
where
function cac = cssm
ffs = 'c:\m\cssm\NM14.txt'; % one header line, comma separated, varying width
fid = fopen( ffs );
str = fgetl( fid );
chl = strtrim( strsplit( str, ',', 'CollapseDelimiters',false ) );
len = length( chl );
frm = repmat( '%s', 1, len );
cac = textscan( fid, frm, 'CollectOutput',true );
fclose( fid );
end
  댓글 수: 7
adrooney
adrooney 2015년 4월 3일
The C option is much more easy. It has got fixed widths. I tried to uses textscan but not successful. Could you give your ideas on the C option. I am attaching the field widths in an xls sheet
per isakson
per isakson 2015년 4월 3일
편집: per isakson 2015년 4월 4일
"[...] ideas on the C option" &nbsp With Matlab, fixed widths should be avoided and when one cannot:
cac = textscan( fid, '%w1s%w2s%w3s%w4s%w5s...', .... );
where w1,w2,w3, etc. are the widths of the columns, e.g. '%8s%12s%30s'. Convert the "numeric columns" to numerical variables in a second step.

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 Data Import and Export에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by