The NOAA atmospheric data file comes in the following format, with the columns defined by the header row:
USAF   WBAN  STATION NAME                  CTRY ST CALL  LAT     LON      ELEV(M) BEGIN    END
007018 99999 WXPOD 7018                                  +00.000 +000.000 +7018.0 20110309 20130730
007026 99999 WXPOD 7026                    AF            +00.000 +000.000 +7026.0 20120713 20170822
007070 99999 WXPOD 7070                    AF            +00.000 +000.000 +7070.0 20140923 20150926
008260 99999 WXPOD8270                                   +00.000 +000.000 +0000.0 20050101 20100920
I am trying to extract a given column, which can be 'ELEV' or 'USAF' or 'STATION NAME', etc. It is known a priori which column needs to be extracted, for example column #1 (USAF). I am running into problems because the 'STATION NAME' sometimes has a blank between the parts of its alphanumeric code and sometimes is a single code without any blanks. Other fields, for example CTRY, can also be blank. In the four data lines of the shortened input file above, 'ST' and 'CALL' are empty, but they can be filled (usually with alphabetic codes).
Also,
(1) How can I extract the USAF entries for which CTRY == AF?
(2) How can I extract all rows from rowNumber = 10000 to rowNumber = 20000 (say)?
Thanks.

Accepted Answer

per isakson, 29 May 2020 (edited 30 May 2020)

This is a fixed-width text file. The documentation includes a good description of how to read fixed-width text files.
Be careful to get the column widths right.
Also,
  1. how to extract the USAF entries corresponding to only CTRY==AF ?
  2. how to extract all the rows with rowNumber=10000 to rowNumber=20000 (say).
Use readtable() and read all rows (if that doesn't cause memory problems). The tools you need come with the table data type.
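As a minimal sketch of what "the tools that come with table" could look like for the two questions, the example below uses a small hand-made table as a stand-in for the result of readtable(). The variable names isAF, usafAF, firstRow, lastRow and subTbl are hypothetical, and the row range 2:3 only illustrates what would be 10000:20000 on the full file.

```matlab
% Hypothetical stand-in for the table returned by readtable()
USAF = [7018; 7026; 7070; 8260];
CTRY = {''; 'AF'; 'AF'; ''};
tbl  = table(USAF, CTRY);

% (1) USAF entries where CTRY is 'AF', using logical indexing
isAF   = strcmp(tbl.CTRY, 'AF');
usafAF = tbl.USAF(isAF);

% (2) A contiguous block of rows; 2:3 here, 10000:20000 on the full file
firstRow = 2;
lastRow  = min(3, height(tbl));   % guard against running past the last row
subTbl   = tbl(firstRow:lastRow, :);
```

Logical indexing keeps the selection readable and works the same way for any other column, for example tbl.ELEV_M_(isAF) once the full table has been read.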
In response to comments
Since there are no delimiters in the data file, I find the message
Line 3 has 9 delimiters, while preceding lines have 8.
misleading. Even if each run of char(32) is counted as one delimiter, the numbers 9 and 8 don't make sense.
I created the script below in three steps:
  1. Create the object, opts, with default values. Inspect opts.
  2. Type opts.<tab> in the Command Window (<tab> stands for tab completion). I identified four properties whose default values were not meaningful, and added statements to the script to assign the values I found in the comments. (To save myself some trouble in the future, I modified the names to make them legal MATLAB names.)
  3. Read the file with readtable().
%% Create the import options object
ffs = fullfile('d:\m\cssm\noaa1lineHeaderFirst15lines.txt'); % path to the sample file
opts = fixedWidthImportOptions; % default values
%% Override the defaults that do not fit this file
opts.DataLines = [ 2, inf ]; % skip the one-line header
opts.VariableNames = { 'USAF','WBAN','STATION_NAME','CTRY','ST' ...
, 'CALL','LAT','LON','ELEV_M_','BEGIN','END' }; % legal MATLAB names
opts.VariableTypes = { 'double','double','char','char','char','char' ...
, 'double','double','double','double','double' };
opts.VariableWidths = [ 7, 6, 30, 5, 3, 6, 8, 9, 8, 9, 9 ]; % characters per column
%% Read the file
tbl = readtable( ffs, opts );
No error messages so far.
>> tbl
tbl =
  14×11 table
    USAF    WBAN     STATION_NAME    CTRY    ST    CALL    LAT      LON       ELEV_M_    BEGIN         END
    ____    _____    ____________    ____    __    ____    _____    ______    _______    __________    __________
    7018    99999    'WXPOD 7018'    ''      ''    ''      0        0         7018       2.011e+07     2.0131e+07
    7026    99999    'WXPOD 7026'    'AF'    ''    ''      0        0         7026       2.0121e+07    2.0171e+07
    7070    99999    'WXPOD 7070'    'AF'    ''    ''      0        0         7070       2.0141e+07    2.0151e+07
    8260    99999    'WXPOD8270'     ''      ''    ''      0        0         0          2.005e+07     2.0101e+07
    8268    99999    'WXPOD8278'     'AF'    ''    ''      32.95    65.567    1156.7     2.0101e+07    2.012e+07
    8307    99999    'WXPOD 8318'    'AF'    ''    ''      0        0         8318       2.01e+07      2.01e+07
    8411    99999    'XM20'          ''      ''    ''      NaN      NaN       NaN        2.016e+07     2.016e+07
    ...
Looks ok

Comments: 10

Already tried readtable. It was giving the following error:
Error using readtable (line 216)
Reading failed at line 3. All lines of a text file must have the same number of delimiters. Line 3 has 9 delimiters, while preceding lines have 8.
Note: readtable detected the following parameters:
'Delimiter', '\t ', 'MultipleDelimsAsOne', true, 'HeaderLines', 1, 'ReadVariableNames', false, 'Format', '%f%f%q%f%f%f%f%f%f'
Error in noaaExtractCol (line 3)
data=readtable('noaaFile.txt');
I thought this was because of blank spaces.
For the readtable options, I had used the following :
DataStartLine = 2;
NumVariables = 11;
VariableNames = {'USAF','WBAN','STATION NAME','CTRY','ST',...
'CALL','LAT','LON','ELEV(M)','BEGIN','END'};
VariableWidths = [ 7, 5, 30, 5, 3, 5, 8, 9, 8, 9, 9 ] ;
DataType = {'double','double','char','char','char','char',...
'double','double','double','double','double'};
opts = fixedWidthImportOptions('NumVariables',NumVariables,...
'DataLines',DataStartLine,...
'VariableNames',VariableNames,...
'VariableWidths',VariableWidths,...
'VariableTypes',DataType);
noaaTable = readtable(filename,opts)
which had given the error:
Error using matlab.io.text.FixedWidthImportOptions (line 46)
Expected a cell array of valid variable names.
Error in fixedWidthImportOptions (line 37)
opts = matlab.io.text.FixedWidthImportOptions(varargin{:});
Error in noaaExtract (line 13)
opts = fixedWidthImportOptions('NumVariables',NumVariables,...
I thought this was either due to a space in the variable 'Station Name', or blank.
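The "Expected a cell array of valid variable names" error above is consistent with two of the header names ('STATION NAME' and 'ELEV(M)') not being valid MATLAB identifiers in the release used here. A small sketch for checking and converting names follows; rawNames, isValid, badNames and validNames are hypothetical variable names, and matlab.lang.makeValidName is only one way to obtain legal names (the accepted answer simply renamed them by hand).

```matlab
% Header names as they appear in the data file
rawNames = {'USAF','WBAN','STATION NAME','CTRY','ST', ...
            'CALL','LAT','LON','ELEV(M)','BEGIN','END'};

% Find the names that are not valid MATLAB identifiers
isValid  = cellfun(@isvarname, rawNames);
badNames = rawNames(~isValid);

% Let MATLAB derive valid identifiers from the raw names
validNames = matlab.lang.makeValidName(rawNames);
```

Here badNames contains 'STATION NAME' and 'ELEV(M)'; passing validNames (or hand-edited names such as 'STATION_NAME') as 'VariableNames' avoids the invalid entries.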
Impossible for me to know for sure the format of the file. Do you know that it's not a fixed-width text file?
How did you use readtable()? You have obviously not created a FixedWidthImportOptions object using either the fixedWidthImportOptions function or the detectImportOptions function
"I thought this was because of blank spaces." Not likely, since the error message says it's because
Line 3 has 9 delimiters, while preceding lines have 8.
Proposal: upload a sample of the data file. Use the paper clip icon.
I think the VariableWidths are
7, 6, 30, 5, 3, 6, 8, 9, 8, 9, 9
b, 29 May 2020
Yes, the variable widths are 7,6,30,5,3,6,... instead of 7,5,30,5,3,5,... But it made no difference in the error with fixedWidthImportOptions. Error remained the same.
Line 3 has 9 delimiters, while preceding lines have 8.
I thought the blank space was the cause because the only difference between the data in row 1 and row 2 is the blank CTRY column in row 1 (in row 2 it is 'AF').
Cris LaPierre, 29 May 2020 (edited 29 May 2020)
Why not use the Import Tool to figure out how to load the data? I will admit the code might not be as readable as what could be done by hand, but it allows you to set up the properties interactively, and then create an import function, which can then be used whenever you need.
A couple videos you might find helpful
  1. How to use the import tool
  2. Generating and reusing code
b, 30 May 2020
Thanks, but importData doesn't capture the variables properly. I have attached the jpeg figure of how it has broken down the "Station Name" into 4 columns, and merged other variables into one column too.
per isakson, 30 May 2020
See my answer, I've added a script that reads your sample data file.
b, 30 May 2020
Works perfect.
Some people on this site (MATLAB Central Answers) are extremely helpful. How can their help ever be repaid?

More Answers (1)

b, 29 May 2020 (edited 29 May 2020)

Thanks for suggesting the paper clip icon.
I have attached the file (short one), but it contains almost everything the rest of the (very) big file contains.
The errors are while using this short test file as an input.
