Importing multiple datasets from one text file

Jeff Szkodzinski on 2 Apr 2015
Commented: dpb on 4 Sep 2019
Below is a text file that contains three separate data sets, each with a different number of columns and rows. Is there a way to read each table into a separate structure so that I can convert each column to a variable for later calculation? importdata only retrieves the first table and ignores everything that follows, since the rest of the data is not in the same format. The text file came from the web in this configuration, and I don't want the user to have to manually copy each table, paste it into a new .txt file, and then ingest each file separately with importdata. Is it worth writing extra code so that all three tables can stay consolidated in one file? Any suggestions are welcome.
Thanks,
ALT DIR SPD SHR TEMP DPT PRESS RH ABHUM DENSITY I/R V/S VPS PW
GEOMFT DEG KTS /SEC DEG C DEG C MBS PCT G/M3 G/M3 N KTS MBS MM
2372 259 18.1 .000 23.8 0.2 932.20 21 4.52 1090.86 270 673 6.19 0
2500 272 17.0 .054 23.4 -0.1 928.13 21 4.43 1087.43 269 672 6.06 0
3000 273 29.2 .041 20.5 -1.3 911.86 23 4.09 1079.29 265 669 5.55 1
3500 275 26.3 .010 19.3 -1.2 895.87 25 4.15 1064.64 262 667 5.60 1
4000 267 22.5 .017 17.9 -2.3 880.15 25 3.83 1051.01 257 666 5.14 2
4500 258 20.7 .013 16.4 -3.1 864.51 26 3.64 1037.75 253 664 4.86 3
5000 259 23.4 .009 15.0 -3.8 849.12 27 3.46 1024.46 249 662 4.61 3
5500 265 26.7 .014 13.6 -4.5 833.95 28 3.30 1011.15 245 661 4.36 4
6000 261 20.9 .021 13.1 -5.9 818.96 26 2.97 994.87 240 660 3.92 4
6500 258 20.2 .004 11.9 -6.0 804.23 28 2.96 981.21 237 659 3.89 5
7000 254 20.6 .005 11.8 -8.8 789.72 23 2.38 964.02 229 659 3.13 5
7500 278 21.1 .028 11.2 -14.5 775.42 15 1.52 949.19 221 658 1.99 5
8000 277 21.3 .001 10.5 -21.0 761.39 9 0.87 934.58 214 657 1.14 6
8500 271 20.9 .008 9.4 -22.2 747.54 9 0.79 921.19 210 655 1.03 6
9000 260 18.4 .015 8.4 -25.4 733.86 7 0.59 907.66 206 654 0.77 6
9500 250 15.7 .013 7.4 -27.3 720.44 6 0.50 894.29 202 653 0.65 6
10000 242 16.3 .008 6.7 -35.4 707.21 3 0.23 880.22 198 652 0.29 6
10500 238 18.0 .007 5.6 -30.8 694.24 5 0.36 867.41 195 651 0.46 6
11000 236 19.4 .005 4.5 -31.8 681.39 5 0.33 854.70 192 650 0.42 6
11500 249 19.9 .015 3.3 -36.5 668.69 3 0.21 842.65 189 648 0.26 6
12000 249 15.6 .015 2.2 -38.6 656.20 3 0.17 830.23 186 647 0.21 6
12500 244 16.2 .005 1.0 -30.9 643.91 7 0.36 817.95 185 646 0.46 6
13000 241 19.3 .011 -0.2 -26.1 631.82 12 0.57 806.04 183 644 0.72 6
13500 243 21.1 .006 -1.4 -23.9 619.87 16 0.70 794.23 181 643 0.88 6
14000 233 21.6 .013 -2.1 -26.8 608.15 13 0.54 781.38 178 642 0.68 6
14500 242 19.3 .013 -3.0 -25.9 596.57 15 0.59 768.94 175 641 0.73 6
15000 248 21.4 .010 -4.0 -27.5 585.18 14 0.51 757.22 172 640 0.63 7
16000 251 22.5 .003 -6.4 -27.4 562.98 17 0.52 734.92 167 637 0.64 7
17000 249 25.8 .006 -8.8 -29.3 541.35 17 0.44 713.04 162 634 0.54 7
18000 246 25.9 .003 -11.2 -30.7 520.46 18 0.39 691.92 157 631 0.47 7
19000 262 27.4 .013 -13.6 -34.0 500.12 16 0.28 671.20 151 628 0.34 7
20000 262 32.9 .009 -15.3 -48.3 480.41 4 0.06 649.02 145 626 0.07 7
21000 252 36.4 .012 -17.6 -43.8 461.40 8 0.10 628.91 141 623 0.12 7
22000 249 39.0 .005 -20.3 -43.9 442.92 10 0.10 610.17 137 620 0.12 7
23000 239 37.4 .012 -22.9 -43.6 424.99 13 0.11 591.56 133 617 0.13 7
24000 238 35.9 .003 -25.6 -41.0 407.63 22 0.15 573.54 129 613 0.17 7
25000 248 38.2 .011 -28.6 -39.5 390.81 34 0.17 556.55 125 610 0.19 7
26000 251 38.1 .004 -31.4 -40.5 374.47 40 0.16 539.62 121 606 0.17 7
27000 254 37.8 .003 -34.2 -46.3 358.72 28 0.08 522.92 117 603 0.09 7
28000 255 40.1 .004 -36.9 -50.3 343.38 23 0.05 506.30 113 599 0.06 7
29000 253 40.1 .002 -39.6 -46.2 328.51 49 0.09 489.95 110 596 0.09 7
30000 262 42.5 .012 -41.3 -55.7 314.23 19 0.03 472.12 105 594 0.03 7
31000 270 44.6 .011 -43.6 -56.2 300.40 23 0.03 455.87 102 591 0.03 7
32000 271 47.9 .006 -46.0 -54.5 287.00 37 0.03 440.13 98 588 0.04 7
33000 271 52.4 .008 -48.3 -56.0 274.19 40 0.03 424.80 95 585 0.03 7
34000 270 54.5 .004 -50.9 -59.0 261.77 37 0.02 410.29 92 581 0.02 7
35000 268 55.5 .004 -53.3 -61.2 249.69 37 0.01 395.66 88 578 0.02 7
36000 265 55.7 .005 -55.7 -63.8 238.17 35 0.01 381.60 85 575 0.01 7
37000 263 55.0 .003 -58.1 -66.8 227.02 31 0.01 367.75 82 572 0.01 7
38000 264 51.4 .006 -60.7 -69.2 216.27 31 0.01 354.67 79 568 0.00 7
39000 268 54.5 .008 -62.4 -70.8 205.94 31 0.00 340.47 76 566 0.00 7
40000 265 44.6 .017 -61.2 -72.9 196.04 19 0.00 322.22 72 568 0.00 7
41000 257 46.6 .011 -58.4 -74.7 186.74 10 0.00 302.88 67 571 0.00 7
42000 255 39.9 .012 -56.5 -80.7 177.95 3 0.00 286.14 64 574 0.00 7
43000 251 42.2 .006 -56.7 -83.3 169.65 2 0.00 273.03 61 573 0.00 7
44000 245 46.2 .010 -57.0 -83.5 161.69 2 0.00 260.60 58 573 0.00 7
45000 249 51.6 .011 -58.2 -84.4 154.13 2 0.00 249.84 56 571 0.00 7
TERMINATION 45718 GEOPFT 13935 GEOPM 147.9 MBS
TROPOPAUSE 38621 FEET 209.80 MB -62.4 C -70.7 C
MANDATORY LEVELS
GEOPFT DIR KTS TEMP DPT PRESS RH
2592 273 21 22.0 -0.7 925.0 22
4967 259 23 15.0 -3.8 850.0 27
10262 249 16 6.1 -30.7 700.0 5
18971 262 27 -13.7 -34.0 500.0 16
24398 240 38 -27.0 -40.3 400.0 27
30958 270 44 -43.7 -56.3 300.0 23
34888 268 55 -53.3 -61.1 250.0 37
39483 266 47 -61.8 -71.9 200.0 24
45419 247 51 -58.8 -84.8 150.0 2
SIGNIFICANT LEVELS
GEOMFT DIR KTS TEMP DPT PRESS IR RH
2372 259 18 23.8 0.2 932.2 270 21
2390 272 11 25.0 0.5 931.7 269 20
2426 272 14 23.8 -0.5 930.5 268 20
2586 273 21 22.0 -0.7 925.3 268 22
5105 261 24 14.6 -3.7 845.9 249 28
5358 265 26 14.0 -3.7 838.2 247 29
5556 265 26 13.5 -4.6 832.3 245 28
5786 259 22 13.7 -5.4 825.3 242 26
6718 251 17 11.1 -6.7 797.9 235 28
6806 247 19 11.3 -6.5 795.3 234 28
6910 248 20 11.8 -7.6 792.3 232 25
7181 268 18 12.0 -14.6 784.5 222 14
7446 278 20 11.1 -14.5 777.0 221 15
7669 277 21 11.3 -18.1 770.7 217 11
8923 262 18 8.3 -24.0 736.0 207 8
9721 241 17 7.3 -32.1 714.6 200 4
9885 238 17 6.9 -35.3 710.3 198 3
12291 245 14 1.6 -38.9 649.1 184 3
12944 241 19 -0.1 -27.0 633.2 183 11
13692 243 23 -1.9 -22.4 615.4 181 19
14083 233 20 -2.2 -27.7 606.2 177 12
14166 235 20 -2.2 -27.7 604.3 176 12
15717 253 20 -6.0 -27.7 569.2 169 16
16233 255 22 -7.0 -26.7 557.9 166 19
16465 254 23 -7.5 -32.8 552.8 163 11
16660 253 23 -8.0 -28.7 548.6 164 17
18140 246 25 -11.3 -30.8 517.6 156 18
18477 252 26 -12.3 -31.7 510.7 154 18
19235 263 27 -14.3 -34.5 495.4 150 16
19437 269 25 -14.8 -35.6 491.4 149 15
19576 267 25 -15.2 -37.3 488.7 148 13
19691 264 27 -14.8 -42.9 486.4 147 7
19708 264 27 -14.9 -44.4 486.1 147 6
21380 251 37 -18.5 -44.5 454.3 139 8
23390 236 36 -23.9 -45.9 418.2 131 11
23802 237 35 -25.0 -42.3 411.1 129 18
25167 250 37 -29.1 -39.6 388.1 125 35
25230 249 37 -29.2 -39.2 387.0 124 37
25867 250 37 -31.1 -40.2 376.6 122 40
26472 254 38 -32.7 -45.3 367.0 119 27
27312 255 38 -35.2 -47.8 353.8 116 26
27632 256 38 -36.0 -49.9 348.9 115 22
28124 255 40 -37.2 -50.6 341.5 113 23
28568 253 40 -38.4 -50.0 334.9 111 28
29089 253 40 -39.9 -45.9 327.2 110 52
29119 253 40 -39.9 -45.8 326.8 109 53
29297 253 40 -40.5 -45.8 324.2 109 56
29754 261 42 -40.7 -49.4 317.7 107 38
38551 262 50 -62.3 -70.6 210.5 78 31
38621 262 50 -62.4 -70.7 209.8 77 31
38935 267 54 -62.5 -71.0 206.6 76 30
40463 258 45 -61.0 -73.9 191.7 70 16
41687 256 39 -56.8 -77.8 180.7 65 5
45858 247 53 -59.1 -85.0 147.9 54 2

Accepted Answer

dpb on 2 Apr 2015
Edited: dpb on 3 Apr 2015
textscan will do it fine with some effort. My first try would be three separate calls with 'headerlines',2, each returning its array as a separate variable. If you're lucky, the failure of the '%f' format at the end of one section will leave the file pointer at the right location for the next call, and then again for the one after.
Presuming the number of columns is fixed but the number of rows is variable, write a format string for each as
fmt=repmat('%f',1,N);
where N is 14, 7 and 8, respectively for the three sections. If you need to parse the numeric values from the header/trailer of the intermediate case, then that'll have to be done specifically for the format of the text, of course, instead of treating those lines also as headerlines.
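If the trailer values are wanted, one possible approach is to grab every number on the line with a regular expression. This is only a sketch: the two lines are copied from the posted file, while the variable names and the pattern are my own choices, and the meaning of each value is read off the labels on the line.

```matlab
% Sketch: pull the numbers out of the two trailer lines of the first section.
% The lines are copied from the posted file; the variable names are made up.
termLine = 'TERMINATION 45718 GEOPFT 13935 GEOPM 147.9 MBS';
tropLine = 'TROPOPAUSE 38621 FEET 209.80 MB -62.4 C -70.7 C';

% Optionally signed numbers, with or without a leading digit (e.g. .054)
numPat = '-?\d*\.?\d+';

% regexp returns the matching substrings; str2double converts them
termVals = str2double(regexp(termLine, numPat, 'match'));  % GEOPFT, GEOPM, MBS
tropVals = str2double(regexp(tropLine, numPat, 'match'));  % FEET, MB, C, C
```

This ignores the labels entirely, so it only makes sense when the order of the values on each trailer line is fixed.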
ADDENDUM
OK, I pasted the text into a file. Ignoring the trailing data within the footer which can be parsed if desired, the block data can be read pretty easily...
>> N=[14,7,8]; % the number of columns for each section
>> H=[2,5,3]; % the number of header lines to skip (the trailer of the previous section is counted here)
>> fid=fopen('jeff.txt');
>> for i=1:3
fmt=repmat('%f',1,N(i)); % build the format string for the number of columns
a(i)=textscan(fid,fmt,'headerlines',H(i),'collectoutput',1); % read section
end
>> fid=fclose(fid);
>> a
a =
[57x14 double] [9x7 double] [53x8 double]
>>
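To get from one of these arrays to named columns, as the question asks, something like the following may work. It is a sketch under assumptions: the field names are taken from the header line of the mandatory-levels table, and three rows are copied from the posted data so the snippet runs on its own (in the session above the array would be a{2}).

```matlab
% Sketch: attach the header names to the columns of one block.
% Three rows of the mandatory-levels table, copied from the posted data.
block = [ 2592 273 21 22.0  -0.7 925.0 22
          4967 259 23 15.0  -3.8 850.0 27
         10262 249 16  6.1 -30.7 700.0  5];
names = {'GEOPFT','DIR','KTS','TEMP','DPT','PRESS','RH'};

% num2cell(block,1) puts each column in its own cell;
% cell2struct then turns those cells into one field per name.
S = cell2struct(num2cell(block, 1), names, 2);

press = S.PRESS;   % column vector of pressures
temp  = S.TEMP;    % column vector of temperatures
```

On releases that have the table data type, array2table(block,'VariableNames',names) should give a similar result with one table variable per column.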
  3 Comments
kelian dascher-cousineau on 13 Apr 2017
Would there be a way to automatically identify the number of data blocks, the number of headers and columns within the text file?
dpb on 13 Apr 2017
Depends on what you mean by "automatic". There is no magic bullet that will return that information for an arbitrary file structure, no. importdata lets you look at a file manually, which may be all you need for a one-off, but a one-liner that handles any file you choose from any source is pretty much an infinite problem space.
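For files laid out like the one in this thread, a line-by-line scan can at least find the blocks. The sketch below assumes that a data row holds nothing but whitespace-separated numbers and that a block ends at the first line that is not such a row or that has a different number of columns. The sample lines, the temporary file and all variable names are made up for illustration.

```matlab
% Sketch: detect numeric blocks in a text file, one line at a time.
% A small sample in the same layout is written to a temp file first.
sample = {'ALT DIR SPD'
          'GEOMFT DEG KTS'
          '2372 259 18.1'
          '2500 272 17.0'
          'TERMINATION 45718 GEOPFT'
          'MANDATORY LEVELS'
          '2592 273 21 22.0'
          '4967 259 23 15.0'
          '10262 249 16 6.1'};
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', sample{:});
fclose(fid);

% A data row: only optionally signed numbers separated by whitespace
numRow = '^\s*-?\d*\.?\d+(\s+-?\d*\.?\d+)*\s*$';

blocks  = {};   % one numeric matrix per detected block
current = [];   % rows of the block being collected
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    isData = ~isempty(regexp(tline, numRow, 'once'));
    if isData
        vals = sscanf(tline, '%f').';          % row vector of the numbers
    end
    if isData && (isempty(current) || numel(vals) == size(current, 2))
        current = [current; vals];             % same block continues
    else
        if ~isempty(current)
            blocks{end+1} = current;           % close the finished block
        end
        if isData
            current = vals;                    % new block, new column count
        else
            current = [];                      % header or trailer line
        end
    end
    tline = fgetl(fid);
end
if ~isempty(current)
    blocks{end+1} = current;                   % block that runs to end of file
end
fclose(fid);
delete(fname);
```

The number of blocks is then numel(blocks) and the column count of each is size(blocks{k},2). It will not cope with a file whose data rows contain text or missing values, which is the "infinite problem space" caveat in practice.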


More Answers (1)

Konstantinos Sofos on 2 Apr 2015
Hi,
I assume that dlmread will do the job you need, since all of your data (as far as I can see) are numeric.
M = dlmread(filename)
Regards
  2 Comments
Chao Wu on 27 May 2018
Well done, it works, thanks
dpb on 4 Sep 2019
If it is just written sequentially into the file, then, sure. You'll get back one array of the full size, though, so you'll have to know a priori how many records belong in each section or, if one column is some sort of time stamp, process it to find the beginning of the next data section (presuming it starts over from zero or similar rather than being clock time; of course, clock time might show a larger gap between sections).
We really can't do more than guess without the specifics of the file in detail...
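As a sketch of the "find where it starts over" idea: if the first column only ever increases within a section, a drop in that column marks the start of the next one. The matrix below is an assumption for illustration. It uses the first three columns of a few rows from the two level tables in the question, stacked as if they had been read into one array.

```matlab
% Sketch: split one stacked array where column 1 starts over.
% Rows are trimmed to three columns (height, DIR, KTS) for illustration.
M = [ 2592 273 21
      4967 259 23
     10262 249 16
      2372 259 18
      2390 272 11
      2426 272 14];

% A negative step in column 1 means the next row begins a new section
startIdx  = [1; find(diff(M(:,1)) < 0) + 1];
rowCounts = diff([startIdx; size(M,1) + 1]);    % rows in each section

% Cut M into one cell per section, keeping all columns together
sections = mat2cell(M, rowCounts, size(M,2));
```

This relies entirely on the reset being visible in the data. If a section happened to start above the last value of the previous one, no break would be found.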
