Import text files with character and numeric data
Hello, I have the following text file (please find attached). I want to import it into MATLAB, and I need only the numeric data; the text is not required. I tried this with the import tool in MATLAB. The problem is that the number of columns is not known in advance and keeps changing, so the generated code stops working when the number of columns changes. How can I import the data with any number of columns and rows? Moreover, the attached file is a smaller version: the original data file has over 3 million rows. How can I import a text file of this type as fast as possible?
Thank you.
Accepted Answer
Azzi Abdelmalek
16 Jul 2015
s=importdata('file.txt')
data=s.data
text=s.textdata
colheaders=s.colheaders
9 Comments
Thanks for the response. How can I extract the numbers associated with the result "text"?
I doubt that it can work this way. If you need to extract the array of numbers only, you can do it this way:
fId = fopen( 'Raw.txt', 'r' ) ;
data = textscan( fId, '%f %f %f', 'HeaderLines', 22 ) ;
fclose( fId ) ;
Then if you prefer to deal with a numeric array instead of a cell array of columns:
data = horzcat( data{:} ) ;
Now if you also need the numbers associated with the parameters from the header, one way to do it is to use a regular expression:
% - Similar to what we did above, but we get the file content in
% a string buffer.
content = fileread( 'Raw.txt' ) ;
data = textscan( content, '%f %f %f', 'HeaderLines', 22 ) ;
data = horzcat( data{:} ) ;
% - Now we process the buffer with REGEXP.
tokens = regexp( content, '(\w+)=(\S+)', 'tokens' ) ;
for tId = 1 : numel( tokens )
    parameters.(tokens{tId}{1}) = str2double( tokens{tId}{2} ) ;
end
With that you get:
>> data
data =
1.0e+04 *
0.0000 -0.8247 -0.9921
0.0000 -0.7204 -1.0678
0.0000 -0.8800 -1.2426
0.0000 -0.7581 -1.0489
0.0000 -0.7281 -1.1200
0.0001 -0.6932 -1.0733
0.0001 -0.6615 -0.9821
0.0001 -0.7036 -1.0141
0.0001 -0.6607 -1.1401
0.0001 -0.5457 -0.9972
0.0001 -0.6714 -0.9440
0.0001 -0.9144 -1.0676
>> parameters
parameters =
normal: 6.1000
dow: 1
Num: 209
ionconc: 1
Desnoise: 100
Time: 0.0080
hotmol: 0
dex: 1
elay: 11250
Des: 16
Max: 1500
Offset: 0
Mode: 1
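To see what the regular expression captures, here is a minimal sketch on a made-up header string (the string and its values are only an illustration, not the content of Raw.txt). Each match of '(\w+)=(\S+)' yields a pair of tokens, the name before the equals sign and the value after it, and each pair becomes one field of the struct.

```matlab
% Hypothetical header text standing in for the top of the file.
header = sprintf( 'Max=1500 Offset=0\nTime=0.0080 Mode=1\n' ) ;

% One cell per match, each holding {name, value} as character vectors.
tokens = regexp( header, '(\w+)=(\S+)', 'tokens' ) ;

% Convert each value to double and store it under its name.
parameters = struct() ;
for tId = 1 : numel( tokens )
    name  = tokens{tId}{1} ;
    value = str2double( tokens{tId}{2} ) ;
    parameters.(name) = value ;
end
```

Two things to keep in mind with this approach: str2double returns NaN when the value is not numeric text, and a name that is not a valid MATLAB identifier (for example one starting with a digit) cannot be used as a field name.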
Note that you can use IMPORTDATA, but you have to specify the delimiter (a tab in your case) and the number of header lines:
content = importdata( 'Raw.txt', '\t', 22 ) ;
>> content
content =
data: [12x3 double]
textdata: {22x3 cell}
colheaders: {'X' 'Wide' 'Resolution'}
Hope it helps!
Thanks for the response. The problem is that my header lines are not fixed; they keep changing. How can I automate this?
Can you provide a few files with different headers?
If you always had the 'Resolution' column header though, you could do something like:
% - Read file content.
content = fileread( 'Raw.txt' ) ;
% - Split on 'Resolution' column header.
content = strsplit( content, 'Resolution' ) ;
% - Parse array.
data = textscan( content{2}, '%f %f %f' ) ;
data = horzcat( data{:} ) ;
% - Parse parameters.
tokens = regexp( content{1}, '(\w+)=(\S+)', 'tokens' ) ;
for tId = 1 : numel( tokens )
    parameters.(tokens{tId}{1}) = str2double( tokens{tId}{2} ) ;
end
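If the column headers and the number of columns change as well, one possible approach is to locate the first line that looks purely numeric and derive the TEXTSCAN format from it. This is only a sketch under assumptions: the in-memory string stands in for fileread( 'Raw.txt' ), the variable names are made up for the illustration, and it assumes that no header line consists only of digits, signs, dots, exponent letters and blanks (it would also miss data lines containing NaN or Inf).

```matlab
% Hypothetical tab-delimited content standing in for fileread( 'Raw.txt' ).
content = sprintf( 'Max=1500\nOffset=0\nX\tWide\tResolution\n1\t-8247\t-9921\n2\t-7204\t-10678\n' ) ;

% Find the start of the first line made only of numeric characters.
pattern  = '^[ \t]*[-+\d.][ \t\d.eE+-]*\r?$' ;
startIdx = regexp( content, pattern, 'start', 'once', 'lineanchors' ) ;
if isempty( startIdx )
    error( 'No numeric line found.' ) ;
end

% Everything before that point is header, everything after is data.
header = content(1:startIdx-1) ;
body   = content(startIdx:end) ;

% Count the columns on the first data line and build e.g. '%f%f%f'.
firstLine = strtok( body, sprintf( '\n' ) ) ;
nCols     = numel( strsplit( strtrim( firstLine ) ) ) ;
fmt       = repmat( '%f', 1, nCols ) ;

% Parse the numeric block into an nRows-by-nCols array.
data = textscan( body, fmt ) ;
data = horzcat( data{:} ) ;
```

The header part can then be passed to REGEXP as before to extract the parameters.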
Ok, the code in my comment above (with the split) should work. I almost never use IMPORTDATA to be honest, because I don't know what it does internally (see note *) and I never know whether it will work later if my format evolves a little. So I always develop parsers specifically for what I need to do, and I implement some flexibility if/when needed.
Note *: you can see how IMPORTDATA was implemented by typing
open importdata
in the command window. You can reverse engineer this version to understand it a bit better, but it is difficult to know how it will evolve in the future.
Thanks, I got it. How can I use the numbers in parameters as input in the next line of my program?
The file number? You can build a string using SPRINTF, for example:
for fileId = 1 : 10
    filename = sprintf( 'Raw%d.txt', fileId ) ;
    content = fileread( filename ) ;
    ...
end
But you can also use DIR to get e.g. all text files, whatever their name:
D = dir( '*.txt' ) ;
for fileId = 1 : length( D )
    filename = D(fileId).name ;
    content = fileread( filename ) ;
    ...
end
This would catch Raw.txt for example, which has no number.
I just re-read your comment and realized that I misunderstood. The variable parameters is a struct, a variable with fields:
>> class( parameters )
ans =
struct
Its fields can be dot-indexed. If you want to address/index the field elay for example, you do it this way:
>> parameters.elay
ans =
11250
This is a numeric field of type/class double:
>> class( parameters.elay )
ans =
double
so you can compute with it:
>> parameters.elay / 10
ans =
1125
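To use these values in the rest of a program, a short sketch could look like the following. The struct is rebuilt by hand here with two of the values shown above so that the snippet runs on its own; the variable names delay, scaled and maxValue are only illustrative.

```matlab
% Stand-in for the struct produced by the parser (values from the output above).
parameters = struct( 'elay', 11250, 'Max', 1500 ) ;

% Use a field directly in a computation.
delay  = parameters.elay ;
scaled = delay / 10 ;

% Access a field whose name is stored in a variable, after checking it exists.
name = 'Max' ;
if isfield( parameters, name )
    maxValue = parameters.(name) ;
end

% List all parameter names found in the header.
names = fieldnames( parameters ) ;
```

Checking with ISFIELD first is useful when the headers change from file to file, since indexing a missing field raises an error.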