How can I sort my data from regexp?
Hi, I have a problem when using regexp with this command:
RVRtmp=regexp(TXTmod,'R\d\d\w\/\w*\d\d\d\D\>','match')
The output cell is mostly empty and looks like this:
[]
[]
[]
[]
[]
[]
[]
<1x4 cell>
<1x4 cell>
<1x4 cell>
<1x4 cell>
<1x4 cell>
[]
I would like to obtain the information in the <1x4 cell> entries. The information inside those cells looks like this:
'R01L/P1500N' 'R19R/0900VP1500N' 'R01R/0800V1400D' 'R19L/1000N'
Here I would like to obtain the runway identifier 'R01L' as a variable or string and the corresponding value '1500' as a vector or cell. I'm having trouble extracting the data because my command does not work on the empty cells:
RVR1=regexp(RVRtmp{1072}{1},'\d{4}','match')
I would like to arrange the data like this:
R01L =
NaN
NaN
1500
2000
1000
500
700
NaN
TXTmod looks like this:
'METAR ESSA 200901220720Z 03003KT 1500 R01L/P1500N R19R/P1500N R01R/0700N R19L/0800V1000N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'
'METAR ESSA 200901220750Z 04003KT 020V090 1500 R01L/P1500N R19R/P1500N R01R/0800V1000N R19L/0900N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'
'METAR ESSA 200901220820Z 02003KT 320V100 1000 R01L/P1500N R19R/0900VP1500N R01R/0800V1400D R19L/1000N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'
'METAR ESSA 200901220850Z 06004KT 0900 R01L/P1500N R19R/1100V1500U R01R/1000V1400N R19L/1200N FZFG VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 0700'
'METAR ESSA 200901220920Z 04003KT 360V060 1000 R01L/P1500N R19R/1200U R01R/0700N R19L/1000VP1500N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 1500'
'METAR ESSA 200901220950Z 04004KT 1500 BR VV002 M00/M00 Q1005 01710173 08710164 51710170 NOSIG'
'METAR ESSA 200901221020Z 01003KT 1700 BR BKN002 BKN017 M00/M00 Q1005 01710173 08710164 51710170 NOSIG'
'METAR ESSA 200901221050Z 35004KT 2500 BKN002 BKN019 00/00 Q1004 01710173 08710164 51710170 NOSIG'
Accepted Answer
Guillaume
14 Oct 2016
Edited: Guillaume
14 Oct 2016
There is no real need for the intermediate regexp. You can get it all with just one regular expression:
tokens = regexp(TXTmod, '(R\d\d\w)/\w*(\d\d\d\d)\D\>', 'tokens'); % You were missing a \d in your regexp (the \w* matched the extra digit, so it didn't matter)
A more efficient (but slightly longer) version:
tokens = regexp(TXTmod, '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>', 'tokens')
Note the inefficiency in your original expression: the \w*\d\d\d is going to cause a lot of backtracking by the regular expression engine, because \w also matches digits. Since * is greedy, the engine first lets \w* consume the digits (and the trailing letter) and then finds that nothing is left for the \d's that follow. So it backtracks one character, retries the rest of the pattern, fails again, and backtracks some more. It has to repeat this until \w* has given back enough characters for the three \d and the final \D to match.
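A quick way to see where the backtracking comes from is to check what a greedy \w* grabs on its own. The snippet below is only an illustration on two runway groups copied from the sample data; the variable names are made up for the example, and the lookbehind (?<=/) is used here just to start the match right after the slash.

```matlab
% One runway visual range group copied from the sample METAR lines.
grp = 'R19R/0900VP1500N';

% \w matches letters, digits and underscore, so a greedy \w* placed after
% the slash swallows everything up to the end of the word.
greedy = regexp(grp, '(?<=/)\w*', 'match', 'once');

% A class restricted to capital letters stops at the first digit instead.
lettersOnly = regexp('R01L/P1500N', '(?<=/)[A-Z]*', 'match', 'once');

disp(greedy)       % 0900VP1500N
disp(lettersOnly)  % P
```

Everything that \w* takes beyond the letters has to be handed back one character at a time before the following \d's can match, which is the extra work described above.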
The new regular expression matches an optional group (an optional run of 4 digits followed by 1 or more letters) and then captures the final group of 4 digits before the last letter. I've also added a start-of-word match: \<.
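As a rough check of that description, the expression can be run on two of the lines from the question, one with runway groups and one without. This is a sketch only: TXTmod is assumed to be a cell array of character vectors, and the loop is just there to print the captured pairs.

```matlab
% Two lines copied from the question: one with RVR groups, one without.
TXTmod = { ...
    'METAR ESSA 200901220820Z 02003KT 320V100 1000 R01L/P1500N R19R/0900VP1500N R01R/0800V1400D R19L/1000N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'; ...
    'METAR ESSA 200901220950Z 04004KT 1500 BR VV002 M00/M00 Q1005 01710173 08710164 51710170 NOSIG'};

expr = '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>';
tokens = regexp(TXTmod, expr, 'tokens');

% tokens{1} is a 1x4 cell holding one {runway, value} pair per group;
% tokens{2} is empty because that report has no RVR groups.
for k = 1:numel(tokens{1})
    fprintf('%s -> %s\n', tokens{1}{k}{1}, tokens{1}{k}{2});
end
```

Note that for a variable range such as 0800V1400D only the last value (1400) is captured, since the first four digits fall inside the non-capturing optional group.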
Another note: to rearrange the tokens of each string into a two-column cell array:
cellfun(@(t) vertcat(t{:}), tokens, 'UniformOutput', false)
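To get from these tokens to the NaN-padded column per runway that the question asks for, one possible approach (not part of the original answer) is to preallocate a NaN vector and fill in only the rows where the runway appears. A minimal sketch, assuming TXTmod is a cell array of character vectors as in the question; runway, pairs, hit and the loop variable are names chosen here for illustration:

```matlab
% Three lines copied from the question; the last one has no RVR groups.
TXTmod = { ...
    'METAR ESSA 200901220720Z 03003KT 1500 R01L/P1500N R19R/P1500N R01R/0700N R19L/0800V1000N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 2000'; ...
    'METAR ESSA 200901220920Z 04003KT 360V060 1000 R01L/P1500N R19R/1200U R01R/0700N R19L/1000VP1500N BR VV002 M00/M00 Q1006 01710173 08710164 51710170 TEMPO 1500'; ...
    'METAR ESSA 200901220950Z 04004KT 1500 BR VV002 M00/M00 Q1005 01710173 08710164 51710170 NOSIG'};

expr = '\<(R\d{2}[A-Z])/(?:(?:\d{4})?[A-Z]+)?(\d{4})[A-Z]\>';
tokens = regexp(TXTmod, expr, 'tokens');

runway = 'R01L';                      % runway to extract (illustrative choice)
R01L = NaN(numel(TXTmod), 1);         % rows without a match stay NaN

for k = 1:numel(tokens)
    if isempty(tokens{k})
        continue                      % no RVR groups in this report
    end
    pairs = vertcat(tokens{k}{:});    % n-by-2 cell: {runway, value}
    hit = strcmp(pairs(:, 1), runway);
    if any(hit)
        % Take the first group for this runway and convert it to a number.
        R01L(k) = str2double(pairs{find(hit, 1), 2});
    end
end

disp(R01L)
```

Preallocating with NaN means the empty cells never need to be indexed, which avoids the error from the command in the question. The same loop could be repeated for the other runways by changing runway.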