Help with Regular Expressions
Hi Everyone! I am trying to parse messages which have a repeating structure (e.g. GPS NMEA messages), but the number of repeats of the structure is not known in advance. To illustrate, take the following made-up message:
MSGID,3,<data1a>,<data1b>,<data1c>,<data2a>,<data2b>,<data2c>,<data3a>,<data3b>,<data3c>
With this format I have a message identifier, followed by the number of data sets in the message (3, in this case), followed by the three sets of data. The data always come in sets of three, but the number of sets is unknown a priori and varies with each unique message. Is there a way to parse this using regular expressions? I.e. is there a way, using regexp, to incorporate the second field, which tells me how many "sets of three" I'll have, all within the expression in the regexp function call?
Thank you in advance!
Chris
UPDATE: Everyone, thank you for your responses thus far. I want to be a bit clearer about what I am trying to do. I am trying to parse a NMEA message, specifically the GPGSV message. I'd like to parse the whole thing using regular expressions if possible. An example of the GPGSV message is given below:
$GPGSV,3,1,12,06,73,157,45,26,54,268,42,17,48,037,43,24,35,302,40*7B
$GPGSV,3,2,12,02,35,205,41,28,35,103,43,20,16,147,34,12,12,309,*7E
$GPGSV,3,3,12,03,10,057,33,13,03,215,36,15,01,250,32,30,00,157,24*7D
In this example, we have:
$GPGSV - message ID
3 - Number of GPGSV pages
1 - Page number of this message
12 - Number of data sets total (across all three messages).
The number 12 is interesting: it says that there are 12 total data sets across three messages, meaning there are 4 data sets in each message. I want to point out that each message does not explicitly state how many data sets it contains; you have to examine both the total number of data sets and the current page number to figure out how many data sets are in the page.
After the number 12, we begin the data sets:
06 - Satellite ID number
73 - Elevation angle of the satellite
157 - Azimuth angle of the satellite
45 - SNR of the satellite
26 - Satellite ID number
... etc (in sets of four numbers)
I am trying to find a way to parse this message using regular expressions. These GPGSV messages tend to come in sets of three. Another example:
$GPGSV,3,1,11,05,65,062,45,29,59,331,44,25,51,241,43,12,35,190,41*75
$GPGSV,3,2,11,02,34,059,43,21,19,271,36,13,17,126,37,15,11,164,38*75
$GPGSV,3,3,11,10,07,041,34,20,05,043,33,18,00,218,00*4F
So here, I know that I need to parse out 4 data sets from message page 1, 4 data sets from message page 2, but only three data sets from message page 3. I realize this is complex, but this is what I'm trying to parse in a dynamic way using regular expressions.
Thank you guys in advance!
Chris
Comments: 2
arun kumar
27 March 2015
Maybe you can also search for the fourth position after < and use a counter to check whether this number repeats. If it does, increase the counter. That way it will find that '1' has come three times, so your value is 3. This works if the data format is always the same.
Stephen23
27 March 2015
You might like to try some of the Regular Expression Helpers available on MATLAB File Exchange, such as my own submission:
It lets you try different match expressions and shows regexp's outputs as you type.
Accepted Answer
Guillaume
27 March 2015
Edited: Guillaume
27 March 2015
It's certainly possible to capture the data set number to reuse later in the expression. In fact, it's even the example that's shown in the documentation of regexp under dynamic regular expressions:
'^(\d+)((??\\w{$1}))' determines how many characters to match by reading a digit at the beginning of the string.
I'm not clear on what exactly you're trying to extract from your message, though.
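As a minimal sketch of the dynamic-expression idea applied to the made-up message from the question: the pattern below captures the count in the second field and reuses it as the repeat count for a group of three comma-separated fields. The variable names (expr, matched, countAgrees) and the use of [^,]+ for a field are assumptions made for illustration, not something taken from the thread.

```matlab
% Made-up message from the question: ID, set count, then sets of three fields.
str = 'MSGID,3,<data1a>,<data1b>,<data1c>,<data2a>,<data2b>,<data2c>,<data3a>,<data3b>,<data3c>';

% (\d+) captures the count as token 1. The (??...) part is parsed again with
% $1 replaced by that token, so the group of three fields must repeat
% exactly that many times before the end of the string.
expr = '^MSGID,(\d+)(??(,[^,]+,[^,]+,[^,]+){$1})$';

matched = regexp(str, expr, 'match', 'once');

% Intended to be true only when the number of sets agrees with the count field.
countAgrees = ~isempty(matched);
```

Note that this only checks that the structure agrees with the count. A repeated group keeps only its last capture, so pulling out the individual fields would still need a second regexp call on the matched text.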
Comments: 3
Guillaume
30 March 2015
Edited: Guillaume
30 March 2015
It checks whether the given message ID is already a key in the map. If it is, I retrieve the corresponding value. If it is not, then I add it to the map.
Note that the first argument to isKey must be a map object. Otherwise, you'll most likely get an undefined function for type xxx error.
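The code this comment refers to is not shown here, so the following is only a sketch of the check-then-retrieve-or-add pattern it describes, using containers.Map and isKey. The map name store, the choice of keeping a cell array of lines per message ID, and the way the ID is extracted are all assumptions for illustration.

```matlab
% Hypothetical store: message ID -> cell array of the lines seen for that ID.
store = containers.Map('KeyType', 'char', 'ValueType', 'any');

msgs = {'$GPGSV,3,1,11,05,65,062,45,29,59,331,44,25,51,241,43,12,35,190,41*75', ...
        '$GPGSV,3,2,11,02,34,059,43,21,19,271,36,13,17,126,37,15,11,164,38*75'};

for k = 1:numel(msgs)
    % Leading '$' plus word characters, e.g. '$GPGSV'.
    id = regexp(msgs{k}, '^\$\w+', 'match', 'once');
    if isKey(store, id)      % first argument must be the map object
        seen = store(id);    % key exists: retrieve the current value
    else
        seen = {};           % key is new: start an empty list
    end
    seen{end+1} = msgs{k};
    store(id) = seen;        % add the new entry or update the existing one
end
```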
More Answers (1)
Stephen23
27 March 2015
Edited: Stephen23
27 March 2015
I would solve this exactly the other way around: simply identify the groups of dataNa,dataNb,dataNc using a basic regexp pattern (you said they always come in threes), and then afterwards confirm that the number of groups found matches the given value, something like this pseudocode:
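tkn = regexp(str,'(data1a),(data1b),(data1c)', 'tokens');  % one cell per group of three
tot = regexp(str,'MSGID,(\d+)', 'tokens', 'once');         % count field as a 1x1 cell
assert(numel(tkn)==str2double(tot{1}))                     % groups found must equal the count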
Or using the pseudo-data from the original question:
>> str = 'MSGID,3,<data1a>,<data1b>,<data1c>,<data2a>,<data2b>,<data2c>,<data3a>,<data3b>,<data3c>';
>> tkn = regexp(str,'<(\w+)>,<(\w+)>,<(\w+)>','tokens');
>> tkn = vertcat(tkn{:});
>> size(tkn,1)==sscanf(str,'MSGID,%f,')
ans =
1
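The same find-then-verify approach can be sketched for one of the GPGSV lines from the question. This assumes each page carries at most four satellite data sets (as in the question's examples), that a field such as the SNR may be empty, and that the line ends in a two-character hexadecimal checksum. The variable names are illustrative only.

```matlab
str = '$GPGSV,3,3,11,10,07,041,34,20,05,043,33,18,00,218,00*4F';

% Tokens 1-3: number of pages, page number, total data sets.
% Token 4: everything between the header and the '*' before the checksum.
tok = regexp(str, '^\$GPGSV,(\d+),(\d+),(\d+)(.*)\*[0-9A-F]{2}$', 'tokens', 'once');
hdr = str2double(tok(1:3));            % [pages, page, total]

% Each data set is four comma-led fields; \d* allows an empty field.
sat = regexp(tok{4}, ',(\d*),(\d*),(\d*),(\d*)', 'tokens');
sat = str2double(vertcat(sat{:}));     % rows: [ID, elevation, azimuth, SNR]

% Data sets expected on this page, assuming four per full page.
expected = min(4, hdr(3) - 4*(hdr(2) - 1));
assert(size(sat, 1) == expected)
```

With this layout an empty field becomes NaN in sat, which keeps every row at four columns.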
Comments: 0