How to get multiline matches using regular expressions (regexp)
I'm trying to filter a large data file for some particular information. This is what part of the data file looks like:
text = ['<node id="309134964" lat="48.0685823" lon="11.6592565" version="4" timestamp="2015-02-16T12:52:33Z" changeset="28884856" uid="8748" user="ToniE">'...
'<tag k="power" v="substation"/>'...
'<tag k="source" v="survey"/>'...
'<tag k="operator" v="Energieversorgung Ottobrunn"/>'...
'</node>'...
'<node id="309202573" lat="49.0064035" lon="9.1332687" version="6" timestamp="2015-08-09T09:24:34Z" changeset="33215175" uid="2672520" user="Stingray80"/>'...
'<node id="309209816" lat="47.9344289" lon="11.1041431" version="1" timestamp="2008-11-01T19:21:22Z" changeset="651519" uid="39150" user="account_deleted_1011"/>'...
'<node id="309209818" lat="47.9335507" lon="11.103726" version="2" timestamp="2014-07-30T20:48:19Z" changeset="24451882" uid="12096" user="HCX Biker"/>'...
'<node id="309209819" lat="47.9333751" lon="11.1045838" version="2" timestamp="2011-03-24T18:45:17Z" changeset="7658567" uid="313675" user="alphax"/>'...
'<node id="309209822" lat="47.9339823" lon="11.1047609" version="1" timestamp="2008-11-01T19:21:22Z" changeset="651519" uid="39150" user="account_deleted_1011"/>'...
'<node id="309209824" lat="47.9342688" lon="11.1048045" version="1" timestamp="2008-11-01T19:21:22Z" changeset="651519" uid="39150" user="account_deleted_1011"/>'...
'<node id="309245115" lat="48.074924" lon="11.6531406" version="6" timestamp="2014-02-03T21:13:35Z" changeset="20361115" uid="8748" user="ToniE">'...
'<tag k="power" v="substation"/>'...
'<tag k="source" v="survey"/>'...
'<tag k="operator" v="Energieversorgung Ottobunn"/>'...
'</node>'...
'<node id="309424891" lat="52.5676698" lon="13.0440382" version="4" timestamp="2015-03-08T19:18:44Z" changeset="29337113" uid="2149159" user="bergaufsee">'...
'<tag k="power" v="substation"/>'...
'</node>'];
I would like to keep only those nodes that are followed by one or more tags,
i.e. my three outputs/matches should be these:
output1:
<node id="309134964" lat="48.0685823" lon="11.6592565" version="4" timestamp="2015-02-16T12:52:33Z" changeset="28884856" uid="8748" user="ToniE">'...
'<tag k="power" v="substation"/>'...
'<tag k="source" v="survey"/>'...
'<tag k="operator" v="Energieversorgung Ottobrunn"/>
output2:
<node id="309245115" lat="48.074924" lon="11.6531406" version="6" timestamp="2014-02-03T21:13:35Z" changeset="20361115" uid="8748" user="ToniE">'...
'<tag k="power" v="substation"/>'...
'<tag k="source" v="survey"/>'...
'<tag k="operator" v="Energieversorgung Ottobunn"/>
output3:
<node id="309424891" lat="52.5676698" lon="13.0440382" version="4" timestamp="2015-03-08T19:18:44Z" changeset="29337113" uid="2149159" user="bergaufsee">'...
'<tag k="power" v="substation"/>
I used this regular expression for the match:
substation_nodes = regexp(text, '(<node.*?\">(.|\n)*?)(?=<\/node>)','match');
When I run this code in MATLAB, I do not get the outputs above. The first and third outputs are as required, but my second output looks like this instead:
output 2:
<node id="309202573" lat="49.0064035" lon="9.1332687" version="6" timestamp="2015-08-09T09:24:34Z" changeset="33215175" uid="2672520" user="Stingray80"/>'...
'<node id="309209816" lat="47.9344289" lon="11.1041431" version="1" timestamp="2008-11-01T19:21:22Z" changeset="651519" uid="39150" user="account_deleted_1011"/>'...
'<node id="309209818" lat="47.9335507" lon="11.103726" version="2" timestamp="2014-07-30T20:48:19Z" changeset="24451882" uid="12096" user="HCX Biker"/>'...
'<node id="309209819" lat="47.9333751" lon="11.1045838" version="2" timestamp="2011-03-24T18:45:17Z" changeset="7658567" uid="313675" user="alphax"/>'...
'<node id="309209822" lat="47.9339823" lon="11.1047609" version="1" timestamp="2008-11-01T19:21:22Z" changeset="651519" uid="39150" user="account_deleted_1011"/>'...
'<node id="309209824" lat="47.9342688" lon="11.1048045" version="1" timestamp="2008-11-01T19:21:22Z" changeset="651519" uid="39150" user="account_deleted_1011"/>'...
'<node id="309245115" lat="48.074924" lon="11.6531406" version="6" timestamp="2014-02-03T21:13:35Z" changeset="20361115" uid="8748" user="ToniE">'...
'<tag k="power" v="substation"/>'...
'<tag k="source" v="survey"/>'...
'<tag k="operator" v="Energieversorgung Ottobunn"/>
The preceding self-closing nodes are included in my second match, when I only need the last node (id="309245115"). I have noticed that on regex101.com or regexr.com the expression works fine as long as I use the /g global modifier. As I understand it, the /g flag retains the index of the last match, allowing iterative searches. Is this possible in MATLAB? Do I have to use the g modifier explicitly in MATLAB, and what is the equivalent expression flag?
Or is the problem not even related to the global modifier? I have no idea what the source of the problem is.
Could someone please help me out here? I am new to MATLAB and regular expressions and have not been able to figure this out. Thanks in advance.
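A possible explanation, offered as a sketch rather than a verified fix: MATLAB's regexp already returns every non-overlapping match unless the 'once' option is given, so a global flag is probably not what is missing. The more likely cause is the `<node.*?\">` part. In the sample above, the `...` continuations join the pieces into one char vector with no newline characters, and MATLAB's `.` matches newlines by default in any case, so a match can begin at a self-closing node and run forward to the next `">`. On regex101.com and regexr.com, `.` stops at line breaks by default, which would hide this. The example below uses a shortened sample (attributes trimmed for brevity, ids taken from the post) and a modified pattern that is an assumption, not something from the original post:

```matlab
% Shortened sample in the same shape as the data in the question
% (attributes trimmed for brevity; the ids are taken from the post).
text = ['<node id="309134964" user="ToniE">' ...
        '<tag k="power" v="substation"/>' ...
        '</node>' ...
        '<node id="309202573" user="Stingray80"/>' ...
        '<node id="309245115" user="ToniE">' ...
        '<tag k="power" v="substation"/>' ...
        '<tag k="source" v="survey"/>' ...
        '</node>'];

% [^>]* cannot cross a '>', so the opening tag stays within one element.
% Requiring '">' at its end rejects self-closing nodes, which end in '"/>'.
pattern = '<node[^>]*">.*?(?=</node>)';

% regexp returns all matches by default; no global flag is needed.
substation_nodes = regexp(text, pattern, 'match');

% 'once' is the opt-in way to stop after the first match.
first_node = regexp(text, pattern, 'match', 'once');

% Show each match on its own line.
for k = 1:numel(substation_nodes)
    disp(substation_nodes{k});
end
```

With this sample, the self-closing node 309202573 should no longer be swallowed into the second match. If the real file is read with its line breaks intact, the same pattern should still apply, since `[^>]` and `.` both match newline characters in MATLAB's default mode.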