Regular expression to handle a changing format

jimmy zubiate on 6 Mar 2022
Commented: jimmy zubiate on 9 Mar 2022
%dummy data
% t,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501
% t,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501
S=fileread(filename);
myexpression = ['(?<tvar>w*,'...
'(?<tmCodeRdr>\w*),'...
'(?<tmCodLvl>\w*\.*\w*),'...
'(?<HNL>\w*),'...
'(?<codeTm>\w*\s*\d*\:*\d*\:*\d*\.*\d*,'... % <== This line handles the first line of dummy data
'(?<caprTm>\w*\s*\d*\:*\d*\:*\d*\.*\d*\s*\d*,'... % <== This line handles the first line of dummy data
'(?<logAt>\w*\.*\w*']
parts = regexp(filtered,myexpression,'names')
The third-to-last and second-to-last fields (codeTm, caprTm) change format within the data. How can I modify the expression, or add logic, so that it accepts 2 to 3 space-separated values in codeTm and 3 to 4 space-separated values in caprTm?
2 space-separated values: (000 00:00:00.00)
3 space-separated values: (000 00 00:00:00.00) or (343 19:54:20.684 8)
4 space-separated values: (21 343 19:54:20.684 8)
Thank you for the help. My apologies for making my expression so complicated. I'm still learning the ins and outs of expression formats for using regexp to read data.
Stephen23 on 7 Mar 2022
It is not clear why you are using regular expressions to import this data: READTABLE et al. have options for handling missing field data. Have you considered using the built-in data import functions?
jimmy zubiate on 9 Mar 2022
I'm still in the process of learning MATLAB. I pursued the regexp function to create a structure array that I could navigate to perform the analysis I need.
What I think I should do is preprocess the file to remove unwanted whitespace, headers, and other non-useful data, and then import it as a comma-delimited file. Then I can count the space-separated items inside each variable and move on to the next step.
The other option is to use the fgetl function and implement logic to read only the useful data, line by line. I'm attaching dummy test data for you to look at. Thanks.
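The fgetl route mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: the temporary file, its header line, and the rule that a "useful" line is one that splits into exactly seven comma-separated fields are illustrative choices, not details taken from the attached test data. Note that strsplit collapses consecutive delimiters by default, so a line with an empty field would split into fewer pieces and be skipped.

```matlab
% Hypothetical setup: write a header line plus the question's dummy data to a temporary file
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', 'Some header line', ...
    't,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501', ...
    't,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501');
fclose(fid);

% Read one line at a time; keep only lines that split into exactly 7 comma-separated fields
rows = cell(0, 7);
fid = fopen(fname, 'r');
tline = fgetl(fid);              % fgetl returns -1 (not a char) at end of file
while ischar(tline)
    fields = strsplit(strtrim(tline), ',');
    if numel(fields) == 7        % assumption: data lines have exactly 7 fields
        rows(end+1, :) = fields; %#ok<AGROW>
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

disp(rows(:, 5:6))   % codeTm and caprTm for each data row
```

Each variable-length field stays intact as one char vector, so counting its space-separated values can be a separate, simpler step.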


Answers (1)

Stephen23 on 7 Mar 2022
Edited: Stephen23 on 7 Mar 2022
You can easily make a group optional, or make it occur a specific number of times, by using a suitable quantifier, for example:
(..)? % zero or one time
(..)* % zero or more times
(..){2,4} % two to four times
etc.
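Applied to this question, the {m,n} quantifier can state the 2-to-3 and 3-to-4 value counts directly. The following is a minimal sketch, assuming that a single value is any run of characters other than spaces and commas, and that values within a field are separated by one or more spaces; the simplified patterns for the other fields and the variable names sampleLines and val are illustrative.

```matlab
% Dummy lines copied from the question
sampleLines = {'t,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501', ...
               't,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501'};

% Assumption: one "value" is a run of characters that are neither spaces nor commas
val = '[^ ,]+';

rgx = ['^(?<tvar>\w+),' ...
       '(?<tmCodeRdr>\w+),' ...
       '(?<tmCodLvl>\d*\.?\d*),' ...
       '(?<HNL>\w+),' ...
       '(?<codeTm>' val '(?: +' val '){1,2}),' ... % first value plus 1 to 2 more: 2 to 3 in total
       '(?<caprTm>' val '(?: +' val '){2,3}),' ... % first value plus 2 to 3 more: 3 to 4 in total
       '(?<logAt>\d*\.?\d*)$'];

for k = 1:numel(sampleLines)
    p = regexp(sampleLines{k}, rgx, 'names');
    fprintf('codeTm = "%s"   caprTm = "%s"\n', p.codeTm, p.caprTm);
end
```

Because the counts are enforced by {1,2} and {2,3}, a line whose caprTm held five values would not match at all and regexp would return an empty result, which makes malformed lines easy to spot. The character-class approach is more permissive and accepts any number of values.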
However, rather than trying to match specific groups of characters, I would take the simpler approach of matching sets of characters (character classes such as [ :\w\.]*). I also had to fix several other bugs in your regular expression to get it working, mostly missing backslashes and parentheses.
str = fileread('test.txt')
str =

    't,00000000CIB0000004001,0.47,L,000 00:00:00.00,343 19:54:20.684 8,22.501
     t,00000000CIB0000004001,0.47,L,000 00 00:00:00.00,21 343 19:54:20.684 8,22.501'
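rgx = ['^\s*(?<tvar>\w*),'...       % optional leading whitespace, then the first field
'(?<tmCodeRdr>\w*),'...
'(?<tmCodLvl>\d*\.?\d*),'...        % decimal number such as 0.47
'(?<HNL>\w*),'...
'(?<codeTm>[ :\w\.]*),'...          % any mix of spaces, colons, word characters and dots
'(?<caprTm>[ :\w\.]*),'...          % same set, so both the 3-value and 4-value forms match
'(?<logAt>\d*\.?\d*)'];
parts = regexp(str,rgx,'names','lineanchors') % 'lineanchors': ^ matches at the start of each line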
parts = 1×2 struct array with fields:

    tvar
    tmCodeRdr
    tmCodLvl
    HNL
    codeTm
    caprTm
    logAt
parts.codeTm
ans = '000 00:00:00.00'
ans = '000 00 00:00:00.00'
But personally I would not try to reinvent the wheel for such a data file; READTABLE is much simpler:
tbl = readtable('test.txt','delimiter',',');
tbl.Properties.VariableNames = {'tvar','tmCodeRdr','tmCodLv','HNL','codeTm','caprTm','logAt'}
tbl = 2×7 table

    tvar             tmCodeRdr            tmCodLv     HNL             codeTm                     caprTm              logAt
    _____    _________________________    _______    _____    ______________________    _________________________    ______

    {'t'}    {'00000000CIB0000004001'}     0.47      {'L'}    {'000 00:00:00.00'   }    {'343 19:54:20.684 8'   }    22.501
    {'t'}    {'00000000CIB0000004001'}     0.47      {'L'}    {'000 00 00:00:00.00'}    {'21 343 19:54:20.684 8'}    22.501
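For the follow-up step the asker described (counting the space-separated items inside each variable), the cell-array columns that READTABLE returns can be processed with cellfun. This is a minimal sketch, assuming the columns are cell arrays of char vectors as in the output above. The values are typed in so the snippet runs without test.txt (with the real table you would use tbl.codeTm and tbl.caprTm), and the helper names countVals and getClock and the hh:mm:ss pattern are hypothetical.

```matlab
% codeTm/caprTm values copied from the READTABLE output (cell arrays of char vectors)
codeTm = {'000 00:00:00.00'; '000 00 00:00:00.00'};
caprTm = {'343 19:54:20.684 8'; '21 343 19:54:20.684 8'};

% Number of space-separated values in each cell (strsplit collapses repeated spaces by default)
countVals = @(c) cellfun(@(s) numel(strsplit(strtrim(s), ' ')), c);
nCode = countVals(codeTm);   % column vector, one count per row
nCapr = countVals(caprTm);

% Pick out the hh:mm:ss.fff value, wherever it sits within the field
getClock = @(s) regexp(s, '\d+:\d+:\d+\.?\d*', 'match', 'once');
caprClock = cellfun(getClock, caprTm, 'UniformOutput', false);

disp([nCode nCapr])
disp(caprClock)
```

Locating the clock value by its pattern rather than by its position avoids having to know how many leading values a given row contains.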
jimmy zubiate on 9 Mar 2022
That should work. Let me try implementing it on my side and see what I get. Thanks, Stephen!

