How to match and take the part of the string between two specified characters

Question

Mekala balaji on 2 Oct 2017


Edited: Cedric on 4 Oct 2017

Hi,

I have a text file and read some items from it.

For the start time, the text file contains:

 Start Time : 2020-06-08 10:12:01.02.653-starting VNA: FindSlot3

and the value I want is:

 startTime: 2020-06-08 10:12:01.02.65

I used the following command:

 startTime = strtrim( regexp( content, '(?<=Start Time\s+:\s*).*?(?=*-starting)', 'match', 'once' )) ;

but I do not get my required output. I use the following code:

% - Define output header.
 header = {'RainFallID', 'IINT', 'Rain Result', 'Start Time', 'Param1.pipe', ...
    '10 Un Para2.pipe', 'Verti 2 mixing.dis', 'Rate.alarm times'} ;
 nHeaderCols = numel( header ) ;
 % - Build listing sub-folders of main folder.
 D_main = dir( 'mainfolder' ) ;
 D_main = D_main(3:end) ;             % Eliminate "." and ".."
 % - Iterate through sub-folders and process.
 for dId = 1 : numel( D_main )
    % - Build listing files of sub-folder.
    D_sub = dir( fullfile( 'mainfolder', D_main(dId).name, '*.txt' )) ;
    nFiles = numel( D_sub ) ;
    % - Prealloc output cell array.
    data = cell( nFiles, nHeaderCols ) ;
    % - Iterate through files and process.
    for fId = 1 : nFiles
        % - Read input text file.
        inLocator = fullfile( 'mainfolder', D_main(dId).name, D_sub(fId).name ) ;
        content = fileread( inLocator ) ;
        % - Extract relevant data.
        rainfallId = str2double( regexp( content, '(?<=RainFallID\s+:\s*)\d+', 'match', 'once' )) ;
        iint       = regexp( content, '(?<=IINT\s+:\s*)\S+', 'match', 'once' ) ;
        rainResult = regexp( content, '(?<=Rain Result\s+:\s*)\S+', 'match', 'once' ) ;
        startTime  = strtrim( regexp( content, '(?<=Start Time\s+:\s*).*?(?=*-starting)', 'match', 'once' )) ;
        endTime    = strtrim( regexp( content, '(?<=End Time\s+:\s*).*?(?= -)', 'match', 'once' )) ;
        chamber    = regexp( content, '(?<=chamber\s+:\s*)\S+', 'match', 'once' ) ;
    end
    % - Output to XLSX.
    outLocator = fullfile( 'outputfolder', sprintf( '%s.xlsx', D_main(dId).name )) ;
    fprintf( 'Output XLSX: %s ..\n', outLocator ) ;
    xlswrite( outLocator, [header; data] ) ;
 end

my desired output is:

Comments: 2

Jan on 2 Oct 2017

Edited: Jan on 2 Oct 2017

"Not getting my required output" is a weak description of what is going on. Please explain which problem you have. It is inefficient to make the readers guess this detail.

"match and take the part of the string between two specified characters"

Which string, which specified characters, and where does this occur in the code?

It is confusing when you post a large block of code that runs as wanted. Do not expect the readers in the forum to know which line of code is concerned.

Mekala balaji on 2 Oct 2017

Edited: Stephen23 on 2 Oct 2017


Sir,

I want to search for "Start Time" and get its data: 2020-06-08 10:12:01.02.653. Similarly, for "Duration" I want 00:01:00, and for "chamber" I want 1 (Slot12) (line 8).


Answer 1

Cedric on 2 Oct 2017


Edited: Cedric on 2 Oct 2017


Replace the line that extracts the start time with:

 startTime  = strtrim( regexp( content, '(?<=Start Time\s+:\s*).*?(?= - )', 'match', 'once')) ;

and you can extract the duration with:

 duration = strtrim( regexp( content, '(?<=Duration\s+:\s*)\S+', 'match', 'once' )) ;

where the pattern (?<=Duration\s+:\s*)\S+ extracts

one or more non-white-space characters: \S+
preceded by (this is the look-behind (?<=...)):
the literal Duration, followed by one or more white-spaces \s+, followed by the literal :, followed by zero or more white-spaces \s*
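
As a quick check of the pattern described above, here is a minimal sketch that runs it on a made-up two-line sample. The text in content is an assumption for illustration only, not the actual file from the question.

```matlab
% Hypothetical sample text (assumption; the real file layout may differ).
content = sprintf( 'Start Time   : 2020-06-08 10:12:01.02.653 - VNA: FindSlot3\nDuration     : 00:01:00\n' ) ;
% The look-behind anchors the match right after "Duration <spaces> : <spaces>";
% \S+ then takes everything up to the next white-space.
duration = strtrim( regexp( content, '(?<=Duration\s+:\s*)\S+', 'match', 'once' )) ;
disp( duration )
```

The look-behind itself is not part of the returned match, which is why only the value after the colon comes back.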

Finally, for chamber, you almost did it, it's good! The problem is that there can be white-spaces in the middle of what you are trying to extract, and \S+ stops at the first white-space. There are several options for getting everything up to the end of the line. One is to anchor the end of the line; the other, used here, is to pick all characters until a carriage return (\r) or a new line (\n) is found [these are not displayed in your text editor unless you ask for it, but we can use them]:

 chamber = strtrim( regexp( content, '(?<=chamber\s+:\s*)[^\r\n]+', 'match', 'once' )) ;

where [^..] defines a set of characters not to match, [^..]+ matches one or more of anything that is not in this set, and \r and \n stand for the carriage return and the new line. So the whole thing reads: match one or more of anything that is not a carriage return or a new line.
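
To see the difference between \S+ and [^\r\n]+ side by side, here is a small sketch on a made-up sample. The content string and its CRLF line endings are assumptions for illustration.

```matlab
% Hypothetical sample with CRLF line endings (assumption for illustration).
content = sprintf( 'chamber      : 1 (Slot12)\r\nDuration     : 00:01:00\r\n' ) ;
% \S+ stops at the first white-space, so only the leading token is returned.
firstToken = regexp( content, '(?<=chamber\s+:\s*)\S+', 'match', 'once' ) ;
% [^\r\n]+ keeps going until the end of the line, white-spaces included.
chamber = strtrim( regexp( content, '(?<=chamber\s+:\s*)[^\r\n]+', 'match', 'once' )) ;
fprintf( '[%s] vs [%s]\n', firstToken, chamber ) ;
```

The strtrim call matters in the second case because the match starts right after the colon and therefore includes the leading space.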

Comments: 3

Cedric on 2 Oct 2017

Edited: Cedric on 2 Oct 2017


Yes you can, using the look-ahead that you implemented (where I removed the *):

'(?<=Start Time\s+:\s*).*?(?=-starting)'

but then you make the pattern specific to files where you have '-starting' after the time/date, which I have not seen in your examples. If I look at what you provided, I see the two following possibilities:

 Start Time                      : 2013-06-08 10:12:01.02.653 - 
 Start Time                      : 2020-06-08 10:12:01.02.653 - VNA: FindSlot3

so I looked for what the two cases have in common that should end the match, and this is the presence of a white-space followed by a dash. And you can see that I made a mistake, by the way: the pattern must be

'(?<=Start Time\s+:\s*).*?(?= -)'

and not

'(?<=Start Time\s+:\s*).*?(?= - )'

especially if there can be situations where what follows the date/time is ' -starting', with no white-space after the dash.

So you were correct in using a look-around approach (a look-behind and a look-ahead framing what you want to match), but given all the possible cases of content in your files, you were too specific.
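
The effect of the trailing space in the look-ahead can be checked with a short sketch. The first two sample lines are the ones quoted in this comment; the third (' -starting', with no white-space after the dash) is a hypothetical case added for illustration, and the variable names are assumptions.

```matlab
samples = { 'Start Time  : 2013-06-08 10:12:01.02.653 - ', ...
            'Start Time  : 2020-06-08 10:12:01.02.653 - VNA: FindSlot3', ...
            'Start Time  : 2020-06-08 10:12:01.02.653 -starting VNA: FindSlot3' } ;
strict  = '(?<=Start Time\s+:\s*).*?(?= - )' ;   % needs a space on both sides of the dash
relaxed = '(?<=Start Time\s+:\s*).*?(?= -)' ;    % needs only the space before the dash
strictOut  = cell( size( samples )) ;
relaxedOut = cell( size( samples )) ;
for k = 1 : numel( samples )
    strictOut{k}  = strtrim( regexp( samples{k}, strict,  'match', 'once' )) ;
    relaxedOut{k} = strtrim( regexp( samples{k}, relaxed, 'match', 'once' )) ;
    fprintf( 'strict: [%s]   relaxed: [%s]\n', strictOut{k}, relaxedOut{k} ) ;
end
```

On the third sample the strict pattern finds nothing and returns an empty result, while the relaxed pattern still returns the date/time.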

Cedric on 4 Oct 2017

Edited: Cedric on 4 Oct 2017


Well, let me make a correction actually, because I realize, looking a second time at your last example, that there is no white-space before the dash in

Start Time : 2020-06-08 10:12:01.02.653-starting VNA: FindSlot3

so one way to catch '-starting' or ' -' is to look for a dash followed by a character that is not a number. If these are all the cases present in all versions of your file, the following should work (to test):

'(?<=Start Time\s+:\s*).*?(?=-\D)'

where \D means "anything but a numeric digit" (it is the complement of \d).
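
Since this pattern is marked "to test", here is a minimal sketch that tries it on the three separator variants that appear in this thread. The exact spacing after "Start Time" and the variable names are assumptions for illustration.

```matlab
% The three separator variants seen in this thread (spacing is an assumption).
samples = { 'Start Time  : 2013-06-08 10:12:01.02.653 - ', ...
            'Start Time  : 2020-06-08 10:12:01.02.653 - VNA: FindSlot3', ...
            'Start Time  : 2020-06-08 10:12:01.02.653-starting VNA: FindSlot3' } ;
expr = '(?<=Start Time\s+:\s*).*?(?=-\D)' ;
% regexp accepts a cell array of char vectors and returns one match per cell;
% the dashes inside the date are skipped because each is followed by a digit.
startTimes = strtrim( regexp( samples, expr, 'match', 'once' )) ;
disp( startTimes(:) )
```

One case this does not cover: a dash that is the very last character of the text has nothing after it for \D to match, so the pattern would find nothing there.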

댓글을 달려면 로그인하십시오.

Answer 2

Kian Azami on 2 Oct 2017


You can use the following code to extract the lines relevant to 'Start Time' and the 'Duration' and then acquire the required data. The command 'textscan' helps to acquire data from text files.

clc
clear all
close all
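% The line offsets below (9 and 2) are specific to the layout of this sample file.
fid = fopen('RainFallReport5.txt');
% Skip 9 lines, then read one row of 8 text fields (%q also accepts double-quoted text).
Start = textscan(fid,'%q%q%q%q%q%q%q%q',1,'HeaderLines',9);
% Continue from the current position: skip 2 more lines and read one row again.
Duration = textscan(fid,'%q%q%q%q%q%q%q%q',1,'HeaderLines',2);
fclose(fid);
% Fields 4 and 5 hold the date and the time; field 3 of the second row holds the duration.
Start_Time = ['Start Time:' strcat(Start{1,[4 5]})]
Duration = ['Duration:' strcat(Duration{1,3})]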
