Can REGEXP or TEXTSCAN be used to split 2 distinct data sets from a single text file?

Question

Brad, 6 Feb 2014


https://kr.mathworks.com/matlabcentral/answers/115155-can-regexp-or-textscan-be-used-to-split-2-distinct-data-sets-from-a-single-text-file

Commented: Brad, 7 Feb 2014

I’ve got several text files containing data blocks that look like this:

MSN_JET (0:31) Observation #1 Rx'd at:  (58560.000) Msg. Time:  (58561.000)
  Send to SCS: yes   Rcv Date: 2014030   Synch: ffff   Test Mode: nominal
  State Time:            12:00:00.000   (58561.000)
  State Position:       -1100.0000, -5100.0000, 4100.0000
MSN_SENSUM (0:32) Observation #20 Rx'd at:  (58560.000) Msg. Time:  (58561.000)
  Send to SCS: yes   Rcv Date: 2010121   Synch: ffff   Test Mode: nominal
  Con: 10 (Mobil_Tran)  Length: 5678   Remote Num: 1   Number of Observations: 1
Type: 1 Track ID: 12345 Time Tag: 58563.00000000
     Band ID: 1   RAD ID:   11 Scan ID: 0  LRT/HRT: 1  Valid Flag: 0
MSN_JET (0:31) Observation #1 Rx'd at:  (58570.000) Msg. Time:  (58571.000)
  Send to SCS: yes   Rcv Date: 2014030   Synch: ffff   Test Mode: nominal
  State Time:            12:00:00.000   (58571.000)
  State Position:       -1200.0000, -5200.0000, 4200.0000
MSN_SENSUM (0:32) Observation #20 Rx'd at:  (58570.000) Msg. Time:  (58571.000)
  Send to SCS: yes   Rcv Date: 2014030   Synch: ffff   Test Mode: nominal
  Con: 10 (Mobil_Tran)  Length: 5678   Remote Num: 1   Number of Observations: 2
Type: 1 Track ID: 12345 Time Tag: 58573.00000000
   Band ID: 1   RAD ID:   25 Scan ID: 0  LRT/HRT: 1  Valid Flag: 0
Type: 1 Track ID: 12345 Time Tag: 58575.00000000
   Band ID: 1   RAD ID:   6 Scan ID: 0  LRT/HRT: 1  Valid Flag: 0
MSN_SENSUM (0:32) Observation #30 Rx'd at:  (58580.000) Msg. Time:  (58581.000)
  Send to SCS: yes   Rcv Date: 2014030   Synch: ffff   Test Mode: nominal
  Con: 10 (Mobil_Tran)  Length: 5678   Remote Num: 1   Number of Observations: 3
Type: 1 Track ID: 12345 Time Tag: 58583.00000000
   Band ID: 1   RAD ID:   3 Scan ID: 0  LRT/HRT: 1  Valid Flag: 0
Type: 1 Track ID: 12345 Time Tag: 58585.00000000
   Band ID: 1   RAD ID:   14 Scan ID: 0  LRT/HRT: 1  Valid Flag: 0
Type: 1 Track ID: 12345 Time Tag: 58587.00000000
   Band ID: 1   RAD ID:   33 Scan ID: 0  LRT/HRT: 1  Valid Flag: 0
MSN_SENSUM (0:32) Observation #20 Rx'd at:  (58590.000) Msg. Time:  (58591.000)
  Send to SCS: yes   Rcv Date: 2014030   Synch: ffff   Test Mode: nominal
  Con: 10 (Mobil_Tran)  Length: 5678   Remote Num: 1   Number of Observations: 4
Type: 1 Track ID: 12345 Time Tag: 58593.00000000
   Band ID: 1   RAD ID:   7 Scan ID: 0  LRT/HRT: 1  Valid Flag: 0
Type: 1 Track ID: 12345 Time Tag: 58595.00000000
   Band ID: 1   RAD ID:   8 Scan ID: 0  LRT/HRT: 1  Valid Flag: 0
Type: 1 Track ID: 12345 Time Tag: 58597.00000000
   Band ID: 1   RAD ID:   20 Scan ID: 0  LRT/HRT: 1  Valid Flag: 0
Type: 1 Track ID: 12345 Time Tag: 58599.00000000
   Band ID: 1   RAD ID:   29 Scan ID: 0  LRT/HRT: 1  Valid Flag: 0
MSN_JET (0:31) Observation #1 Rx'd at:  (58590.000) Msg. Time:  (58591.000)
  Send to SCS: yes   Rcv Date: 2014030   Synch: ffff   Test Mode: nominal
  State Time:            12:00:00.000   (58591.000)
  State Position:       -1400.0000, -5400.0000, 4400.0000

The first data block is MSN_JET, and contains 4 lines of text.

The 2nd data block is MSN_SENSUM. It contains 3 lines of text, followed by a variable number of lines that depends on the Number of Observations (located in MSN_SENSUM, line 3). In the sample above, each observation takes 2 lines.

The 2 data blocks (MSN_JET and MSN_SENSUM) are repeated numerous times throughout the text file, and the number of MSN_JET blocks is not always equal to the number of MSN_SENSUM blocks.

In the past year, I’ve used the REGEXP function to parse data from text files similar to these. However, I’m not sure if I can take the same approach given the fact that I want to parse entire data blocks.

The goal is to create 2 separate text files for processing. One will contain the MSN_JET data. The other will contain the MSN_SENSUM data.

Any ideas are greatly appreciated. Thanks.

Comments: 2

per isakson, 6 Feb 2014

Does the total file fit in memory?

Brad, 7 Feb 2014

Yes, it does.


Answer 1 (accepted answer)

per isakson, 6 Feb 2014


https://kr.mathworks.com/matlabcentral/answers/115155-can-regexp-or-textscan-be-used-to-split-2-distinct-data-sets-from-a-single-text-file#answer_123541

Edited: per isakson, 7 Feb 2014


Try this:

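    % Read the whole file into one character vector (it must fit in memory)
    str = fileread('your_file.txt');
    % Lazily match from each header up to the next header of the other type,
    % or to the end of the text; '.' also matches newlines in MATLAB's regexp
    ca1 = regexp( str, 'MSN_JET.+?(?=(MSN_SENSUM)|($))', 'match' );
    ca2 = regexp( str, 'MSN_SENSUM.+?(?=(MSN_JET)|($))', 'match' );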

What remains is to write the two files. This approach does not remove any newline characters.
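One way to finish the job, as a rough sketch only: the snippet below builds a short stand-in for the file contents so that it runs on its own, applies the two expressions from this answer, and writes each set of matches to its own file. The output names `msn_jet.txt` and `msn_sensum.txt` and the use of `tempdir` are assumptions made for illustration; they are not part of the original answer.

```matlab
% Stand-in for the real input; in practice: str = fileread('your_file.txt');
str = sprintf('%s\n', ...
    'MSN_JET (0:31) Observation #1', ...
    '  State Position:       -1100.0000, -5100.0000, 4100.0000', ...
    'MSN_SENSUM (0:32) Observation #20', ...
    '  Con: 10 (Mobil_Tran)  Length: 5678', ...
    'MSN_JET (0:31) Observation #1', ...
    '  State Position:       -1200.0000, -5200.0000, 4200.0000');

ca1 = regexp(str, 'MSN_JET.+?(?=(MSN_SENSUM)|($))', 'match');
ca2 = regexp(str, 'MSN_SENSUM.+?(?=(MSN_JET)|($))', 'match');

% Hypothetical output locations (assumed names, not from the original post)
jetFile    = fullfile(tempdir, 'msn_jet.txt');
sensumFile = fullfile(tempdir, 'msn_sensum.txt');

% Each match keeps its own newlines, so plain concatenation is enough
fid = fopen(jetFile, 'w');
fprintf(fid, '%s', [ca1{:}]);
fclose(fid);

fid = fopen(sensumFile, 'w');
fprintf(fid, '%s', [ca2{:}]);
fclose(fid);
```

Note that consecutive blocks of the same type end up inside a single match, because the lookahead only stops at a header of the other type. That does not matter here, since the matches are concatenated before writing.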

Comments: 1

Brad, 7 Feb 2014

Per, thanks for taking the time to look at this.


Answer 2

Kelly Kearney, 6 Feb 2014


https://kr.mathworks.com/matlabcentral/answers/115155-can-regexp-or-textscan-be-used-to-split-2-distinct-data-sets-from-a-single-text-file#answer_123540


Assuming the answer to per's question is yes, then here's an example:

% Read the file as a cell array of lines
fid = fopen('test.txt');
data = textscan(fid, '%s', 'delimiter', '\n');
fclose(fid);
data = data{1};
% Mark each header line with the index of its flag (0 = not a header)
tmp = zeros(size(data));
flag = {'MSN_JET', 'MSN_SENSUM'};
for ii = 1:length(flag)
    tmp(strncmp(data, flag{ii}, length(flag{ii}))) = ii;
end
% Propagate each header's index down to the lines that follow it
idx = tmp > 0;
tmp1 = tmp(idx);
tmp = tmp1(cumsum(idx)); % Trick to fill zeros
% Collect the lines belonging to each flag
datasep = cell(size(flag));
for ii = 1:length(flag)
    datasep{ii} = data(tmp == ii);
end
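The "fill zeros" step is easier to follow on a tiny vector. The sketch below uses made-up labels (1 for an MSN_JET header, 2 for an MSN_SENSUM header, 0 for any other line) and assumes, as in the sample file, that the first line is a header; otherwise `cumsum(idx)` would start at 0 and the indexing would fail.

```matlab
% Made-up line labels: 1 = MSN_JET header, 2 = MSN_SENSUM header, 0 = other
tmp = [1; 0; 0; 0; 2; 0; 0; 1; 0];

idx    = tmp > 0;               % true on header lines only
labels = tmp(idx);              % header labels in order of appearance
filled = labels(cumsum(idx));   % cumsum(idx) counts headers seen so far,
                                % so every line picks up its latest header
```

Here `filled` carries the label of the most recent header on every line, which is what the `data(tmp == ii)` selection relies on.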

Parsing data values out of that text will likely require some regular expressions.
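As a hedged illustration of that last step, the sketch below pulls numbers out of two kinds of lines taken from the sample in the question. The patterns and variable names are assumptions chosen for this sample layout, not something given in the thread, and would need adjusting if the real files differ.

```matlab
% One MSN_JET line from the sample in the question
jetLine = '  State Position:       -1100.0000, -5100.0000, 4100.0000';

% Capture the three comma-separated numbers as tokens, then convert them
tok = regexp(jetLine, ...
    'State Position:\s*([-+\d.]+),\s*([-+\d.]+),\s*([-+\d.]+)', ...
    'tokens', 'once');
pos = str2double(tok);

% A few MSN_SENSUM lines; only some of them carry a Time Tag
sensumLines = { ...
    'Type: 1 Track ID: 12345 Time Tag: 58573.00000000'; ...
    '   Band ID: 1   RAD ID:   25 Scan ID: 0  LRT/HRT: 1  Valid Flag: 0'; ...
    'Type: 1 Track ID: 12345 Time Tag: 58575.00000000'};

% With a cell array input, regexp returns one result per line
tt = regexp(sensumLines, 'Time Tag:\s*([\d.]+)', 'tokens', 'once');
tt = tt(~cellfun(@isempty, tt));                % drop lines without a match
timeTags = cellfun(@(c) str2double(c{1}), tt);  % numeric column vector
```

The same pattern-per-field approach can be applied to `datasep{1}` and `datasep{2}` once the lines have been separated.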
