How do I exclude certain lines from data files?

Question

Stanley, 23 October 2018

https://kr.mathworks.com/matlabcentral/answers/425642-how-do-i-exclude-certain-lines-from-data-files

Commented: Stanley, 17 January 2019

I am trying to convert hexadecimal values stored in .csv data files into numbers. A recent update changed the format of these files, and my code no longer works because it relies on detecting a keyword, in this case 'CUSTOM_MODE_STEP', to pick out the relevant lines. For context, the original data format was:

CUSTOM_MODE_STEP_0 = { 0x40   0x43   0x7D   0xAF   0x96   0xA2   0x00   0x00 }
CUSTOM_MODE_STEP_1 = { 0x00   0xF4   0x7D   0xAF   0x96   0xA2   0x01   0x00 }
CUSTOM_MODE_STEP_2 = { 0x7C   0x00   0x7D   0xAF   0x96   0xA2   0x02   0x80 }

In the new format, the hexadecimal lines are interleaved with description lines that contain text and cannot be parsed as hexadecimal:

CUSTOM_MODE_STEP_0 = { 0x40   0x43   0x7D   0xA2   0xA2   0xA2   0x00   0x00 }
CUSTOM_MODE_STEP_0_DESCRIPTION = [text]
CUSTOM_MODE_STEP_1 = { 0x00   0xF4   0x7D   0xA2   0xA2   0xA2   0x01   0x00 }
CUSTOM_MODE_STEP_1_DESCRIPTION = [text]
CUSTOM_MODE_STEP_2 = { 0x7C   0x00   0x7D   0xA2   0xA2   0xA2   0x02   0x80 }
CUSTOM_MODE_STEP_2_DESCRIPTION = [text]

I am using a pre-written script, and I am trying to edit it so that it can accommodate this change. The script is below:

for f=fields'
    if contains(f,'CUSTOM_MODE_STEP')
        ht =  DataN.Periph.(char(f));
        list = strsplit(ht,{',', '{', '}'});
        DataN.ht_1 = [DataN.ht_1; hex2dec(list{1,4}(4:end))*2];
        DataN.ht_2 = [DataN.ht_2; hex2dec(list{1,5}(4:end))*2];
        DataN.ht_3 = [DataN.ht_3; hex2dec(list{1,6}(4:end))*2];
        DataN.ht_4 = [DataN.ht_4; hex2dec(list{1,7}(4:end))*2];
        DataN.t_1 = [DataN.t_1; hex2dec(list{1,2}(4:end))];
        DataN.t_2 = [DataN.t_2; hex2dec(list{1,3}(4:end))];
    end
end

The variable 'fields' is a 27x1 cell array that contains the CUSTOM_MODE_STEP field names (both the hexadecimal entries and the text entries).

I was thinking of inserting an elseif statement like:

elseif contains(f,'DESCRIPTION')

but I'm unsure as to what command to use exactly to exclude those lines. I've also thought about referencing the correct cells in that array using fields{} but that hasn't worked:

f=fields{17),fields{19},fields{21};

Those numbers are the indices of the hexadecimal lines.

Please let me know if any further information is needed.

Stanley, 17 January 2019

In the end I found a very simple solution, which was to alter the expression here:

for f=fields'
    if contains(f,'CUSTOM_MODE_STEP_(\d+)\s+')
        ht =  DataN.Periph.(char(f));
        list = strsplit(ht,{',', '{', '}'});
        DataN.ht_1 = [DataN.ht_1; hex2dec(list{1,4}(4:end))*2];
        DataN.ht_2 = [DataN.ht_2; hex2dec(list{1,5}(4:end))*2];
        DataN.ht_3 = [DataN.ht_3; hex2dec(list{1,6}(4:end))*2];
        DataN.ht_4 = [DataN.ht_4; hex2dec(list{1,7}(4:end))*2];
        DataN.t_1 = [DataN.t_1; hex2dec(list{1,2}(4:end))];
        DataN.t_2 = [DataN.t_2; hex2dec(list{1,3}(4:end))];
    end
end

All I did was append some of the metacharacters from the longer regular expression in Guillaume's answer below, and I can confirm that it works for multiple files (more than 100).

Answer 1

Guillaume, 23 October 2018

https://kr.mathworks.com/matlabcentral/answers/425642-how-do-i-exclude-certain-lines-from-data-files#answer_342954

It sounds like your original code is very fragile. Looking at the portion you show, it's also not very efficient since there's a lot of array resizing. A single regexp, a call to sscanf and a bit of cell array manipulation is probably all that is needed to get the data you want.

It would be useful to have an example text file to validate against. With the attached file, based on your example data, this is the code I'd use:

filecontent = fileread('test.csv');  %read whole file at once
modesteps = regexp(filecontent, 'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}', 'tokens'); %get step and content of '{}'
modesteps = vertcat(modesteps{:});
stepnumber = str2double(modesteps(:, 1));
stepvalues = cellfun(@(hex) sscanf(hex, '0x%x ', [1 Inf]), modesteps(:, 2), 'UniformOutput', false);
stepvalues = vertcat(stepvalues{:});

If you then want to convert that into a table with the same variable names as your original structure:

steps = array2table([stepnumber, stepvalues(:, 1:6)], 'VariableNames', {'Step', 't_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'})
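
One detail worth checking when adopting the table: the loop in the question multiplies the ht values by 2, whereas the table above holds the raw decoded bytes. A minimal sketch of applying the same scaling, assuming the three sample lines from the question have already been decoded (the numbers below are those decoded values, typed in by hand rather than read from a file):

```matlab
% Values decoded from the three sample lines in the question (new format)
stepnumber = [0; 1; 2];
stepvalues = [ 64  67 125 162 162 162 0   0;
                0 244 125 162 162 162 1   0;
              124   0 125 162 162 162 2 128];

steps = array2table([stepnumber, stepvalues(:, 1:6)], ...
    'VariableNames', {'Step', 't_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'});

% The loop in the question doubles ht_1..ht_4, so do the same here
htnames = {'ht_1', 'ht_2', 'ht_3', 'ht_4'};
steps{:, htnames} = 2 * steps{:, htnames};
```

Whether the doubling belongs in the parser or in later processing is a design choice; keeping it as a separate step makes it easier to compare the raw bytes against the file.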

Stanley, 23 October 2018

example750.csv

Hello Guillaume,

Thanks for responding.

Attached is an example.csv file which I've truncated.

Below is a more complete section of the code.

elseif contains(sample,'750')
    fields = fieldnames(DataN.Periph);
    steps = contains(fields,'CUSTOM_MODE_STEP');
    DataN.UID = DataN.Periph.UID(1:end-2);
    DataN.UID_880 = DataN.Periph.UID_DGPxxx;
    DataN.position = DataN.Periph.UID(end-1:end);
    DataN.index = count;
    DataN.ht_1 =[]; DataN.ht_2 =[]; DataN.ht_3 =[]; DataN.ht_4 =[];
    DataN.t_1 =[]; DataN.t_2 =[];
    for f=fields'
        if contains(f,'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}')
            ht =  DataN.Periph.(char(f));
            list = strsplit(ht,{',', '{', '}'});
            DataN.ht_1 = [DataN.ht_1; hex2dec(list{1,4}(4:end))*2];
            DataN.ht_2 = [DataN.ht_2; hex2dec(list{1,5}(4:end))*2];
            DataN.ht_3 = [DataN.ht_3; hex2dec(list{1,6}(4:end))*2];
            DataN.ht_4 = [DataN.ht_4; hex2dec(list{1,7}(4:end))*2];
            DataN.t_1 = [DataN.t_1; hex2dec(list{1,2}(4:end))];
            DataN.t_2 = [DataN.t_2; hex2dec(list{1,3}(4:end))];
        end
    end
    DataN.R0=DataN.R;
    DataS(count)=DataN;
    count = count+1;
end

As you can see, I had a stab at replacing CUSTOM_MODE_STEP with the regular expression, which I guess was what I was really after. I assumed those operators would skip the 'DESCRIPTION' variables, but using that as the input seems to skip all of the intermediate hex2dec code and go straight to the end.

Also, I'm hesitant to implement your code as it is likely to have a knock-on effect on the rest of the (very large) script.
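
A likely reason every field is skipped is that contains treats a character-vector pattern as literal text, not as a regular expression, so a pattern full of metacharacters never matches a field name. As a rough sketch of two ways to keep the loop but select only the hexadecimal fields (the field names below are hypothetical stand-ins for fieldnames(DataN.Periph)):

```matlab
% Hypothetical field names standing in for fieldnames(DataN.Periph)
fields = {'UID'; ...
          'CUSTOM_MODE_STEP_0'; 'CUSTOM_MODE_STEP_0_DESCRIPTION'; ...
          'CUSTOM_MODE_STEP_1'; 'CUSTOM_MODE_STEP_1_DESCRIPTION'};

% Option 1: literal matching, keep the step names and drop the descriptions
keepLiteral = contains(fields, 'CUSTOM_MODE_STEP') & ~contains(fields, 'DESCRIPTION');

% Option 2: anchored regular expression that matches only "..._<digits>"
keepRegexp = ~cellfun(@isempty, regexp(fields, '^CUSTOM_MODE_STEP_\d+$', 'once'));

% Either mask can be used to restrict the loop, e.g. for f = hexFields'
hexFields = fields(keepRegexp);
```

Both masks select the same entries here. The regexp version is stricter, since it would also reject any other future suffix, not just DESCRIPTION.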

Guillaume, 8 January 2019

The only change that needs to be made to my original code, to account for the additional comma separating the hex values in your latest example, is to replace the '0x%x ' in the sscanf call with '0x%x, ', so:

filecontent = fileread('test.csv');  %read whole file at once
modesteps = regexp(filecontent, 'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}', 'tokens'); %get step and content of '{}'
modesteps = vertcat(modesteps{:});
stepnumber = str2double(modesteps(:, 1));
stepvalues = cellfun(@(hex) sscanf(hex, '0x%x, ', [1 Inf]), modesteps(:, 2), 'UniformOutput', false);
stepvalues = vertcat(stepvalues{:});

"Also, I'm hesitant to implement your code as it is likely to have a knock-on effect on the rest of the (very large) script"

While I can understand the resistance, the way you have it coded at present, the input format, the parsing and the creation of the output data are all deeply interlinked. As you've found out, if the file format changes, you need to review everything. I would think that changing the design now would result in a lot less pain later. If it were me, I would write a parser that is even more generic than the above (storing the parsed data as key/value pairs) and afterward just look up the required keys.

Anyway, it is trivial to convert the output of the above into your original structure:

fnames = {'t_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'};
namevalues = [fnames; num2cell(stepvalues(:, 1:6), 1)];
dataN = struct(namevalues{:})
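
The more generic key/value parser mentioned above could be sketched roughly as follows. This is only an illustration under assumptions: the sample text, the UID value and the variable names (pairs, kv, raw, bytes) are made up for the example, and containers.Map is just one possible store for the pairs.

```matlab
% Hypothetical sample standing in for fileread('test.csv')
filecontent = sprintf(['UID = ABC123\n' ...
    'CUSTOM_MODE_STEP_0 = { 0x40, 0x43, 0x7D, 0xA2, 0xA2, 0xA2, 0x00, 0x00 }\n' ...
    'CUSTOM_MODE_STEP_0_DESCRIPTION = [text]\n']);

% Capture every "KEY = VALUE" line as a {key, value} token pair
pairs = regexp(filecontent, '^(\w+)[ \t]*=[ \t]*([^\r\n]*?)[ \t]*\r?$', ...
    'tokens', 'lineanchors');
pairs = vertcat(pairs{:});                       % N-by-2 cell array
kv = containers.Map(pairs(:, 1), pairs(:, 2));   % key -> raw text value

% Look up one required key and decode its hex bytes into a row vector
raw = kv('CUSTOM_MODE_STEP_0');
bytes = hex2dec(regexp(raw, '(?<=0x)[0-9A-Fa-f]+', 'match'))';
```

With this layout the DESCRIPTION entries are simply other keys that are never looked up, so a new kind of line in the file does not break the extraction of the hex values.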

Stanley, 17 January 2019 (edited 17 January 2019)

Appreciate the help Guillaume. It turns out that the solution was very trivial (which I should have figured out much sooner but it's a learning process) - but while trying to adapt your code I did learn about concatenation and regular expressions, so it was worthwhile. I've started another script in any case with your code so I can work on it every now and then.

Answer 2

per isakson, 4 January 2019

https://kr.mathworks.com/matlabcentral/answers/425642-how-do-i-exclude-certain-lines-from-data-files#answer_354902

I downloaded example750.csv and tried a different approach to extracting and converting the hex values:

>> cssm( 'example750.csv' )
ans =
    64    67   125   162   162   162     0     0
     0   244   125   162   162   162     1     0
   124     0   125   162   162   162     2   128
   

where

function    out = cssm( ffs )
    %
    %%  Read the file to a cell array of character rows
    fid = fopen( ffs, 'r' );
    cac = textscan( fid, '%[^\r\n]' );
    cac = cac{1};
    [~] = fclose( fid );
    %%  Extract the rows with hex values
    pos = regexp( cac, 'CUSTOM_MODE_STEP_\d+\s+=\s+\{' );
    cac( cellfun( @isempty, pos ) ) = [];
    %%  Extract the hex values, which are two characters following "0x"
    hex = regexp( cac, '(?<=0x)[A-F\d]{2}', 'match' );
    %%  Convert to dec values. (hex2dec returns a column, thus reshape.)
    dec = cellfun( @(c) reshape(hex2dec(c),1,[]), hex, 'uni',false );
    out = cell2mat( dec );
end

Stanley, 17 January 2019

Thanks Per, I have yet to test out this code myself, but will definitely try to. I am going to work with Guillaume's code first.
