How do I exclude certain lines from data files?

Stanley on 23 Oct 2018
Commented: Stanley on 17 Jan 2019
I am trying to extract numerical values from hexadecimal values that are generated and stored in .csv data files. However, a recent update altered the format of these files, and the code no longer works because it relies on detecting a keyword, in this case 'CUSTOM_MODE_STEP', to pick out the relevant lines. For context, the original data format is:
CUSTOM_MODE_STEP_0 = { 0x40 0x43 0x7D 0xAF 0x96 0xA2 0x00 0x00 }
CUSTOM_MODE_STEP_1 = { 0x00 0xF4 0x7D 0xAF 0x96 0xA2 0x01 0x00 }
CUSTOM_MODE_STEP_2 = { 0x7C 0x00 0x7D 0xAF 0x96 0xA2 0x02 0x80 }
After the change, the lines containing hexadecimal values are interleaved with lines of descriptive text, which cannot be parsed as hexadecimal:
CUSTOM_MODE_STEP_0 = { 0x40 0x43 0x7D 0xA2 0xA2 0xA2 0x00 0x00 }
CUSTOM_MODE_STEP_0_DESCRIPTION = [text]
CUSTOM_MODE_STEP_1 = { 0x00 0xF4 0x7D 0xA2 0xA2 0xA2 0x01 0x00 }
CUSTOM_MODE_STEP_1_DESCRIPTION = [text]
CUSTOM_MODE_STEP_2 = { 0x7C 0x00 0x7D 0xA2 0xA2 0xA2 0x02 0x80 }
CUSTOM_MODE_STEP_2_DESCRIPTION = [text]
I am using a pre-written script, and I am trying to edit it so that it can accommodate this change. The script is below:
for f=fields'
    if contains(f,'CUSTOM_MODE_STEP')
        ht = DataN.Periph.(char(f));
        list = strsplit(ht,{',', '{', '}'});
        DataN.ht_1 = [DataN.ht_1; hex2dec(list{1,4}(4:end))*2];
        DataN.ht_2 = [DataN.ht_2; hex2dec(list{1,5}(4:end))*2];
        DataN.ht_3 = [DataN.ht_3; hex2dec(list{1,6}(4:end))*2];
        DataN.ht_4 = [DataN.ht_4; hex2dec(list{1,7}(4:end))*2];
        DataN.t_1 = [DataN.t_1; hex2dec(list{1,2}(4:end))];
        DataN.t_2 = [DataN.t_2; hex2dec(list{1,3}(4:end))];
    end
end
The variable 'fields' is a 27x1 array that contains the CUSTOM_MODE_STEP entries (both the hexadecimal and the text ones).
I was thinking of inserting an elseif statement like:
elseif contains(f,'DESCRIPTION')
but I'm unsure which command to use to exclude those lines. I've also tried referencing the correct cells in that array using fields{}, but that hasn't worked:
f=fields{17},fields{19},fields{21};
Those numbers are the indices of the hexadecimal lines.
Please let me know if any further information is needed.
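One way to skip the description entries is sketched below, assuming `fields` is a cell array of field-name character vectors. The four names used here are made up to mirror the post, and the positions `[1 3]` stand in for the 17, 19, 21 of the real array. Either test the end of each name, or index the cell array with parentheses so the result is still a cell array that `for` can iterate over.

```matlab
% Hypothetical stand-in for the 27x1 cell array from the question.
fields = {'CUSTOM_MODE_STEP_0'; 'CUSTOM_MODE_STEP_0_DESCRIPTION'; ...
          'CUSTOM_MODE_STEP_1'; 'CUSTOM_MODE_STEP_1_DESCRIPTION'};

% Option 1: keep only the fields whose name does not end in _DESCRIPTION.
kept = {};
for f = fields'
    name = char(f);
    if isempty(regexp(name, '_DESCRIPTION$', 'once'))
        kept{end+1} = name; %#ok<AGROW> collect the hexadecimal entries
    end
end

% Option 2: select known positions with parentheses, not braces.
% fields([1 3]) stays a cell array, whereas fields{1},fields{3} is a
% comma-separated list and cannot be assigned to one loop variable.
picked = fields([1 3])';
```

Inside the original loop, the body of Option 1's `if` would hold the existing `hex2dec` lines; Option 2 would replace `fields'` in the `for` statement.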
Comments: 1
Stanley on 17 Jan 2019
In the end I found a very simple solution, which was to alter the expression here:
for f=fields'
    if contains(f,'CUSTOM_MODE_STEP_(\d+)\s+')
        ht = DataN.Periph.(char(f));
        list = strsplit(ht,{',', '{', '}'});
        DataN.ht_1 = [DataN.ht_1; hex2dec(list{1,4}(4:end))*2];
        DataN.ht_2 = [DataN.ht_2; hex2dec(list{1,5}(4:end))*2];
        DataN.ht_3 = [DataN.ht_3; hex2dec(list{1,6}(4:end))*2];
        DataN.ht_4 = [DataN.ht_4; hex2dec(list{1,7}(4:end))*2];
        DataN.t_1 = [DataN.t_1; hex2dec(list{1,2}(4:end))];
        DataN.t_2 = [DataN.t_2; hex2dec(list{1,3}(4:end))];
    end
end
So all I did was append some of the relevant metacharacters from the longer regular expression in Guillaume's answer below, and I can confirm that it works for multiple files (more than 100).
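Worth noting: `contains` compares a character-vector argument as literal text, so metacharacters such as `\d+` are not interpreted there. A minimal sketch of the same kind of filter written with `regexp` instead is shown here. It assumes the field names look exactly like those in the question; the anchored pattern and the sample names are illustrative and not taken from the original script.

```matlab
% Hypothetical field names mirroring the question.
fields = {'CUSTOM_MODE_STEP_0'; 'CUSTOM_MODE_STEP_0_DESCRIPTION'; ...
          'CUSTOM_MODE_STEP_10'; 'OTHER_SETTING'};

% ^ and $ anchor the pattern, so a name matches only if nothing follows
% the step number; the _DESCRIPTION entries are therefore rejected.
hits = regexp(fields, '^CUSTOM_MODE_STEP_\d+$', 'once');
isStep = ~cellfun(@isempty, hits);
stepFields = fields(isStep);
```

Looping over `stepFields'` instead of `fields'` would then make the `if` test inside the loop unnecessary.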


Answers (2)

Guillaume on 23 Oct 2018
It sounds like your original code is very fragile. Looking at the portion you show, it's also not very efficient, since the arrays are resized on every iteration. A single regexp, a call to sscanf and a bit of cell array manipulation are probably all that is needed to get the data you want.
It would be useful to have an example text file to validate against. With the attached file, based on your example data, this is the code I'd use:
filecontent = fileread('test.csv'); %read whole file at once
modesteps = regexp(filecontent, 'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}', 'tokens'); %get step and content of '{}'
modesteps = vertcat(modesteps{:});
stepnumber = str2double(modesteps(:, 1));
stepvalues = cellfun(@(hex) sscanf(hex, '0x%x ', [1 Inf]), modesteps(:, 2), 'UniformOutput', false);
stepvalues = vertcat(stepvalues{:});
If you then want to convert that into a table with the same variable names as your original structure:
steps = array2table([stepnumber, stepvalues(:, 1:6)], 'VariableNames', {'Step', 't_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'})
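The loop in the question doubles the `ht_*` values (`hex2dec(...)*2`), which the table above does not. If that scaling is still wanted, it could be applied to the matrix before the table is built. This is a sketch only: `stepnumber` and `stepvalues` are filled in by hand from the first two sample rows of the question, and `scaled` is a name introduced here.

```matlab
% Hand-entered values standing in for the parsed output (first two rows
% of the new-format example in the question).
stepnumber = [0; 1];
stepvalues = [64 67 125 162 162 162 0 0; ...
               0 244 125 162 162 162 1 0];

scaled = stepvalues(:, 1:6);
scaled(:, 3:6) = scaled(:, 3:6) * 2;   % bytes 3..6 feed ht_1..ht_4
steps = array2table([stepnumber, scaled], 'VariableNames', ...
    {'Step', 't_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'});
```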
Comments: 3
Guillaume on 8 Jan 2019
The only change needed in my original code, to account for the additional comma separating the hex values in your latest example, is to replace the '0x%x ' in the sscanf call with '0x%x, ', so:
filecontent = fileread('test.csv'); %read whole file at once
modesteps = regexp(filecontent, 'CUSTOM_MODE_STEP_(\d+)\s+=\s+\{\s*([^}]+)\}', 'tokens'); %get step and content of '{}'
modesteps = vertcat(modesteps{:});
stepnumber = str2double(modesteps(:, 1));
stepvalues = cellfun(@(hex) sscanf(hex, '0x%x, ', [1 Inf]), modesteps(:, 2), 'UniformOutput', false);
stepvalues = vertcat(stepvalues{:});
"Also, I'm hesitant to implement your code as it is likely to have a knock-on effect on the rest of the (very large) script"
While I can understand the resistance, the way you have it coded at present, the input format, the parsing and the creation of the output data are all deeply interlinked. As you've found out, if the file format changes you need to review everything. I would think that changing the design now would result in a lot less pain later. If it were me, I would write a parser that is even more generic than the above (storing the parsed data as key/value pairs) and afterwards just look up the required keys.
Anyway, it is trivial to convert the output of the above into your original structure:
fnames = {'t_1', 't_2', 'ht_1', 'ht_2', 'ht_3', 'ht_4'};
namevalues = [fnames; num2cell(stepvalues(:, 1:6), 1)];
dataN = struct(namevalues{:})
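The more generic key/value parser mentioned above might look roughly like this. It is only a sketch: the `KEY = value` line layout is inferred from the samples in the question, and the inline text, the `settings` map and the other variable names are illustrative.

```matlab
% Inline sample standing in for fileread('test.csv').
filecontent = sprintf('%s\n', ...
    'CUSTOM_MODE_STEP_0 = { 0x40 0x43 0x7D 0xA2 0xA2 0xA2 0x00 0x00 }', ...
    'CUSTOM_MODE_STEP_0_DESCRIPTION = [text]', ...
    'CUSTOM_MODE_STEP_1 = { 0x00 0xF4 0x7D 0xA2 0xA2 0xA2 0x01 0x00 }');

% One token pair per line: the key before '=' and the raw text after it.
pairs = regexp(filecontent, '(\w+)\s*=\s*([^\r\n]*)', 'tokens');
pairs = vertcat(pairs{:});
settings = containers.Map(pairs(:, 1), pairs(:, 2));

% Look up a key and decode its hex bytes only when they are needed.
raw = settings('CUSTOM_MODE_STEP_1');
bytes = sscanf(strtrim(raw(2:end-1)), '0x%x ', [1 Inf]);
```

Keeping the raw text in the map means new kinds of lines, such as the `_DESCRIPTION` entries, are stored without breaking the code that decodes the hex rows.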
Stanley on 17 Jan 2019
Edited: Stanley on 17 Jan 2019
Appreciate the help Guillaume. It turns out that the solution was very trivial (which I should have figured out much sooner but it's a learning process) - but while trying to adapt your code I did learn about concatenation and regular expressions, so it was worthwhile. I've started another script in any case with your code so I can work on it every now and then.



per isakson on 4 Jan 2019
I downloaded example750.csv and tried a different approach to extracting and converting the hex values:
>> cssm( 'example750.csv' )
ans =
64 67 125 162 162 162 0 0
0 244 125 162 162 162 1 0
124 0 125 162 162 162 2 128
where
function out = cssm( ffs )
%
%% Read the file to a cell array of character rows
fid = fopen( ffs, 'r' );
cac = textscan( fid, '%[^\r\n]' );
cac = cac{1};
[~] = fclose( fid );
%% Extract the rows with hex values
pos = regexp( cac, 'CUSTOM_MODE_STEP_\d+\s+=\s+\{' );
cac( cellfun( @isempty, pos ) ) = [];
%% Extract the hex values, which are two characters following "0x"
hex = regexp( cac, '(?<=0x)[A-F\d]{2}', 'match' );
%% Convert to dec values. (hex2dec returns a column, thus reshape.)
dec = cellfun( @(c) reshape(hex2dec(c),1,[]), hex, 'uni',false );
out = cell2mat( dec );
end
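The character class `[A-F\d]` in the function above matches upper-case hex digits only. If a file might contain lower-case digits (an assumption; every sample in this thread is upper-case), a slightly wider class handles both. The row below is made up for illustration.

```matlab
% Made-up row with mixed-case hex digits.
row = 'CUSTOM_MODE_STEP_0 = { 0x7c 0x00 0x7D 0xa2 }';

% Two hex digits of either case, preceded by "0x".
hexdigits = regexp(row, '(?<=0x)[0-9A-Fa-f]{2}', 'match');

% hex2dec accepts either case and returns a column, hence the reshape.
values = reshape(hex2dec(hexdigits), 1, []);
```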
Comments: 1
Stanley on 17 Jan 2019
Thanks Per, I have yet to test out this code myself, but will definitely try to. I am going to work with Guillaume's code first.
