Convert chars into formatted numbers

Francesco on 21 Mar 2025
Commented: Star Strider on 21 Mar 2025
Hello everyone,
I am working on code that parses a .header file in order to interpret a large database stored in a .data file (HITRAN, for those familiar with it).
From the header file I can obtain where each line of the dataset should be split into variables and which format each variable is in. Below is an example of the data:
% ... Parse the .header file to get variable names (.Names) and their numerical
% formatting (.Values) in C. Some of them are double, some of them are integer numbers
% related to quantum states. Note that Names and Values are not in the same
% order as the columns of the .data file.
FormatBlock.Names = {'a', 'gamma_air', 'gp', 'local_iso_id', 'molec_id', 'sw', 'local_lower_quanta', 'local_upper_quanta', 'gpp', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'iref', 'line_mixing_flag', 'ierr', 'nu', 'gamma_self', 'global_lower_quanta'};
FormatBlock.Values = {'%10.3E', '%5.4f', '%7.1f', '%1d', '%2d', '%10.3E', '%15s', '%15s', '%7.1f', '%10.4f', '%4.2f', '%8.6f', '%15s', '%12s', '%1s', '%6s', '%12.6f', '%5.3f', '%15s'};
% ... Parse the .data file, dividing it into lines and separating values
% into columns keeping them as char. Here an example of one line
DataBlock.Names = {'molec_id', 'local_iso_id', 'nu', 'sw', 'a', 'gamma_air', 'gamma_self', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'global_lower_quanta', 'local_upper_quanta', 'local_lower_quanta', 'ierr', 'iref', 'line_mixing_flag', 'gp', 'gpp'}
DataBlock.Columns = {' 1', '1', ' 2800.033883', ' 1.303E-29', ' 1.003E-04', '.0664', '0.298', ' 2705.1396', '0.65', '0.005780', ' 0 2 0', ' 0 1 0', ' 11 6 5 ', ' 10 1 10 ', '434233', '807294713152', ' ', ' 69.0', ' 63.0'};
The question is: assuming that I am able to reorganise Names and Values into the same order as the data file, how can I convert the DataBlock.Columns chars into numbers according to the corresponding FormatBlock.Values entry?
For example:
'molec_id' = ' 1' has formatting '%2d', hence 'molec_id' = 1
'global_lower_quanta' = ' 0 1 0' has formatting '%15s', hence 'global_lower_quanta' = [0 1 0]
'nu' = ' 2800.033883' has formatting '%12.6f', hence 'nu' = 2.800033883e3
etc.
Thank you in advance for your help!

Accepted Answer

Star Strider on 21 Mar 2025
I am not certain what result you want.
Try something like this:
% ... Parse the .header file to get variable names (.Names) and their numerical
% formatting (.Values) in C. Some of them are double, some of them are integer numbers
% related to quantum states. Note that Names and Values are not in the same
% order as the columns of the .data file.
FormatBlock.Names = {'a', 'gamma_air', 'gp', 'local_iso_id', 'molec_id', 'sw', 'local_lower_quanta', 'local_upper_quanta', 'gpp', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'iref', 'line_mixing_flag', 'ierr', 'nu', 'gamma_self', 'global_lower_quanta'};
FormatBlock.Values = {'%10.3E', '%5.4f', '%7.1f', '%1d', '%2d', '%10.3E', '%15s', '%15s', '%7.1f', '%10.4f', '%4.2f', '%8.6f', '%15s', '%12s', '%1s', '%6s', '%12.6f', '%5.3f', '%15s'};
% ... Parse the .data file, dividing it into lines and separating values
% into columns keeping them as char. Here an example of one line
DataBlock.Names = {'molec_id', 'local_iso_id', 'nu', 'sw', 'a', 'gamma_air', 'gamma_self', 'elower', 'n_air', 'delta_air', 'global_upper_quanta', 'global_lower_quanta', 'local_upper_quanta', 'local_lower_quanta', 'ierr', 'iref', 'line_mixing_flag', 'gp', 'gpp'}
DataBlock = struct with fields:
Names: {1x19 cell}
DataBlock.Columns = {' 1', '1', ' 2800.033883', ' 1.303E-29', ' 1.003E-04', '.0664', '0.298', ' 2705.1396', '0.65', '0.005780', ' 0 2 0', ' 0 1 0', ' 11 6 5 ', ' 10 1 10 ', '434233', '807294713152', ' ', ' 69.0', ' 63.0'}
DataBlock = struct with fields:
Names: {1x19 cell}
Columns: {1x19 cell}
format shortG
DBC = cellfun(@(x)sscanf(x,'%g'),DataBlock.Columns,Unif=0);
disp(DBC)
  Columns 1 through 13
    {[1]}    {[1]}    {[2800]}    {[1.303e-29]}    {[0.0001003]}    {[0.0664]}    {[0.298]}    {[2705.1]}    {[0.65]}    {[0.00578]}    {3x1 double}    {3x1 double}    {3x1 double}
  Columns 14 through 19
    {3x1 double}    {[434233]}    {[8.0729e+11]}    {0x0 double}    {[69]}    {[63]}
for k = 1:numel(DBC)
DBC{k}.'
end
ans =
1
ans =
1
ans =
2800
ans =
1.303e-29
ans =
0.0001003
ans =
0.0664
ans =
0.298
ans =
2705.1
ans =
0.65
ans =
0.00578
ans = 1×3
0 2 0
ans = 1×3
0 1 0
ans = 1×3
11 6 5
ans = 1×3
10 1 10
ans =
434233
ans =
8.0729e+11
ans = []
ans =
69
ans =
63
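The call above reads every column with '%g' and does not consult FormatBlock. If the conversion should depend on each field's format, one possible approach is to reorder the formats with ismember and then branch on the conversion character. The following is a minimal sketch, not a definitive implementation: the variables fmtNames, fmtValues, colNames and cols are shortened, hypothetical stand-ins for FormatBlock.Names, FormatBlock.Values, DataBlock.Names and DataBlock.Columns, and the rule "try to read '%s' fields as numbers, otherwise keep the trimmed text" is an assumption about what is wanted.

```matlab
% Sketch only: shortened, hypothetical stand-ins for the structs in the question.
fmtNames  = {'nu', 'molec_id', 'global_lower_quanta', 'line_mixing_flag'};
fmtValues = {'%12.6f', '%2d', '%15s', '%1s'};
colNames  = {'molec_id', 'nu', 'global_lower_quanta', 'line_mixing_flag'};
cols      = {' 1', ' 2800.033883', '          0 1 0', ' '};

% Reorder the formats so that fmtOrdered{k} describes cols{k}.
[found, idx] = ismember(colNames, fmtNames);
assert(all(found), 'Every data column needs a format entry.');
fmtOrdered = fmtValues(idx);

record = struct();
for k = 1:numel(cols)
    spec = fmtOrdered{k}(end);             % conversion character: d, f, E, s, ...
    if any(spec == 'dfeEgG')
        value = sscanf(cols{k}, '%f');     % numeric field -> double
    else
        % String field (assumed rule): try to read a numeric row vector,
        % and fall back to the trimmed text if nothing numeric is found.
        value = sscanf(cols{k}, '%f').';
        if isempty(value)
            value = strtrim(cols{k});
        end
    end
    record.(colNames{k}) = value;          % store under the column's name
end
disp(record)
```

With this rule a '%s' field that starts with digits but continues with other characters is only partly read, and a long reference string such as iref becomes a single large double, so fields like that may be better kept as text.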
You can format them at your leisure. Use either sprintf or fprintf depending on what you want to do.
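As a small illustration of that last point, the sketch below writes two converted values back out with the formats listed for them in FormatBlock.Values. The variable names are chosen here for illustration only. It also shows that the 2800 seen in the display above is a consequence of format shortG, and that the stored double still carries the digits that were read.

```matlab
% Values re-read from the example line, as in the cellfun call above.
nu = sscanf(' 2800.033883', '%g');
sw = sscanf(' 1.303E-29', '%g');

% sprintf returns the formatted text as a char vector.
nuText = sprintf('%12.6f', nu);    % format listed for 'nu'
swText = sprintf('%10.3E', sw);    % format listed for 'sw'

% fprintf writes the same text to the Command Window instead.
fprintf('nu = [%s]\n', nuText);
fprintf('sw = [%s]\n', swText);
```

sprintf is the one to use when the text is needed in a variable, and fprintf when it only has to be printed or written to a file.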
Comments: 4
Francesco on 21 Mar 2025
Thank you so much!
Star Strider on 21 Mar 2025
As always, my pleasure!

Release

R2024a
