Read a large .dat file and get the necessary data

I have a rather large .dat file (~1.5 GB) that I import into MATLAB. It contains text and values like the following:
{
  "os": [
    {
      "utc": "2021-09-14 02:54:56",
      "lat": 35.59538,
      "lon": 129.574246,
      "hdt": 295.9,
      "rot": -2.1,
      "sog": 1.0,
      "cog": 335.5,
      "rudder_order_stbd": null,
      "rudder_order_port": null,
      "rudder_stbd": 0.0,
      "rudder_port": 0.0,
      "rpm_stbd": 0.0,
      "rpm_port": 0.0,
      "stw_long": 0.87,
      "stw_trans": "NaN",
      "stw_long_stern": "NaN",
      "stw_trans_stern": "NaN",
      "stw_speed": null,
      "wind_dir": 134.0,
      "wind_speed": 5.5,
      "current_dir": null,
      "current_speed": null
    },
    {
      "utc": "2021-09-14 02:54:58",
      "lat": 35.595385,
      "lon": 129.574233,
      "hdt": 295.9,
      "rot": -1.3,
      "sog": 0.9,
      "cog": 331.1,
      "rudder_order_stbd": null,
      "rudder_order_port": null,
      "rudder_stbd": 0.0,
      "rudder_port": 0.0,
      "rpm_stbd": 0.0,
      "rpm_port": 0.0,
      "stw_long": 0.87,
      "stw_trans": "NaN",
      "stw_long_stern": "NaN",
      "stw_trans_stern": "NaN",
      "stw_speed": null,
      "wind_dir": 141.0,
      "wind_speed": 5.3,
and
"ts": [
  [
    {
      "header": "VDM",
      "msg_type": 1,
      "mmsi": 440196110,
      "navi_status": 0,
      "time_stamp": 54,
      "lat": 35.515383,
      "lon": 129.386093,
      "hdt": 2,
      "rot_raw": 0,
      "rot": "0",
      "cog": 327.6,
      "sog": 0.0
    },
    {
      "header": "VDM",
      "msg_type": 1,
      "mmsi": 355924000,
      "navi_status": 0,
      "time_stamp": 56,
      "lat": 35.345183,
      "lon": 129.467416,
      "hdt": 221,
      "rot_raw": -127,
      "rot": "-708",
      "cog": 225.0,
      "sog": 2.6
    }
I want to export the values of each parameter as a matrix to a .DAT file. But, as you can guess, for a file this size it takes forever to run through. Is there a better way to accomplish this and export the data?
Many thanks!

2 Comments

@Diep Nguyen: please upload a representative data file by clicking the paperclip button.
A representative data file can be shortened, but must include sufficient data so that we can understand the file format.
Jan, 16 Feb 2022
Yes, reading a 1.5 GB JSON file in text mode will take some time.
It is not clear what "export each matrix in .DAT file" means. Which matrices do you mean?
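Since the file is JSON, as Jan points out, another route is to let MATLAB's built-in jsondecode parse it instead of splitting and cleaning lines one by one. This is a hedged sketch, not the approach taken in this thread. It assumes the whole file is one valid JSON document with a top-level "os" array whose records all share the same fields, as in the excerpt; 'Data_22.txt' is the file name used in the answer below, while 'os_values.dat' and the helper os_to_matrix are hypothetical names. Memory is the main cost: fileread holds the full text as a MATLAB char array (2 bytes per character, so roughly 3 GB for a 1.5 GB file) before decoding.

```matlab
% Assumes one valid JSON document with a top-level "os" array (see lead-in);
% the output name 'os_values.dat' and os_to_matrix are hypothetical.
data = jsondecode(fileread('Data_22.txt'));    % whole file -> nested struct
[M, names] = os_to_matrix(data.os);
% One row per record, one column per field, in the order given by names
writematrix(M, 'os_values.dat', 'Delimiter', 'tab');

function [M, names] = os_to_matrix(os)
% Convert the decoded "os" records into a numeric matrix (one row per record,
% one column per field). null (decoded as []) and "NaN" (decoded as text)
% become NaN; the utc text is converted to POSIX seconds.
if ~isstruct(os)
    % jsondecode returns a cell array when the records have different fields
    error('os_to_matrix:fields', 'The "os" records do not share the same fields.');
end
names = fieldnames(os);
M = nan(numel(os), numel(names));
for k = 1:numel(names)
    vals = {os.(names{k})};                    % one value per record
    if strcmp(names{k}, 'utc')
        % assumes every record has a utc string in this exact format
        t = datetime(vals(:), 'InputFormat', 'yyyy-MM-dd HH:mm:ss');
        M(:, k) = posixtime(t);                % unzoned times are treated as UTC
    else
        isNum = cellfun(@(v) isnumeric(v) && isscalar(v), vals);
        if any(isNum)                          % avoid a null assignment M(...) = []
            M(isNum, k) = [vals{isNum}];
        end
    end
end
end
```

The names output gives the column order, so the .dat file can be read back as a plain numeric matrix.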


Accepted Answer

Mathieu NOE, 16 Feb 2022


Hello,
I made this little wrapper for you.
The number of parameters you export is up to you (see parameters_length in the code).
Here is the full version, with all 22 parameters saved to Excel (each parameter is a row and each record, i.e. each time step, is a column).
Code:
clc
clearvars
filename = 'Data_22.txt';
parameters_length = 22; % do not exceed the max = 22
% data to retrieve (for info)
% "utc": "2021-09-14 02:56:02",
% "lat": 35.595596,
% "lon": 129.574031,
% "hdt": 294.6,
% "rot": -1.5,
% "sog": 0.9,
% "cog": 326.8,
% "rudder_order_stbd": null,
% "rudder_order_port": null,
% "rudder_stbd": 0.0,
% "rudder_port": 0.0,
% "rpm_stbd": 0.0,
% "rpm_port": 0.0,
% "stw_long": 0.79,
% "stw_trans": "NaN",
% "stw_long_stern": "NaN",
% "stw_trans_stern": "NaN",
% "stw_speed": null,
% "wind_dir": 90.0,
% "wind_speed": 3.1,
% "current_dir": null,
% "current_speed": null
% a = readlines(filename); % if you have readlines (R2020b or later)
a = my_readlines(filename); % workaround for earlier MATLAB releases (not having readlines)
% make a a column cell array of char, whether it came from readlines (string column)
% or my_readlines (cell row), so that split below returns one row per line
a = cellstr(a(:));
% first data (utc) indexes (as reference for further processing)
indexes = find(contains(a,'utc'));
out = [];
for ci = 1:length(indexes)
    current_index = indexes(ci);
    STR = split(a(current_index + (0:parameters_length-1)),'":'); % column 1: labels, column 2: values
    tmp = strrep(STR(:,2),',','');  % get rid of commas
    tmp = strrep(tmp,'"','');       % get rid of double quotes
    out = [out strtrim(tmp)];       % concatenation of all data (one column per record)
end
% concatenate labels (first column of STR) with all data (out)
tmp = strrep(STR(:,1),'"','');  % get rid of double quotes
out = [strtrim(tmp) out];
% now save to excel
writecell(out,'test.xlsx');
%%%%%%%%%%%%%%%%
function LINES = my_readlines(FILENAME)
% workaround for earlier MATLAB releases (not having readlines)
LINES = regexp(fileread(FILENAME), '\r?\n', 'split');
if isempty(LINES{end}); LINES(end) = []; end % end of file correction
end
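The question asks for .DAT output, and an .xlsx sheet is limited to 16,384 columns, while this code writes one column per record; a 1.5 GB file is likely to hold far more records than that. A hedged variant, assuming out is the cell array built above (labels in column 1, one column per record), transposes it and writes a tab-delimited text file instead. The output name 'os_data.dat' is hypothetical, and the small cell array only stands in for the real out.

```matlab
% Small stand-in for the cell array out built above (labels in column 1,
% one column per record); in practice use the real out.
out = {'utc', '2021-09-14 02:54:56', '2021-09-14 02:54:58';
       'lat', '35.59538',            '35.595385';
       'sog', '1.0',                 '0.9'};
% Transpose so each record is a row and the labels form the header line,
% then write tab-delimited text ('os_data.dat' is a hypothetical name).
writecell(out.', 'os_data.dat', 'Delimiter', 'tab');
% Optional numeric matrix of everything except the first (utc) parameter:
% str2double returns NaN for null, NaN and any other non-numeric text.
M = str2double(out(2:end, 2:end)).';
```

Text files have no comparable column limit, and the transposed layout puts the time axis down the rows.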

5 Comments

Hello,
this is an improved version: it exports the "os" section data to the first sheet of the Excel output file, and the "ts" section data to the second sheet.
Hope it helps.
clc
clearvars
%% load file
filename = 'Data_22.txt';
% a = readlines(filename); % if you have readlines (R2020b or later)
a = my_readlines(filename); % workaround for earlier MATLAB releases (not having readlines)
%% "os" data section
os_parameters_length = 22; % do not exceed the max = 22
% "utc": "2021-09-14 02:56:02",
% "lat": 35.595596,
% "lon": 129.574031,
% "hdt": 294.6,
% "rot": -1.5,
% "sog": 0.9,
% "cog": 326.8,
% "rudder_order_stbd": null,
% "rudder_order_port": null,
% "rudder_stbd": 0.0,
% "rudder_port": 0.0,
% "rpm_stbd": 0.0,
% "rpm_port": 0.0,
% "stw_long": 0.79,
% "stw_trans": "NaN",
% "stw_long_stern": "NaN",
% "stw_trans_stern": "NaN",
% "stw_speed": null,
% "wind_dir": 90.0,
% "wind_speed": 3.1,
% "current_dir": null,
% "current_speed": null
%% "ts" data section
ts_parameters_length = 12; % do not exceed the max = 12
% "header": "VDM",
% "msg_type": 1,
% "mmsi": 440196110,
% "navi_status": 0,
% "time_stamp": 54,
% "lat": 35.515383,
% "lon": 129.386093,
% "hdt": 2,
% "rot_raw": 0,
% "rot": "0",
% "cog": 327.6,
% "sog": 0.0
%% main loop
out_os = do_job(a,'utc',os_parameters_length);
out_ts = do_job(a,'header',ts_parameters_length);
% now save to excel
out_file = 'test.xlsx';
writecell(out_os,out_file,"Sheet",1);
writecell(out_ts,out_file,"Sheet",2);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%% sub functions section %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function out = do_job(a,ref,parameters_length)
% make a a column cell array of char, whether it came from readlines (string column)
% or my_readlines (cell row), so that split below returns one row per line
a = cellstr(a(:));
% example: first data (utc) indexes (as reference for further processing)
indexes = find(contains(a,ref));
out = [];
for ci = 1:length(indexes)
    current_index = indexes(ci);
    STR = split(a(current_index + (0:parameters_length-1)),'":'); % column 1: labels, column 2: values
    tmp = strrep(STR(:,2),',','');  % get rid of commas
    tmp = strrep(tmp,'"','');       % get rid of double quotes
    out = [out strtrim(tmp)];       % concatenation of all data (one column per record)
end
% concatenate labels (first column of STR) with all data (out)
tmp = strrep(STR(:,1),',','');  % get rid of commas
tmp = strrep(tmp,'"','');       % get rid of double quotes
tmp = strtrim(tmp);             % remove leading and trailing whitespace
out = [tmp out];
end
%%%%%%%%%%%%%%%%
function LINES = my_readlines(FILENAME)
% workaround for earlier MATLAB releases (not having readlines)
LINES = regexp(fileread(FILENAME), '\r?\n', 'split');
if isempty(LINES{end}); LINES(end) = []; end %end of file correction
end
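On the speed question: in do_job, out = [out tmp] reallocates and copies the whole result every time a record is added, so the run time grows roughly quadratically with the number of records. Below is a hedged sketch of a drop-in variant (do_job_prealloc is a hypothetical name): it preallocates the cell array, skips a last record cut off by the end of the file, and turns a into a column cell array so it also works when the lines come from readlines. It is meant to produce the same layout as do_job (labels in column 1, one column per record).

```matlab
function out = do_job_prealloc(a, ref, parameters_length)
% Hypothetical variant of do_job: same output layout (labels in column 1,
% one column per record), but out is preallocated instead of grown.
a = cellstr(a(:));                   % column cell array from readlines or my_readlines
indexes = find(contains(a, ref));
% drop a last record that is cut off by the end of the file
indexes = indexes(indexes + parameters_length - 1 <= numel(a));
if isempty(indexes)
    out = {};
    return
end
out = cell(parameters_length, numel(indexes) + 1);   % +1 column for the labels
for ci = 1:numel(indexes)
    block = a(indexes(ci) + (0:parameters_length-1)); % lines of one record (column)
    STR = split(block, '":');                         % column 1: labels, column 2: values
    out(:, ci+1) = strtrim(erase(STR(:, 2), {',', '"'}));
end
out(:, 1) = strtrim(erase(STR(:, 1), {',', '"'}));   % labels, taken from the last record
end
```

It can replace the calls in the main script directly, e.g. out_os = do_job_prealloc(a,'utc',os_parameters_length);.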
Hello!
Thank you so much for your help.
It helps me a lot.
As always, my pleasure!
Would you mind accepting my answer? Thanks.
Hello,
good news.
When I say "accept my answer", I mean that on your side you click the "Accept" button that should appear next to my answer.
No problem :)



Asked: 16 Feb 2022

Commented: 1 Mar 2022
