How to read in a non-standard formated CSV file containing times and measurments and change the output format?

Question

Mark Jackson 2019년 8월 5일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/474928-how-to-read-in-a-non-standard-formated-csv-file-containing-times-and-measurments-and-change-the-outp

댓글: Mark Jackson 2019년 8월 18일

Hi

I have a large CSV data file which I want to read in to matlab and correlate data times (shown as TIM in the example below) with various data measurements (ABC, DEF, GHI). The structure of the data file (which I cannot change) is such that the date / time is stated on a single line and the following lines (until the next TIM line) contain measurements taken at the previous time step. Broadly speaking the data format is as follows:

TIM, 20190805, 110000
ABC, 123
DEF, 456
GHI, 789
TIM, 20190805, 110001
ABC, 123
DEF, 456
GHI, 789
TIM, 20190805, 110002
ABC, 123
DEF, 456
GHI, 789

The output I want is

DATE    , TIME  , ABC, DEF, GHI
20190805, 110000, 123, 456, 789
20190805, 110001, 123, 456, 789
20190805, 110002, 123, 456, 789

so i can easily correlate times with measurements. The brute force approach would be to loop over every line, save the date / time when it appears and keep reading over the measurements until you reach then next time stamp. Once you hit this, save everything into an array of some flavour (cell etc) with the time step. However, my CSV file is very large so reading through and parsing the whole file line-by-line is VERY slow. Hence I am looking for a quicker way to read in the data and get to the output I want.

I first approached this by using the readtable command in MATLAB to get a MATLAB table, then find all the time/date (TIM) rows and use row offsets e.g. ABC = 1, DEF = 2, GHI = 3 etc to filter out the measurement rows. I then end up with four arrays TIM, ABC, DEF, GHI. However, as always, this is proving to be a nightmare due to inconsistencies in the data. Examples of this are

TIM, 20190805, 110000
ABC, 123
GHI, 789
DEF, 456
TIM, 20190805, 110001
ABC, 123
GHI, 789
DEF, 456
TIM, 20190805, 110002
ABC, 123
DEF, 456
GHI, 789

(Note that the order of DEF and GHI have been switched in the first two blocks)

TIM, 20190805, 110000
ABC, 123
TIM, 20190805, 110000
DEF, 456
GHI, 789
TIM, 20190805, 110001
ABC, 123
DEF, 456
GHI, 789
TIM, 20190805, 110002
ABC, 123
DEF, 456
TIM, 20190805, 110002
GHI, 789

(Note the repeated date / time (TIM) lines with the same times)

Any thoughts on a robust reader to transform my data?

Thanks in advance

댓글 수: 6
이전 댓글 4개 표시이전 댓글 4개 숨기기

meghannmarie 2019년 8월 6일

Do you want to round to nearest second or drop the fraction? In your example I was using, there would be 2 values for GHI in 110001 second value if you round.

Mark Jackson 2019년 8월 6일

I'm quite happy to drop the fraction if that makes things easier. I don't need the extra precision in the times

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

meghannmarie 2019년 8월 6일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/474928-how-to-read-in-a-non-standard-formated-csv-file-containing-times-and-measurments-and-change-the-outp#answer_386471

MATLAB Online에서 열기

This gets the anwer if you drop the fraction and the time format is HHmmss, if you have a fraction you can edit the line with time format.

[~,~,raw] = xlsread('data.csv');
num_rows = size(raw,1);
time_idx = contains(raw(:,1),'TIM');
abc_idx = contains(raw(:,1),'ABC');
def_idx = contains(raw(:,1),'DEF');
ghi_idx = contains(raw(:,1),'GHI');
val_times = NaT(num_rows,1);
date = datetime([raw{time_idx,2}]','ConvertFrom','yyyymmdd');
time = datetime(cellstr(num2str([raw{time_idx,3}]','%06d')),'InputFormat','HHmmss');% edit '%06d' and 'HHmmss' if you have decimal seconds
val_times(time_idx) = date + timeofday(time);
val_times = fillmissing(val_times,'previous');
times = unique(val_times);
vals = raw(:,2);
output_data = cell(numel(times),4);
output_data(:) = {NaN};
output_data(:,1) = cellstr(datestr(times));
abc = vals(abc_idx);
abc_times = val_times(abc_idx);
[~,I] = intersect(times,abc_times,'stable');
output_data(I,2) = abc;
def = vals(def_idx);
def_times = val_times(def_idx);
[~,I]=intersect(times,def_times,'stable');
output_data(I,3) = def;
ghi = vals(ghi_idx);
ghi_times = val_times(ghi_idx);
[~,I]=intersect(times,ghi_times,'stable');
output_data(I,4) = ghi;
T = cell2table(output_data,'VariableNames',{'DATE','ABC','DEF','GHI'});
writetable(T,'data_reformat.csv');

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

Mark Jackson 2019년 8월 18일

Worked a treat!! Thanks for your help

댓글을 달려면 로그인하십시오.

Answer 2

Jeremy Hughes 2019년 8월 5일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/474928-how-to-read-in-a-non-standard-formated-csv-file-containing-times-and-measurments-and-change-the-outp#answer_386288

You could do this with textscan

fid = fopen(file);

vars = textscan(fid,'TIM%f%fABC%fDEF%fGHI%f','Delimiter',{','},'Whitespace',' \n')

T = table(vars{:},'VariableNames',["DATE","TIME","ABC","DEF","GHI"])

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

댓글을 달려면 로그인하십시오.

How to read in a non-standard formated CSV file containing times and measurments and change the output format?

댓글 수: 6
이전 댓글 4개 표시이전 댓글 4개 숨기기

채택된 답변

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

추가 답변 (1개)

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

How to read in a non-standard formated CSV file containing times and measurments and change the output format?

댓글 수: 6 이전 댓글 4개 표시이전 댓글 4개 숨기기

채택된 답변

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

추가 답변 (1개)

댓글 수: 0 이전 댓글 -2개 표시이전 댓글 -2개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 6
이전 댓글 4개 표시이전 댓글 4개 숨기기

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 0
이전 댓글 -2개 표시이전 댓글 -2개 숨기기