Combining text files with dates and missing data

I have numerous files in this format
YEAR MONTH DAY HOUR MIN VAL
1999 1 1 0 0 123
where the file contains data for a single year, month runs 1-12, day from 1-30 (or 28,29, or 31), hour goes from 0-23 and min almost always 0 and val represents a different type of value depending on the file (i.e. one file will have speed, another the mass, another the temperature of the object, etc.).
I want to combine the text files so that they end up like this
YEAR MONTH DAY HOUR MIN VAL1 VAL2 VAL3
1999 1 1 0 0 123 321 123
up to 7 values.
My problem is when data is missing/not recorded in one of the text files (e.g. Jan 1st 1999 Hours 0-23 has data but the next row is Jan 3rd 1999 Hour 0). When the data is missing, I'm not sure what to do as the arrays become uneven and I can't concatenate them unless I somehow fill in the gaps.
Is there a way to fill in any gaps with a blank or some non-data value like 99999 so that I can combine them? Or is there another way to combine the files? The latter is my goal.

댓글 수: 6

Would it be acceptable to add a delimiter between the fields such as comma or tab?
When the minute is not 0 then should that be treated differently from an entry that is 0? For example if you had 5 values in a row with the same information and minute 0 and then the next one had the same except minute 1, should that start a new line or should it be included in the previous?
I just checked, minute is always 0. It's hourly data. I think a delimiter should not be a problem as the data will eventually go to a program that is capable of importing Excel files, so I should think a tab or comma delimiter is all right.
Oh wait, those values are to be accumulated across files ? Are there exactly 7 files?
Would you happen to be using OS-X or Linux? If you are, then I suggest you work outside of MATLAB, such as by using fsdb
I have a few hundred files (various years) and seven types of values, and opening the data in MATLAB and manipulating the arrays is otherwise no problem, but when some data files are shorter than others, then the column vectors are shorter than others (i.e. because VAL1 in 1934 is missing some days or months of data, it is not the same length as VAL2 which has data for every hour of the year and and is complete, so I can't simply just use concatenation) I'm not sure how to make the dates and values align and fill in blanks with something like 9999.
I'm on Windows.
To check: for each year there are 7 files? Or is there only 1 file that has 7 types of values in it? If it is 1 file that has 7 types of values in it, how do you tell which value-type a particular line is?
Each year has a bunch of files, but 7 values in particular I'm interested in, so for the purpose here, basically for each year there are 7 files, each with a date and time and value. I'm trying to put them into one file.

댓글을 달려면 로그인하십시오.

 채택된 답변

Walter Roberson
Walter Roberson 2015년 8월 12일
Assuming the data is in DataVal_1, DataVal_2 through DataVal_7, and assuming that either only a single year is referenced or else that the first entry of DataVal_1 refers to the lowest numbered year to be considered:
filler_value = 9999;
first_hour = datenum(DataVal_1(:,1),1,1);
timekeyvec = DataVal_1(:,1:4); %Y,M,D,H
timekeyvec(end,6) = 0; %extend by 0 for min and sec
timekeynum = datenum(timekeyvec); %serial date numbers
houridx_1 = round((timekeynum - first_hour) * 24) + 1;
Do the same thing for the other DataVal_* arrays producing houridx_1 through houridx_7 . Then
lasthouridx = max( [houridx_1; houridx_2; houridx_3; houridx_4; houridx_5; houridx_6; houridx_7] );
mergedvals = filler_value * ones(lasthouridx, 7);
mergedvals(houridx_1,1) = DataVal_1(:,6);
mergedvals(houridx_2,2) = DataVal_2(:,6);
mergedvals(houridx_3,3) = DataVal_3(:,6);
etc.
date_of_entries = datevec(((1:lasthouridx) - 1) / 24 + first_hour);
merge_table = [date_of_entries(:,1:4), zeros(lasthouridx,1), mergedvals];
Now write out merge_table
(Testing further, I find it would be possible to build up the mergedvals a file at a time instead of having to read all 7 files first)

댓글 수: 3

Hi, I think this would work well and so far I tested it with a sample and it writes well up to 'mergedvals'. However, I can't proceed with the last two lines as I get mismatching dimensions.
date_of_entries = 4510080 x 6 (looks like it's just the same year month and day repeating each row (and the last two columns zeros))
mergedvals = 2160 x 7
Also, most of my hourids are 2088x1 (the same length as the corresponding text file, so this is so far so good). The text file runs from January 1st 1965 to March 31st 2300 1965 (just testing a sample), and there are three days missing with some of the data (Jan 2nd-4th, resumes on the 5th), so my mergedvals dimensions are correct (2088 + (24hours*3days missing)) so everything is working here and mergedvals seems to have been filled in correctly. Not sure why the dimensions of date_of_entries is like that
(((1:lasthouridx) - 1) / 24 + first_hour) has dimensions 2088 x 2160. Running that through datevec puts them in a column vector of (2088*2160) x 6.
Also, it appears that date_of_entries isn't repeating set of values. I checked and each hour entry repeats for a very long time (e.g. it repeats [1965 1 1 0 0 0] for a very long time before changing to [1965 1 2 0 0 0]). It should repeat in a (0:23)' pattern.
first_hour = datenum(DataVal_1(1,1),1,1);

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

도움말 센터File Exchange에서 Data Type Identification에 대해 자세히 알아보기

질문:

2015년 8월 12일

댓글:

2015년 8월 12일

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by