Help correcting a messy time series

lightworks on 1 Jul 2015
Edited: Andrei Bobrov on 3 Jul 2015
Hi,
I have multiple files of "daily" minimum temperature. The time series is non-continuous: files start and end on different dates, some days in the middle are missing (the rows are absent), and some days have more than one measurement.
This is an example of what I have:
1999 01 01 5.2
1999 01 02 4.3
1999 01 02 5.0
1999 01 02 4.1
1999 01 03 3.8
1999 01 05 3.2
...
So 02-jan has 3 different measurements and 04-jan is missing. Say that I need all the files to begin at 31-dec-1998 and end at 07-jan-1999; my files should end up looking like this:
1998 12 31 6.6
1999 01 01 5.2
1999 01 02 4.1
1999 01 03 3.8
1999 01 04 NaN
1999 01 05 3.2
1999 01 06 NaN
1999 01 07 NaN
So far I have managed, using this thread, to fill the missing dates with NaNs.
I still need to:
1) Take the minimum value for the days with more than one measurement. I have absolutely no idea how to do this...
2) Pad the period with NaN, or crop the data, so that it runs from the starting date to the ending date that I need. So far I have managed, with a long and messy script, to crop and/or add the missing data at the end. But I couldn't make that work for the beginning, and I'm sure there must be an easier, more effective way of doing it.
I would appreciate any help you can give me!

Accepted Answer

Walter Roberson on 2 Jul 2015
YMD_data = YourData(:,1:3);
temperature_data = YourData(:,4); %must be column vector
first_wanted = datenum('31-dec-1998');
last_wanted = datenum('07-jan-1999');
dayspan = last_wanted - first_wanted + 1;
dnum = datenum(YMD_data);
in_range = first_wanted <= dnum & dnum <= last_wanted;
useful_dnum = dnum(in_range);
useful_temperature = temperature_data(in_range);
relday = useful_dnum - first_wanted + 1; %so first day is 1, next is 2, etc
mintemp = accumarray(relday, useful_temperature, [dayspan, 1], @min, NaN);
wanted_dates_vec = datevec(first_wanted : last_wanted);
results = [wanted_dates_vec(:,1:3), mintemp];
That's it. All of the real work is done by the accumarray() call, which constructs one entry for each consecutive day; each entry is the min() of all the data entries that share the same relative date number. Entries for which there is no data are filled with NaN.
The bit after that constructs the output table. datevec() produces an N x 6 array in which the first three entries are Y M D (then H Min S). Extract those three, paste on the column output from accumarray and the task is done. The entries will appear in sequence, the minimums taken, the missing data NaN filled. No subselection of the output is needed because we selected what we wanted before passing it into accumarray.
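To make the accumarray() step concrete, here is a minimal sketch that runs the same idea on the six sample rows from the question. The variable names follow the answer above. Every sample row already falls inside the wanted period, so the in_range filter is left out here; with real files it is still needed, because accumarray() rejects subscripts below 1.

```matlab
% Sample rows from the question: [year month day temperature]
YourData = [1999 1 1 5.2
            1999 1 2 4.3
            1999 1 2 5.0
            1999 1 2 4.1
            1999 1 3 3.8
            1999 1 5 3.2];

first_wanted = datenum(1998, 12, 31);
last_wanted  = datenum(1999, 1, 7);
dayspan = last_wanted - first_wanted + 1;    % 8 days in the wanted period

dnum   = datenum(YourData(:,1), YourData(:,2), YourData(:,3));
relday = dnum - first_wanted + 1;            % 31-dec-1998 maps to 1, 01-jan-1999 to 2, ...

% One output slot per day: duplicate days collapse to their minimum,
% and days that receive no value get the fill value NaN.
mintemp = accumarray(relday, YourData(:,4), [dayspan, 1], @min, NaN);

dv = datevec((first_wanted:last_wanted)');   % one row per wanted day
results = [dv(:,1:3), mintemp];
disp(results)
```

The three measurements for 02-jan share relday 3, so they reduce to a single value, 4.1. The days 31-dec, 04-jan, 06-jan and 07-jan have no rows in the sample and come out as NaN.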
  3 Comments
Thorsten on 2 Jul 2015
Edited: Thorsten on 2 Jul 2015
Try
whos in_range
It should be of class logical, with zeros at every position where dnum is outside the specified range.
In the next line,
useful_dnum = dnum(in_range);
logical indexing is used, i.e., only the values of dnum for which in_range is 1 are picked.
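A small sketch of that logical-indexing step, using four made-up rows (the ymd and temperature_data values are illustrative assumptions, not data from the original files):

```matlab
% Hypothetical dates: one before the period, two inside, one after
ymd = [1998 12 14
       1999  1  1
       1999  1  2
       1999  2 12];
temperature_data = [6; 5.2; 4.3; 8];

dnum = datenum(ymd(:,1), ymd(:,2), ymd(:,3));
first_wanted = datenum(1998, 12, 31);
last_wanted  = datenum(1999, 1, 7);

% Logical mask: true where the date lies inside the wanted period
in_range = first_wanted <= dnum & dnum <= last_wanted;

% Indexing with the mask keeps only the rows where it is true
useful_dnum        = dnum(in_range);
useful_temperature = temperature_data(in_range);

disp(class(in_range))
disp(useful_temperature)
```

The same mask is applied to both vectors, so dates and temperatures stay aligned row for row.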
lightworks on 2 Jul 2015
Edited: lightworks on 2 Jul 2015
It works perfectly!!
:D
Thank you so much!


More Answers (1)

Andrei Bobrov on 2 Jul 2015
Edited: Andrei Bobrov on 3 Jul 2015
d = [
1998 12 14 6
1999 01 01 5.2
1999 01 02 4.3
1999 01 02 5.0
1999 01 02 4.1
1999 01 03 3.8
1999 01 05 3.2
1999 02 12 8]
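% Build the full list of wanted dates, one row per day
[y,m,dy] = datevec((datenum([1998 12 31]):datenum([1999 1 7]))');
daout = [y,m,dy];
% Minimum temperature for each unique date present in the data
[a,~,c] = unique(d(:,1:3),'rows');
d2 = accumarray(c,d(:,end),[],@min);
% Match the data dates against the wanted dates; lo is false for dates outside the period
[lo,ii] = ismember(a,daout,'rows');
daout(:,end+1) = nan;          % default: no measurement
daout(ii(lo),end) = d2(lo);    % fill in only the dates that fall inside the period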
  2 Comments
lightworks on 2 Jul 2015
Edited: lightworks on 2 Jul 2015
Thanks a lot! Although the example works just fine, when I apply it to my data I get a "Subscripted assignment dimension mismatch" error.
It comes from the last line. The problem is that daout has as many rows as needed, but d2 does not have the same number of rows. When the data begin before the first day requested (31-dec-1998) or end after the last one, d2 ends up having more rows than daout.
I've been trying to make it work, but so far I couldn't. I'll keep trying though.
Andrei Bobrov on 3 Jul 2015
corrected
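The earlier version of the code is not shown in the thread, so the following is only a sketch of why restricting the assignment to matched dates avoids the mismatch described above. The data and the shortened four-day period are made up for illustration: d2 holds one minimum per unique date in the data, including dates outside the period, so only the entries flagged by lo can be copied into daout.

```matlab
% Hypothetical data: one row before the period, two inside, one after
d = [1998 12 14 6
     1999  1  2 4.3
     1999  1  2 4.1
     1999  2 12 8];

% Wanted period (assumed for this sketch): 31-dec-1998 to 03-jan-1999
[y, m, dy] = datevec((datenum(1998,12,31):datenum(1999,1,3))');
daout = [y, m, dy];

[a, ~, c] = unique(d(:,1:3), 'rows');
d2 = accumarray(c(:), d(:,end), [], @min);   % 3 unique dates, so 3 minima
[lo, ii] = ismember(a, daout, 'rows');       % lo is true only for 1999-01-02

daout(:, end+1) = NaN;
% ii is 0 where a date has no match, so both sides are masked with lo
daout(ii(lo), end) = d2(lo);
disp(daout)
```

Here d2 has three entries but only one of them belongs to the four-day period; the mask lo selects that entry and ii(lo) gives the row of daout it goes into.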
