필터 지우기
필터 지우기

Replacing missing data in timetable with values from another dataset

조회 수: 14 (최근 30일)
Hi all - I have a reasonably large time-series dataset (57777-by-20) of daily mean temperature for 1980 to present day, recorded from a weather station in Denmark. Within the time-series there are several large gaps in the coverage (> 1-2 months) due to maintenance and servicing of the weather station.
The gaps are too large to consider interpolating between missing data, so I would like to replace the missing values with data recorded from a temporary weather station, that was sited near the main station during periods when it was unable to operate. Having visually and statistically investigate the similarity between the two datasets, this seems a sound approach.
I've imported both datasets from .txt files, converted them to timetables, and located and replaced missing data with nan's.
However, I've been unable to successfully replace the nan's with like-for-like data from the secondary dataset (i.e., replacing with data from the exact same datetime). The datasets cover different temporal ranges, so indexing and replacing cells between the data has not worked (see code below).
I think I'm looking for a similar method but indexing both the rowtimes and the missing data from the temperature variable.
It would be great to hear from anyone with a solution to this problem, or with a similar method for replacing missing data within a time-series with values from a different dataset?
% index to find cells containing no-data
[idx] = find(data1.Temp == 999 | data1.Temp == -999);
% replace no-data vaules with nan
data1.Temp(idx) = nan;
% index to find nan's
idx = isnan(data1.Temp);
% DOES NOT WORK SIMPLY REPLACING NAN'S BY INDEXING THE TWO DATASETS,
% AS THEY COVER DIFFERENT TEMPORAL RNAGES
data1.Temp(idx) = data2.Temp(idx);

채택된 답변

Peter Perkins
Peter Perkins 2021년 11월 28일
IIUC, I think the simplest thing to do would be to remove the rows with missng data, vertically concatenate with the other data, then sort by time:
>> tt1 = timetable(rand(10,1),rand(10,1),'RowTimes',datetime(2021,11,1:10));
tt1.Var1(4:6) = NaN; tt1.Var2(4:6) = NaN
tt1 =
10×2 timetable
Time Var1 Var2
___________ _______ ________
01-Nov-2021 0.2638 0.51325
02-Nov-2021 0.14554 0.40181
03-Nov-2021 0.13607 0.075967
04-Nov-2021 NaN NaN
05-Nov-2021 NaN NaN
06-Nov-2021 NaN NaN
07-Nov-2021 0.14495 0.23995
08-Nov-2021 0.85303 0.41727
09-Nov-2021 0.62206 0.049654
10-Nov-2021 0.35095 0.90272
>> tt2 = timetable(rand(3,1),rand(3,1),'RowTimes',datetime(2021,11,4:6))
tt2 =
3×2 timetable
Time Var1 Var2
___________ _______ _______
04-Nov-2021 0.94479 0.33772
05-Nov-2021 0.49086 0.90005
06-Nov-2021 0.48925 0.36925
>> tt = sortrows([tt1(~isnan(tt1.Var1),:); tt2])
tt =
10×2 timetable
Time Var1 Var2
___________ _______ ________
01-Nov-2021 0.2638 0.51325
02-Nov-2021 0.14554 0.40181
03-Nov-2021 0.13607 0.075967
04-Nov-2021 0.94479 0.33772
05-Nov-2021 0.49086 0.90005
06-Nov-2021 0.48925 0.36925
07-Nov-2021 0.14495 0.23995
08-Nov-2021 0.85303 0.41727
09-Nov-2021 0.62206 0.049654
10-Nov-2021 0.35095 0.90272
But another way is to assign into one timetable from the other by time:
>> tt = tt1; tt(tt2.Time,:) = tt2
tt =
10×2 timetable
Time Var1 Var2
___________ _______ ________
01-Nov-2021 0.2638 0.51325
02-Nov-2021 0.14554 0.40181
03-Nov-2021 0.13607 0.075967
04-Nov-2021 0.94479 0.33772
05-Nov-2021 0.49086 0.90005
06-Nov-2021 0.48925 0.36925
07-Nov-2021 0.14495 0.23995
08-Nov-2021 0.85303 0.41727
09-Nov-2021 0.62206 0.049654
10-Nov-2021 0.35095 0.90272
  댓글 수: 2
Albert Johan Mamani Larico
Albert Johan Mamani Larico 2023년 8월 25일
This help me a lot!
How to keep values from original timetable(TT1) when I have also values for same datetime in the TT2?
For example, if TT2 has values for nov 7, I would like to keep the original values of TT1 for the same date. What would you recommend?
tt2 = timetable(rand(4,1),rand(4,1),'RowTimes',datetime(2021,11,4:7))
tt2 = 4×2 timetable
Time Var1 Var2 ___________ _______ _______ 04-Nov-2021 0.17953 0.72559 05-Nov-2021 0.96159 0.22381 06-Nov-2021 0.5495 0.59069 07-Nov-2021 0.74658 0.76516
Peter Perkins
Peter Perkins 2023년 10월 20일
I only just now saw this.
Not sure what the question is. The row times for a timetable do not need to be unique, there can be two or more 7-Nov's. You should look at retime if the question involves, "how do I fill in holes"?

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 Data Preprocessing에 대해 자세히 알아보기

제품


릴리스

R2021b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by