Treat and handle missing hourly data (with daily profile), that might have large gaps

I want to clean a large, messy temperature data set with many missing values (recorded as 999.9).
If only a few values are missing within a day, I would take the average of the values before and after the gap. But if there is a large cluster of missing values (an almost full day, or up to 100 values in a row), I would take the average of the 1 PM temperature from yesterday and the 1 PM temperature from tomorrow to get today's 1 PM value, and likewise for every other hour.
Note: I don't want to change the valid temperatures assigned to each hour (as interp1 would, by changing the order of the values).
What can I use to handle these data?
08/09/2016 4:00:00 26
08/09/2016 5:00:00 26
08/09/2016 6:00:00 25
08/09/2016 6:00:00 999.9
08/09/2016 7:00:00 24
08/09/2016 8:00:00 25
08/09/2016 9:00:00 24
08/09/2016 9:00:00 999.9
08/09/2016 10:00:00 23
  5 Comments
Anwaar Alghamdi on 24 Nov 2022
Also, if I do linear interpolation, the non-999 values will be messed up (at least their order). I don't want to touch the temperatures assigned to each hour, only estimate the 999 values.
Jiri Hajek on 24 Nov 2022
As for the cluster identification, I can give you some hints; I will put them below in an answer. As for the handling of large missing clusters, I would leave them out, i.e. constrain the scope.


Answers (1)

Jiri Hajek on 24 Nov 2022
To identify the clusters of outliers, one may use logical indexing and the time vector. This is just a skeletal draft of the algorithm, but you can get the idea.
% timeColumn: your datetime values (column vector)
% temperatureColumnRaw: your original temperatures (column vector)
% Assumes at least one value is flagged as missing.
outlierPoints = temperatureColumnRaw > 900; % 999.9 marks a missing value
outlierTimes = timeColumn(outlierPoints);
timeDifsOfOutliers = diff(outlierTimes);
% A new cluster starts wherever the step between outliers exceeds the usual sampling step
clusterStartsLogical = [true; timeDifsOfOutliers > mode(diff(timeColumn))];
clusterStartTimes = outlierTimes(clusterStartsLogical);
nClusters = numel(clusterStartTimes);
clusterStartIndices = find(clusterStartsLogical);
% Each cluster ends just before the next one starts; the last ends at the last outlier
clusterEndPoints = [clusterStartIndices(2:end)-1; numel(outlierTimes)];
clusterEndTimes = outlierTimes(clusterEndPoints);
clusterDurations = clusterEndTimes - clusterStartTimes;
shortClusterIndices = clusterDurations <= hours(3); % you define what is a short cluster
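
Once the clusters are known, the two filling rules from the question could be sketched roughly as follows. This is only an illustration under stated assumptions, not a tested solution for the real file: the data are synthetic, the record is assumed to be strictly hourly with no duplicate or skipped timestamps (so row k+24 is the same hour on the next day), and the names maxShortGap and samplesPerDay and the 3-sample threshold are made up for the example. Valid readings are never overwritten; only the flagged positions are assigned.

```matlab
% Synthetic stand-in data (NOT the poster's data): 4 days of hourly readings
n = 96;
hourOfDay = mod((0:n-1)', 24);
Ttrue = 20 + 5*sin(2*pi*(hourOfDay - 8)/24);   % repeating daily profile
Traw = Ttrue;
Traw(20)    = 999.9;     % short gap: a single missing hour
Traw(49:60) = 999.9;     % long gap: 12 missing hours in a row

maxShortGap   = 3;       % hypothetical threshold, in samples (hours)
samplesPerDay = 24;      % assumes a strictly hourly record

isMiss = Traw > 900;     % same sentinel test as in the answer above
Tfill = Traw;
Tfill(isMiss) = NaN;     % only flagged positions are ever changed

% Locate runs of consecutive missing samples
edges    = diff([0; isMiss; 0]);
runStart = find(edges == 1);
runEnd   = find(edges == -1) - 1;

idx = (1:n)';
for k = 1:numel(runStart)
    r = (runStart(k):runEnd(k))';
    if numel(r) <= maxShortGap
        % Short gap: interpolate linearly between the valid neighbours
        Tfill(r) = interp1(idx(~isMiss), Traw(~isMiss), r, 'linear');
    else
        % Long gap: mean of the same hour one day before and one day after
        ok = (r - samplesPerDay >= 1) & (r + samplesPerDay <= n);
        p  = r(ok);
        Tfill(p) = (Tfill(p - samplesPerDay) + Tfill(p + samplesPerDay)) / 2;
    end
end
```

Any long-gap hour whose neighbour 24 hours earlier or later is itself missing, or which falls in the first or last day of the record, stays NaN in this sketch, so those cases would still need a separate decision (for example, leaving them out as suggested in the comments).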
