Treat and handle missing hourly data (with daily profile), that might have large gaps

Question

Anwaar Alghamdi on 24 Nov 2022

https://kr.mathworks.com/matlabcentral/answers/1861273-treat-and-handle-missing-hourly-data-with-daily-profile-that-might-have-large-gaps

Edited: Anwaar Alghamdi on 24 Nov 2022

I want to clean a large, messy temperature data set with many missing values (coded as 999.9).

If only a few values are missing within a day, I would take the average of the values before and after the gap. But if there is a large cluster of missing values (almost a full day, or up to 100 values in a row), I would average the 1 PM temperature from yesterday and the 1 PM temperature from tomorrow to get today's 1 PM value, and likewise for every other hour.

Note: I don't want to change the valid temperatures assigned to each hour (as interp1 would do with the order of the values).

What can I use to handle these data?

08/09/2016 	4:00:00	 26
08/09/2016 	5:00:00	 26
08/09/2016 	6:00:00	 25
08/09/2016 	6:00:00	 999.9
08/09/2016 	7:00:00	 24
08/09/2016 	8:00:00	 25
08/09/2016 	9:00:00	 24
08/09/2016 	9:00:00	 999.9
08/09/2016 	10:00:00 23

Comments: 5 (3 older comments not shown)

Anwaar Alghamdi on 24 Nov 2022

@Jiri Hajek

Also, if I do linear interpolation, the non-999 values will be messed up (at least their order). I don't want to touch the temperatures assigned to each hour, only estimate the 999 values.
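The concern above can be checked directly. The following is a minimal sketch, assuming the 999.9 readings are first converted to NaN with standardizeMissing and then filled with fillmissing; the toy values and the variable names (t, temp, filled, wasFilled) are made up for illustration and are not from the original post. Under these assumptions only the flagged entries are replaced, and the valid readings stay attached to their hours.

```matlab
% Toy hourly series; 999.9 marks a missing reading (values are made up).
t    = datetime(2016,9,8,4,0,0) + hours(0:6)';
temp = [26; 26; 999.9; 24; 25; 999.9; 23];

% Convert the sentinel to NaN so MATLAB treats it as missing.
tempNaN = standardizeMissing(temp, 999.9);

% Linear interpolation in time; wasFilled flags the entries that were replaced.
[filled, wasFilled] = fillmissing(tempNaN, 'linear', 'SamplePoints', t);

% Valid readings are untouched; only the flagged entries changed.
assert(isequal(filled(~wasFilled), temp(~wasFilled)))
disp(table(t, temp, filled, wasFilled))
```

Because the time vector is passed as sample points, the interpolation works on the timestamps rather than on the row order.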

Jiri Hajek on 24 Nov 2022

As for the cluster identification, I can give you some hints; I will put them below in an answer. As for handling the large missing clusters, I would leave them out, i.e. constrain the scope.

Answer 1

Jiri Hajek on 24 Nov 2022

https://kr.mathworks.com/matlabcentral/answers/1861273-treat-and-handle-missing-hourly-data-with-daily-profile-that-might-have-large-gaps#answer_1110293

To identify the clusters of outliers, one may use logical indexing and the time vector. This is just a skeletal draft of the algorithm, but you can get the idea.

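% timeColumn           : your datetime values (column vector)
% temperatureColumnRaw : your original temperatures (column vector, same length)
% Assumes at least one outlier is present.
outlierPoints = temperatureColumnRaw > 900;   % 999.9 marks a missing reading
outlierTimes = timeColumn(outlierPoints);
timeDifsOfOutliers = diff(outlierTimes);
% A new cluster starts wherever the step between consecutive outliers
% exceeds the nominal sampling interval. The vector must be logical, not double.
clusterStartsLogical = [true; timeDifsOfOutliers > mode(diff(timeColumn))];
clusterStartTimes = outlierTimes(clusterStartsLogical);
nClusters = numel(clusterStartTimes);
clusterStartIndices = find(clusterStartsLogical);
% Each cluster ends just before the next one starts; the last ends at the final outlier.
clusterEndPoints = [clusterStartIndices(2:end)-1; numel(outlierTimes)];
clusterEndTimes = outlierTimes(clusterEndPoints);
clusterDurations = clusterEndTimes - clusterStartTimes;   % a single-point cluster has duration 0
shortClusterIndices = clusterDurations <= hours(3);   % you define what counts as a short cluster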
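Once short and long gaps can be told apart, the two-stage fill described in the question could look roughly like the sketch below. It is not part of the original answer. It assumes a regular hourly grid with no duplicate or missing rows (so row k-24 and row k+24 are the same hour on the neighboring days), a MATLAB release that supports the 'MaxGap' option of fillmissing (R2020b or later), and a synthetic series; all variable names and values are hypothetical.

```matlab
% Synthetic 3-day hourly series (made-up values); 999.9 marks a missing reading.
t     = (datetime(2016,9,8,0,0,0) + hours(0:71))';
truth = 25 + 5*sin(2*pi*(hour(t) - 9)/24);   % simple repeating daily profile
temp  = truth;
temp([2 3]) = 999.9;    % short gap: 2 consecutive hours
temp(30:45) = 999.9;    % long gap: 16 consecutive hours on day 2

x = standardizeMissing(temp, 999.9);         % 999.9 -> NaN

% Step 1: short gaps -> linear interpolation in time.
% MaxGap is measured between the valid points bracketing a gap, so
% hours(3) allows at most two consecutive missing hourly values.
x = fillmissing(x, 'linear', 'SamplePoints', t, 'MaxGap', hours(3));

% Step 2: remaining gaps -> mean of the same hour on the previous and next day.
src = x;                                     % snapshot, so fills do not feed each other
for k = find(isnan(src))'
    prev = k - 24;
    next = k + 24;
    if prev >= 1 && next <= numel(src)
        x(k) = mean([src(prev) src(next)], 'omitnan');
    end
end

stillMissing = isnan(x);                     % gaps that could not be filled
```

Gaps on the first or last day, or gaps whose neighbors on both adjacent days are also missing, stay NaN in this sketch; whether to drop them or widen the search to more distant days is a design choice left open here.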
