filling missed timestamps using prediction
조회 수: 1 (최근 30일)
이전 댓글 표시
Good day.
I have a file with timestamp records where timestamps were recorded every time my sensor detects a barrier and assigns to timestamp the moment the barrier passes through the detector and the moment it leaves, and writes both times (entering and leaving) to the logging file. However, i faced a problem with my computer shutting off randomly while my experiment was running while the barrier kept passing all the time through the detector and the computer was off. Therefore, I have many gaps in my data because every day I switch the computer on and continue logging the data.
Is there a way I can fill the gaps with a logic prediction using the data I have collected? I have attached the data file I have, I appreciate all the help in advance.
댓글 수: 3
dpb
2021년 7월 5일
Gotcha. My eyes glazed over and failed to note the two times after the date stamp...
채택된 답변
dpb
2021년 7월 5일
편집: dpb
2021년 7월 5일
tData=readtable('data.csv');
tData.Var1=datetime(tData.Var1,'InputFormat','eee MMM dd yyyy');
tData.Properties.VariableNames={'Date','Open','Close'};
tData.Span=seconds(tData.Close-tData.Open);
gives the preview of the data as
>> head(tData)
ans =
8×4 table
Date Open Close Span
___________ ________ ________ ____
22-Jun-2021 16:07:49 16:07:50 1
22-Jun-2021 16:07:54 16:07:57 3
22-Jun-2021 16:08:01 16:08:03 2
22-Jun-2021 16:08:06 16:08:09 3
22-Jun-2021 16:08:13 16:08:16 3
22-Jun-2021 16:09:01 16:09:02 1
22-Jun-2021 16:09:02 16:09:03 1
22-Jun-2021 16:09:04 16:09:04 0
>>
So, then let's look at the distribution of Span --
>> u=[unique(tData.Span) histc(tData.Span,unique(tData.Span))]
u =
-86399 5
-39549 1
-35965 1
-32346 1
-32339 1
0 40665
1 291387
2 353
3 3
14 1
68479 1
>>
From this my first conclusion is you need to be recording at least one more digit of resolution in the time stamps as you have quite a number of times that are recorded as identically zero with only one-second resolution.
Let's see what the oddball time frames look like:
>> tData(find(tData.Span>20)+[-2:2],:)
ans =
5×4 table
Date Open Close Span
___________ ________ ________ _____
27-Jun-2021 00:35:16 00:35:16 0
27-Jun-2021 00:35:17 00:35:18 1
27-Jun-2021 00:35:18 19:36:37 68479
27-Jun-2021 19:36:37 19:36:38 1
27-Jun-2021 19:36:38 19:36:39 1
>>
So, that is a time frame during one day with missing frames; the Close time of the timestamp before matches that of the Open when recording picked up so within the recording resolution that's permissible.
Next, what about the negative Spans --
tData(find(tData.Span==-86399)+[-2:2],:)
ans =
25×4 table
Date Open Close Span
___________ ________ ________ ______
22-Jun-2021 23:59:57 23:59:57 0
22-Jun-2021 23:59:58 23:59:59 1
22-Jun-2021 23:59:59 00:00:00 -86399
23-Jun-2021 00:00:00 00:00:01 1
23-Jun-2021 00:00:02 00:00:03 1
23-Jun-2021 23:59:56 23:59:57 1
23-Jun-2021 23:59:57 23:59:58 1
23-Jun-2021 23:59:59 00:00:00 -86399
24-Jun-2021 00:00:00 00:00:01 1
24-Jun-2021 00:00:01 00:00:02 1
28-Jun-2021 23:59:56 23:59:57 1
28-Jun-2021 23:59:58 23:59:58 0
28-Jun-2021 23:59:59 00:00:00 -86399
29-Jun-2021 00:00:00 00:00:01 1
29-Jun-2021 00:00:02 00:00:02 0
29-Jun-2021 23:59:56 23:59:57 1
29-Jun-2021 23:59:58 23:59:59 1
29-Jun-2021 23:59:59 00:00:00 -86399
30-Jun-2021 00:00:00 00:00:01 1
30-Jun-2021 00:00:02 00:00:03 1
02-Jul-2021 23:59:57 23:59:57 0
02-Jul-2021 23:59:58 23:59:59 1
02-Jul-2021 23:59:59 00:00:00 -86399
03-Jul-2021 00:00:01 00:00:01 0
03-Jul-2021 00:00:02 00:00:03 1
>>
The issue w/ these is that your data recording doesn't have the facility to include the date change when the day rolls over at midnight so the Close time of the next morning is less than that of the midnight hour the night before.
If this turns out to be a 59:59 to 00:00 one-second interval, then you get the exact match of -86399, one second less than the number of seconds in a day. That's a pretty easy one to find and fix.
(*) NB: the above code will output stuff in a non-sequentially order manner; I rearranged by the timestamp chronologically so don't be surprised if you run the exact code when result isn't the same initially. Output the individual locations instead of the vectorized form to get them in order directly.
>> tData(find(tData.Span==-39549)+[-2:2],:)
ans =
5×4 table
Date Open Close Span
___________ ________ ________ ______
25-Jun-2021 20:33:10 20:33:11 1
25-Jun-2021 20:33:12 20:33:13 1
25-Jun-2021 20:33:13 09:34:04 -39549
26-Jun-2021 09:34:04 09:34:05 1
26-Jun-2021 09:34:05 09:34:06 1
>>
Here and the places similar that are negative Span but smaller are those that also have missing values but are not just the one-second rollover at midnight as above.
>> tMiss=(tData.Date(ix)+1+tData.Close(ix)) - (tData.Date(ix)+tData.Open(ix))
tMiss =
duration
13:00:51
>>
is the missing interval.
To interpolate into these missing values, with the data you have to date you're pretty limited in choices. I'd probably just use the observed frequency of
>> u=u(iswithin(u(:,1),0,15),:) % keep only those within reasonable range
u =
0 40665
1 291387
2 353
3 3
14 1
>> p=u(:,2)/sum(u(:,2))*100 % empirical distribution of span
p =
12.2334
87.6592
0.1062
0.0009
0.0003
>>
You would probably want to either fit an empirical distribution or arbitrarily fill in some guesstimated values for those spans between the 3-second observation and whatever you think is the maximum reasonable value.
As noted, above, however, it looks to me as though you need more precision in the time stamp and the recording should include both the date and the time for both Open and Close events. You could cut down the size of the log by reducing the format of the date stamp to something less verbose.
You need to make sure the collection computer is on a UPS so power glitches won't shut it down and also make sure any screensavers, powersavers, etc., are inactive so it doesn't power itself down. Those steps should greatly reduce if not entirely eliminate the issues going forward.
댓글 수: 2
dpb
2021년 7월 5일
편집: dpb
2021년 7월 5일
ix=find(tData.Span==-39549);
tMiss=(tData.Date(ix)+1+tData.Close(ix)) - (tData.Date(ix)+tData.Open(ix))
Pick whatever index you want to look at.
iswithin
>> type iswithin
function flg=iswithin(x,lo,hi)
% returns T for values within range of input
% SYNTAX:
% [log] = iswithin(x,lo,hi)
% returns T for x between lo and hi values, inclusive
flg= (x>=lo) & (x<=hi);
>>
is my utility wrapper function to encapsulate the two logical tests into a simple function reference for simplicity in the user code. I've used it for so long I often forget it's not a builtin MATLAB function.
추가 답변 (0개)
참고 항목
카테고리
Help Center 및 File Exchange에서 Timetables에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!