Remove non-time string values from a time matrix

Ravi on 1 Jun 2019
Commented: Steven Lord on 6 Jun 2019
Hi,
I have a cell array of time strings (592x1 cell) that looks something like this. (The time strings are outputs from a parsed serial port communication link.)
time_mat = {'00:21:51.000',.........................'00:22:16.200','00:22:16.400','00:22:16.600','2019/05/30','00:22:17.000'....'22Rover6'.......,'2620517.2165',......................}
The entries that are not time strings (shown in bold in the original post), such as '2019/05/30', '22Rover6' and '2620517.2165', need to be removed and replaced with [].
I tried a string comparison check and a size-matching criterion to remove the unnecessary data, but it didn't work. Can anyone suggest a better approach? I have also attached the time_mat file for your perusal.
Thanks for your time and help.
Ravi

Accepted Answer

Adam Danz on 1 Jun 2019
Edited: Adam Danz on 1 Jun 2019
I would use datetime() to convert your cell array of strings to a datetime array. This will return NaT (not a time) for elements that are not in the specified format.
dtMat = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');
Comparison
table(dtMat(110:115), time_mat(110:115),'VariableNames',{'datetime', 'original'})
ans =
  6×2 table
      datetime         original
    ____________    ______________
    00:22:16.400    '00:22:16.400'
    00:22:16.600    '00:22:16.600'
    NaT             '2019/05/30'
    00:22:17.000    '00:22:17.000'
    00:22:17.200    '00:22:17.200'
    00:22:17.800    '00:22:17.800'
To fill in the missing values by linear interpolation, use fillmissing() (R2016b or later):
dtMat = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');
dtMatFill = fillmissing(dtMat,'linear');
% To see the missing data
natIdx = isnat(dtMat); %index of missing data
dtMatFill(natIdx)
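If the goal is to drop or blank the bad entries rather than interpolate them, the isnat index can be applied back to the original cell array. The following is a minimal sketch on a made-up six-element sample (not the attached file); the names timeClean and timeBlank are illustrative only.

```matlab
% Made-up sample for illustration (not the attached time_mat file)
time_mat = {'00:22:16.400'; '00:22:16.600'; '2019/05/30'; ...
            '00:22:17.000'; '22Rover6'; '00:22:17.200'};

% Entries that do not match the format come back as NaT
dtMat = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS', ...
                 'Format', 'HH:mm:ss.SSS');

natIdx = isnat(dtMat);          % true where parsing failed

% Option 1: drop the bad rows entirely
timeClean = time_mat(~natIdx);

% Option 2: keep the original length and blank the bad cells
timeBlank = time_mat;
timeBlank(natIdx) = {[]};
```

Option 1 shortens the array, so any data columns recorded alongside the time strings would need the same index applied to stay aligned.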
If you'd rather work with the cell array of strings, you can replace the bad elements with empties like this:
% '^' and '$' anchor the match to the whole string; '\.' matches a literal dot
badIdx = cellfun(@isempty,regexp(time_mat,'^\d{2}:\d{2}:\d{2}\.\d{3}$'));
time_mat(badIdx) = {[]};
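A quick way to see how strict a pattern is: run it against a few deliberately malformed strings. In the sketch below (made-up samples; loosePat and strictPat are illustrative names), an unanchored pattern with a bare '.' accepts strings that carry extra or wrong characters, while an anchored pattern with an escaped dot rejects them.

```matlab
% Made-up samples: one valid stamp and three malformed ones
samples = {'00:22:16.400', '00:22:16x400', '00:22:16.4005', '2019/05/30'};

loosePat  = '\d{2}:\d{2}:\d{2}.\d{3}';        % '.' matches any character
strictPat = '^\d{2}:\d{2}:\d{2}\.\d{3}$';     % whole string, literal dot

% With 'once', regexp returns [] for a cell that does not match
looseBad  = cellfun(@isempty, regexp(samples, loosePat,  'once'));
strictBad = cellfun(@isempty, regexp(samples, strictPat, 'once'));

% Show which samples each pattern would flag as bad
disp(table(samples', looseBad', strictBad', ...
    'VariableNames', {'sample', 'looseBad', 'strictBad'}))
```

Garbled serial data can produce near-misses like these, so the stricter form is probably the safer default here.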
6 Comments
Adam Danz on 1 Jun 2019
@Ravi, on second thought, if you know the start time (time_mat(1)) and the sampling interval (0.2 sec), you could just generate the vector of time samples instead of reading them in.
% Convert your strings to datetime format
dtMat = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');
% Fill in the NaT values
dtMatFill = fillmissing(dtMat,'linear');
% sample interval
sampInt = seconds(0.2);
% Total duration of series
totalDur = dtMatFill(end) - dtMatFill(1);
% Expected number of sample intervals given total time and sample interval
nIntervals = floor(totalDur/sampInt);
% produce time series, starting at the first sample (offset 0)
dtMatComplete = dtMatFill(1) + (0:nIntervals)'*sampInt;
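A related check, sketched here under the assumption of a fixed 0.2 s sample interval: the spacing between consecutive stamps shows where samples were dropped and how many. The stamps are made up, and nMissing and gapStart are illustrative names.

```matlab
% Made-up stamps with one dropped sample (00:00:00.600 is absent)
stamps = {'00:00:00.000'; '00:00:00.200'; '00:00:00.400'; ...
          '00:00:00.800'; '00:00:01.000'};
t = datetime(stamps, 'InputFormat', 'HH:mm:ss.SSS', 'Format', 'HH:mm:ss.SSS');

sampInt = seconds(0.2);          % assumed nominal sample interval

gaps = diff(t);                  % durations between neighbouring stamps

% Number of samples missing inside each gap (0 for a normal step)
nMissing = round(gaps / sampInt) - 1;

% Index of the last good sample before each gap
gapStart = find(nMissing > 0);
```

This only works on a vector that is already free of NaT values, since a NaT makes the neighbouring differences NaN.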
Ravi on 5 Jun 2019
As I am reading the date, time and position values (5 per second) in real time, the start time is somewhat arbitrary. Since the system is dynamic, I need to check for missing samples in the data flow and interpolate to fill the gaps.
I was out testing, so I didn't get a chance to try it further, but I am hoping the method you suggested also works on missing position data. After a quick check, it works fine for completing the time vector.
Thanks for your time and help.


More Answers (2)

dpb on 1 Jun 2019
Edited: dpb on 1 Jun 2019
Using the datetime class is probably easiest...see if
tm=datetime(time_mat,'InputFormat','HH:mm:ss.SSS'); % convert to datetime ('HH' = 24-hour clock); failures result in NaT
isnt=isnat(tm); % logical vector of those locations
>> time_mat(isnt) % the identified bum records...see if match expectations
ans =
11×1 cell array
{'2019/05/30' }
{'00:22Rover6' }
{'-2620517.2165'}
{'3.6' }
{'2019/05/30' }
{'0.1677' }
{'3954309.3750' }
{'2' }
{'2019/05/30' }
{'00Rover6' }
{'-4250201.7507'}
>> find(isnt) % the locations in the original vector
ans =
112
207
327
333
360
361
430
475
478
547
558
>>
ADDENDUM:
To fill in missing and otherwise clean up the transmission, something like the following:
tu=unique(tm); % there are some duplicated times
tt=timetable(tu,[1:numel(tu)].'); % build a time table from them
tt(isnat(tt.tu),:)=[]; % remove the NaT values to replace
ttnew=retime(tt,tt.tu(1):seconds(0.2):tt.tu(end),'linear'); % build a new table with interpolated values
There were two particular locations with same timestamp--
>> find(diff(t)==0)
ans =
45
139
>> t(40:50)
ans =
11×1 datetime array
...
12:22:00.2
12:22:00.4
12:22:00.4
12:22:00.8
...
What you do with those before you build the timetable I dunno--you could average them or select first/last ignoring the others as the above does...just depends on what's actually happening in your setup as to what you want to do, methinks...
After that, it's just make a new continuous time vector and interpolate -- the existing data will just be replaced with same, you can choose from alternate interpolating schemes as desired depending on the characteristics of the data you're collecting.
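The dedupe-then-interpolate idea can be sketched end to end on made-up data. Everything here is an assumption for illustration: the stamps, the pos column standing in for a position channel, and the choice to keep the first of any duplicated timestamps.

```matlab
% Made-up stamps: one duplicate (0.400 twice) and one missing (0.600)
stamps = {'00:00:00.000'; '00:00:00.200'; '00:00:00.400'; ...
          '00:00:00.400'; '00:00:00.800'; '00:00:01.000'};
pos    = [0; 2; 4; 4; 8; 10];    % hypothetical position samples

Time = datetime(stamps, 'InputFormat', 'HH:mm:ss.SSS', ...
                'Format', 'HH:mm:ss.SSS');

% unique sorts the times; firstIdx points at the first occurrence of each
[Time, firstIdx] = unique(Time);

% Row times take their name from the variable, so tt.Time works below
tt = timetable(Time, pos(firstIdx), 'VariableNames', {'pos'});

% Regular 0.2 s grid from first to last stamp, then linear interpolation
newTimes = (tt.Time(1):seconds(0.2):tt.Time(end))';
ttNew = retime(tt, newTimes, 'linear');
```

In ttNew the row at 00:00:00.600 is filled by interpolation between its neighbours, and rows that already existed keep their values. Averaging duplicates instead of keeping the first would need a separate step before building the timetable.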
ADDENDUM 2:
You can give the time vector a more meaningful name -- I was keeping separate variables for the original time and then the unique times, etc., so that if I made a slip I didn't have to go back more than one or two steps -- so the tu got morphed into the table as the time variable name. You can change it to something more meaningful with
ttnew.Properties.DimensionNames(1)={'Time'};
for example. If you do this before the retime, then that's the variable name to use therein instead, of course.
3 Comments
dpb on 1 Jun 2019
See amended answer...
Ravi on 5 Jun 2019
@dpb, thanks for your comments. I will explore the timeseries object. The issue is that I am reading in date, time and position data from multiple sensors through an RF radio using a single COM port, and even with flow control I see a lot of missing packets (which is usually the case with RF).
I will test your method and also follow Adam's input to see if I can at least read a continuous data stream on my end.
Thanks for your time and help.



Steven Lord on 5 Jun 2019
These don't strike me as being datetime values, they're duration values. The same technique others have suggested (try converting them and look for missing values) will work with duration as worked with datetime. One benefit of converting to duration is that there's no date information added. From the datetime help: "If INFMT does not include a date portion, datetime assumes the current day. If INFMT does not include a time portion, datetime assumes midnight."
time_mat = {'00:21:51.000','00:22:16.200','00:22:16.400','00:22:16.600',...
'2019/05/30','00:22:17.000','22Rover6','2620517.2165'}
dt = datetime(time_mat, 'InputFormat', 'HH:mm:ss.SSS')
du = duration(time_mat)
Elements 5, 7, and 8 of both dt and du are missing and so can be identified using ismissing or removed with rmmissing.
ismissing(dt)
ismissing(du)
You could use either a datetime or a duration as the RowTimes in a timetable.
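As a rough sketch of the duration route, assuming R2018a or later for text-to-duration conversion: the strings are converted, the missing entries are dropped, and the result is used as RowTimes. The sampleIndex variable is a hypothetical stand-in for whatever data accompanies each stamp.

```matlab
% Made-up sample mixing valid stamps with junk entries
time_mat = {'00:21:51.000'; '00:22:16.200'; '2019/05/30'; ...
            '22Rover6'; '00:22:16.400'};

du = duration(time_mat);         % unparseable text becomes missing (NaN)
du.Format = 'hh:mm:ss.SSS';      % show milliseconds when displayed

good = ~ismissing(du);           % true for entries that converted

% Hypothetical payload: the original position of each entry
sampleIndex = (1:numel(du))';

% Keep only the good rows and use the durations as row times
tt = timetable(sampleIndex(good), 'RowTimes', du(good), ...
               'VariableNames', {'sampleIndex'});
```

Indexing with good keeps the payload aligned with its time stamp; rmmissing(du) alone would return the cleaned durations but not the index needed to filter other variables.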
2 Comments
dpb on 5 Jun 2019
I have a hard time (so to speak! :) ) wrapping my head around a sampled timestamp being a duration, Steven. I grok it's the only way with the new classes one can have any time standing alone without an associated date, but it still just doesn't seem right nomenclature.
I've not gotten comfortable-enough as yet with the duration to be able to tell if there's something that doesn't agree with the use that way, but it never occurs to me naturally as yet to make use that way.
I really fail to see why a datetime can't have a void date portion other than it wasn't designed to allow for it...with the venerable datenum it was simple to just save only the fractional day.
Maybe eventually I'll come to grips with "the new normal", but as yet it's still a stretch... :)
Steven Lord on 6 Jun 2019
A sampled timestamp is the amount of time that has elapsed since a certain basetime, right? The basetime could be the start of an experiment, the time a piece of hardware was turned on, or the start of a new day (midnight.) So the timestamp represents the duration of the experiment so far, the duration of the current run of that hardware, or the duration that's elapsed today.
datetime can answer the question "when?" while duration can answer the question "how long?" Upon rereading the original post, I can see that the data could be the answer to either of those questions. It could be thought of as representing when events occurred, it could also be thought of as representing how long after midnight (or the time the serial port became active) the events occurred. Since the expression in the data representing a date was unwanted, I interpreted it as the latter.

