Computing average absolute time from date strings without years
이전 댓글 표시
Hi there, I have a very large table with a column of (12 hr) times saved as strings in a cell array, and a column of folder names also saved as strings in a cell array. So for example, say my table was called 'data', then if I type:
K>> data.folder
ans =
1150×1 cell array
{'160115/f1/f2/f3'}
I can access a string directly by typing:
K>> data.folder{1}
ans =
'160115/f1/f2/f3'
The first part of the folder string has the date in YYMMDD format. Another column in the table has the time in a string (12 hour clock):
K>> data.end_time{1}
ans =
'11:24:17 AM'
The actual table is enormous. I have some index for each row which fits into some supercategory (eg taken on some date), and I want to average the absolute time for each row in that category. How can I do this?
I have a brute force solution for the first part which works but I don't like how it's a loop. I would prefer to do it in a more compact way. Anyways, here it is:
exact_times = datetime(ramsey_data.end_time);
date = ramsey_data.folder;
for i=1:numel(exact_times) ymd = date{i};
ymd = ymd(1:6);
year = ['20' ymd(1:2)];
month = ymd(3:4);
day = ymd(5:6);
exact_times(i).Year = str2num(year);
exact_times(i).Month = str2num(month);
exact_times(i).Day = str2num(day);
end
I still need to find a way to average the resulting values in exact_times properly. I want to select only some of the rows in the table to average. If I could use accumarray I would use something like: average_time = accumarray(subs, exact_times, [], @mean); but I can't use that on datetime variables
댓글 수: 4
dpb
2018년 3월 21일
"I want to average the absolute time for each row in that category."
I'm not sure I understand what this means...can you show an example of some inputs and the desired output for the same input?
Also, it would help immensely if you would attach a (small but) representative part of the dataset that folks can play with.
It strikes me that probably the thing would be to convert the time to durations, but need more problem definition to be certain. Alternatively, parse the date from field 1 and combine with time from field 2 to get the absolute date/time for each entry.
Rearranging the input to not have combined data in the fields could make the job much simpler if you have any control at all on how this is created.
dpb
2018년 3월 21일
What's wrong with the posted solution? Just turn whatever is the selection criterion into a logical addressing vector. ismember or whatever works on datetime.
채택된 답변
추가 답변 (2개)
t=readtable('example.csv'); % read the data
t.folder=char(t.folder); % to char array
t.date=datetime(t.folder(:,1:6),'inputformat','yyMMdd'); % convert conglomeration to time
[~,~,~,h,m,s]=datevec(t.end_time); % get wanted h,m,s
[Y,M,D]=datevec(t.end_time); % and corresponding date
dt=datetime(Y,M,D,h,m,s) % build the composite date/time
dt =
10×1 datetime array
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:25:36
15-Jan-2016 11:25:36
>> mean(dt)
ans =
datetime
15-Jan-2016 11:24:32
>>
Could also parse the string data first and read the time as string and piece them together as text input to datetime. Will leave as "exercise for the student" :)
ADDENDUM
Per sidebar comments using your list of G:
>> splitapply(@mean,dt,G)
ans =
4×1 datetime array
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:25:09
>>
ADDENDUM 2
If the issue you're having is the nested cell array; there's another path besides that above avoiding the intermediary of the table --
>> c=textscan(fid,'%s%s','delimiter',',','headerlines',1);
>> times=char(c{1}); folder=char(c{2});
>> dt=datetime([folder(:,1:6) blanks(10).' times],'inputformat','yyMMdd hh:mm:ss a')
dt =
10×1 datetime array
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:24:17
15-Jan-2016 11:25:36
15-Jan-2016 11:25:36
>>
splitapply with the grouping vector as above from here.
Unfortunately, Matlab doesn't have the syntax to be able to use sub-indexing of doubly-nested cellstr arrays without the intermediary variable to dereference them to singly-nested as above; there's probably the place where you're having addressing issues. (Of course, that I inadvertently deleted the conversion step in initial posting didn't help...sorry about that :P)
You don't show how you're getting the data into the app; as above textscan has the penchant to put everything into that two-level structure that is convenient for variables returned from it being succinct but not necessarily so much when comes to using the results.
댓글 수: 6
Use grouping variables and splitapply...I thought the problem was the combination of year with time.
You didn't explain how to get which given rows you want to average; the sample vector above seems totally random given the list of times. If that vector is generated by some other logic, then just use it...see amplified Answer on that assumption.
supernoob
2018년 3월 21일
So, does not the amplified Answer then solve all the issues? BTW, you seem to misinterpret what t is; it's the table object created from the .csv file; folder is the auto-generated field name based on the file header content...
>> type example.csv % the file content used...
end_time,folder
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:24:17 AM,160115/baseline1/block1/ramsey1
11:25:36 AM,160115/baseline1/block1/ramsey1
11:25:36 AM,160115/baseline1/block1/ramsey1
>>
>> t=readtable('example.csv') % read as table...
t =
end_time folder
___________ _________________________________
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:24:17 AM '160115/baseline1/block1/ramsey1'
11:25:36 AM '160115/baseline1/block1/ramsey1'
11:25:36 AM '160115/baseline1/block1/ramsey1'
>>
Everything in the Answer proceeds directly from there; no magic needed; any release from R2013b on is adequate.
supernoob
2018년 3월 22일
Oh, I see when I pasted from workspace and cleaned up the superfluous outputs I inadvertently deleted one line..converting the cell array to char()...
t.folder=char(t.folder);
before the
t.date=...
step....
supernoob
2018년 3월 21일
댓글 수: 1
What happened to the time strings; this only looks at the dates?
Going back to your original where you show you have
K>> data.folder
ans =
1150×1 cell array
{'160115/f1/f2/f3'}
K>> data.end_time{1}
ans =
'11:24:17 AM'
then
folder=char(data.folder{:});
times=char(data.end_time{:});
would get you to the point at which I started after reading the csv file into the table I named t excepting w/ standalone variables as in the latter example instead of inside the table.
카테고리
도움말 센터 및 File Exchange에서 Calendar에 대해 자세히 알아보기
제품
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!



