Multiple selection of data (dates,mon​ths,days,h​ours)

조회 수: 17 (최근 30일)
Sofia Fourkioti
Sofia Fourkioti 2020년 5월 4일
댓글: Sofia Fourkioti 2020년 5월 11일
Hi! I have a csv file and I want to extract a range of data. The form of the data is the following(see comments). As I am a newbie in Matlab, I don't know how to use correctly the "for"loop. My aim is to select data by the year,month,day,hour and create a new table. The selected data must have the following form;
Example : [2004-01-01 00:00:00.0/2004-01-01 01:00:00.0
2005-01-01 00:00:00.0/2005-01-01 01:00:00.0
2006-01-01 00:00:00.0/2006-01-01 01:00:00.0 .....]
I tried to split the dates using the T=split(t,{'-','T','/'}); and I also selected only the start dates (only the part before the " / "sign in the example ["2004-01-01 00:00:00.0"]) as the end dates aren't necessary, but once again my problem is that I have a difficultly to extract the data.
Please Help !!!!
  댓글 수: 1
Sofia Fourkioti
Sofia Fourkioti 2020년 5월 4일
편집: Sofia Fourkioti 2020년 5월 4일
I forgot to mention that my data is from 2004-02-01 to 2020-01-31, including everyday hourly values . I want to group the values as specific year,specific month, specific day for the hour range of 0-23.
Thanks!

댓글을 달려면 로그인하십시오.

채택된 답변

Adam Danz
Adam Danz 2020년 5월 4일
편집: Adam Danz 2020년 5월 4일
Follow these steps to prepare in your data more efficiently and in a user-friendly format.
Use readtable to read in the data.
filePath = 'C:\Users\name\Documents\MATLAB'; % Path to your csv file
fileName = 'κοζάνη.csv';
file = fullfile(filePath, fileName);
opts = detectImportOptions(file);
T = readtable(file,opts);
Let's look at T using head() which shows us the first few rows
>> head(T)
ans =
8×11 table
x_ObservationPeriod TOA ClearSkyGHI ClearSkyBHI ClearSkyDHI ClearSkyBNI GHI BHI DHI BNI Reliability
_______________________________________________ ______ ___________ ___________ ___________ ___________ ______ ______ ______ ______ ___________
{'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'} 5.4901 1.8642 0.5659 1.2983 16.531 1.789 0.4964 1.2926 14.457 1
{'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'} 175.57 104.59 71.474 33.111 516.95 97.043 61.139 35.905 408.39 1
{'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'} 397.79 285.44 228.68 56.764 799.67 268.38 203.61 64.77 710.04 1
Notice that the x_ObservationPeriod show what appears to be a START / STOP period of observation but they are in character format. Yuk!
Split the x_ObservationPeriod into start / stop times and convert to datetime format.
This will take a few seconds because you've got 140256 rows of data.
% Split the start/stop times in character format
startStop = cellfun(@(c)strsplit(c, '/'), T.x_ObservationPeriod,'UniformOutput',false);
% Convert to table (2 columns)
startStopTbl = array2table(vertcat(startStop{:}));
% Convert values from char to datetime
startStopTblDt = varfun(@(v)datetime(v, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.S'), startStopTbl);
% Name the columns
startStopTblDt.Properties.VariableNames = {'ObservationStart', 'ObservationStop'};
Let's look at startStopTblDt
>> head(startStopTblDt)
ans =
8×2 table
ObservationStart ObservationStop
____________________ ____________________
01-Feb-2004 00:00:00 01-Feb-2004 01:00:00
01-Feb-2004 01:00:00 01-Feb-2004 02:00:00
01-Feb-2004 02:00:00 01-Feb-2004 03:00:00
01-Feb-2004 03:00:00 01-Feb-2004 04:00:00
01-Feb-2004 04:00:00 01-Feb-2004 05:00:00
01-Feb-2004 05:00:00 01-Feb-2004 06:00:00
01-Feb-2004 06:00:00 01-Feb-2004 07:00:00
01-Feb-2004 07:00:00 01-Feb-2004 08:00:00
Much better format. You can update the output format using this example,
startStopTblDt.ObservationStart.Format = 'MMM dd, yyyy HH:mm:ss';
Combine the two tables
T = [startStopTblDt, T];
View the results (first 10 rows, 6 columns)
>> T(1:10,1:6)
ans =
10×6 table
ObservationStart ObservationStop x_ObservationPeriod TOA ClearSkyGHI ClearSkyBHI
____________________ ____________________ _______________________________________________ ______ ___________ ___________
01-Feb-2004 00:00:00 01-Feb-2004 01:00:00 {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'} 0 0 0
01-Feb-2004 01:00:00 01-Feb-2004 02:00:00 {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'} 0 0 0
01-Feb-2004 02:00:00 01-Feb-2004 03:00:00 {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'} 0 0 0
01-Feb-2004 03:00:00 01-Feb-2004 04:00:00 {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'} 0 0 0
01-Feb-2004 04:00:00 01-Feb-2004 05:00:00 {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'} 0 0 0
01-Feb-2004 05:00:00 01-Feb-2004 06:00:00 {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'} 5.4901 1.8642 0.5659
01-Feb-2004 06:00:00 01-Feb-2004 07:00:00 {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'} 175.57 104.59 71.474
01-Feb-2004 07:00:00 01-Feb-2004 08:00:00 {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'} 397.79 285.44 228.68
01-Feb-2004 08:00:00 01-Feb-2004 09:00:00 {'2004-02-01T08:00:00.0/2004-02-01T09:00:00.0'} 574.58 440.45 372.5
01-Feb-2004 09:00:00 01-Feb-2004 10:00:00 {'2004-02-01T09:00:00.0/2004-02-01T10:00:00.0'} 693.92 546.61 472.8
If you want to remove the x_ObservationPeriod column,
T.x_ObservationPeriod = [];
Add row numbers to the table that correspond to row number in csv file
rowNums = (0:size(T,1)-1) + opts.DataLines(1);
T.Properties.RowNames = compose('%d',rowNums);
View results
>> T(1:10,1:6)
ans =
10×6 table
ObservationStart ObservationStop x_ObservationPeriod TOA ClearSkyGHI ClearSkyBHI
____________________ ____________________ _______________________________________________ ______ ___________ ___________
44 01-Feb-2004 00:00:00 01-Feb-2004 01:00:00 {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'} 0 0 0
45 01-Feb-2004 01:00:00 01-Feb-2004 02:00:00 {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'} 0 0 0
46 01-Feb-2004 02:00:00 01-Feb-2004 03:00:00 {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'} 0 0 0
47 01-Feb-2004 03:00:00 01-Feb-2004 04:00:00 {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'} 0 0 0
48 01-Feb-2004 04:00:00 01-Feb-2004 05:00:00 {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'} 0 0 0
49 01-Feb-2004 05:00:00 01-Feb-2004 06:00:00 {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'} 5.4901 1.8642 0.5659
50 01-Feb-2004 06:00:00 01-Feb-2004 07:00:00 {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'} 175.57 104.59 71.474
51 01-Feb-2004 07:00:00 01-Feb-2004 08:00:00 {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'} 397.79 285.44 228.68
52 01-Feb-2004 08:00:00 01-Feb-2004 09:00:00 {'2004-02-01T08:00:00.0/2004-02-01T09:00:00.0'} 574.58 440.45 372.5
53 01-Feb-2004 09:00:00 01-Feb-2004 10:00:00 {'2004-02-01T09:00:00.0/2004-02-01T10:00:00.0'} 693.92 546.61 472.8
Now you can see that the first row of data comes from line 44 of the csv file.
  댓글 수: 12
Adam Danz
Adam Danz 2020년 5월 6일
편집: Adam Danz 2020년 5월 6일
"Do you think that it could be a better solution"
No, because I don't think groupsummary is better. Retime is more flexible and it's designed to work with timetables. With groupsummary, you would need to create a grouping variable for every type of average (yearly, monthly, etc). Retime does that for you.
You can follow examples for groupsummary in the documentation.
Example
>> T.month = month(T.ObservationStart);
>> groupsummary(T(:,{'TOA','month'}), 'month', 'mean')
% Result (first 3 rows)
% ans =
% 12×3 table
% month GroupCount mean_TOA
% _____ __________ ________
% 1 11904 175.52
% 2 10848 237.48
% 3 11904 321.15
But this requires you to add an additional column ('month') to an already large table and the data in that column is redundant.
Sofia Fourkioti
Sofia Fourkioti 2020년 5월 11일
Thank you for your help!!

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

dpb
dpb 2020년 5월 4일
Attach text that folks can do something with instead of pictures...or just attach the .csv file itself.
Use split to break the two times apart going into datetime
>> datetime(split(TSTR,'/').','InputFormat','uuuu-MM-dd''T''HH:mm:ss.S')
ans =
1×2 datetime array
01-Jan-2004 00:00:00 01-Jan-2004 01:00:00
>>
where I just copied your first line above into TSTR, split it into the two sections and returned that as the 2-column cell array to pass to datetime.
Once you have datetimes, put those into a table or timetable and use the selection on those datetime values.
  댓글 수: 1
Sofia Fourkioti
Sofia Fourkioti 2020년 5월 4일
편집: Sofia Fourkioti 2020년 5월 4일
Thank you!I have attached my .csv in the comments.
I have already split my data, but this is not my problem. Is there a way to locate this data and find their exact position in the .csv?

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Timetables에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by