Multiple selection of data (dates,months,days,hours)

Question

Sofia Fourkioti 2020년 5월 4일

0
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/522864-multiple-selection-of-data-dates-months-days-hours

댓글: Sofia Fourkioti 2020년 5월 11일

Hi! I have a csv file and I want to extract a range of data. The form of the data is the following(see comments). As I am a newbie in Matlab, I don't know how to use correctly the "for"loop. My aim is to select data by the year,month,day,hour and create a new table. The selected data must have the following form;

Example : [2004-01-01 00:00:00.0/2004-01-01 01:00:00.0

2005-01-01 00:00:00.0/2005-01-01 01:00:00.0

2006-01-01 00:00:00.0/2006-01-01 01:00:00.0 .....]

I tried to split the dates using the T=split(t,{'-','T','/'}); and I also selected only the start dates (only the part before the " / "sign in the example ["2004-01-01 00:00:00.0"]) as the end dates aren't necessary, but once again my problem is that I have a difficultly to extract the data.

Please Help !!!!

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

Sofia Fourkioti 2020년 5월 4일

편집: Sofia Fourkioti 2020년 5월 4일

I forgot to mention that my data is from 2004-02-01 to 2020-01-31, including everyday hourly values . I want to group the values as specific year,specific month, specific day for the hour range of 0-23.

Thanks!

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

Adam Danz 2020년 5월 4일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/522864-multiple-selection-of-data-dates-months-days-hours#answer_430183

편집: Adam Danz 2020년 5월 4일

Follow these steps to prepare in your data more efficiently and in a user-friendly format.

Use readtable to read in the data.

filePath = 'C:\Users\name\Documents\MATLAB';  % Path to your csv file
fileName = 'κοζάνη.csv'; 
file = fullfile(filePath, fileName);
opts = detectImportOptions(file);
T = readtable(file,opts);

Let's look at T using head() which shows us the first few rows

>> head(T)
ans =
  8×11 table
                  x_ObservationPeriod                   TOA      ClearSkyGHI    ClearSkyBHI    ClearSkyDHI    ClearSkyBNI     GHI       BHI       DHI       BNI      Reliability
    _______________________________________________    ______    ___________    ___________    ___________    ___________    ______    ______    ______    ______    ___________
    {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'}         0           0              0              0              0            0         0         0         0         1     
    {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'}         0           0              0              0              0            0         0         0         0         1     
    {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'}         0           0              0              0              0            0         0         0         0         1     
    {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'}         0           0              0              0              0            0         0         0         0         1     
    {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'}         0           0              0              0              0            0         0         0         0         1     
    {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'}    5.4901      1.8642         0.5659         1.2983         16.531        1.789    0.4964    1.2926    14.457         1     
    {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'}    175.57      104.59         71.474         33.111         516.95       97.043    61.139    35.905    408.39         1     
    {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'}    397.79      285.44         228.68         56.764         799.67       268.38    203.61     64.77    710.04         1 

Notice that the x_ObservationPeriod show what appears to be a START / STOP period of observation but they are in character format. Yuk!

Split the x_ObservationPeriod into start / stop times and convert to datetime format.

This will take a few seconds because you've got 140256 rows of data.

% Split the start/stop times in character format
startStop = cellfun(@(c)strsplit(c, '/'), T.x_ObservationPeriod,'UniformOutput',false);
% Convert to table (2 columns)
startStopTbl = array2table(vertcat(startStop{:})); 
% Convert values from char to datetime
startStopTblDt = varfun(@(v)datetime(v, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.S'), startStopTbl); 
% Name the columns
startStopTblDt.Properties.VariableNames = {'ObservationStart', 'ObservationStop'}; 

Let's look at startStopTblDt

>> head(startStopTblDt)
ans =
  8×2 table
      ObservationStart        ObservationStop   
    ____________________    ____________________
    01-Feb-2004 00:00:00    01-Feb-2004 01:00:00
    01-Feb-2004 01:00:00    01-Feb-2004 02:00:00
    01-Feb-2004 02:00:00    01-Feb-2004 03:00:00
    01-Feb-2004 03:00:00    01-Feb-2004 04:00:00
    01-Feb-2004 04:00:00    01-Feb-2004 05:00:00
    01-Feb-2004 05:00:00    01-Feb-2004 06:00:00
    01-Feb-2004 06:00:00    01-Feb-2004 07:00:00
    01-Feb-2004 07:00:00    01-Feb-2004 08:00:00

Much better format. You can update the output format using this example,

startStopTblDt.ObservationStart.Format = 'MMM dd, yyyy HH:mm:ss';

Combine the two tables

T = [startStopTblDt, T];

View the results (first 10 rows, 6 columns)

>> T(1:10,1:6)
ans =
  10×6 table
      ObservationStart        ObservationStop                     x_ObservationPeriod                   TOA      ClearSkyGHI    ClearSkyBHI
    ____________________    ____________________    _______________________________________________    ______    ___________    ___________
    01-Feb-2004 00:00:00    01-Feb-2004 01:00:00    {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'}         0           0              0   
    01-Feb-2004 01:00:00    01-Feb-2004 02:00:00    {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'}         0           0              0   
    01-Feb-2004 02:00:00    01-Feb-2004 03:00:00    {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'}         0           0              0   
    01-Feb-2004 03:00:00    01-Feb-2004 04:00:00    {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'}         0           0              0   
    01-Feb-2004 04:00:00    01-Feb-2004 05:00:00    {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'}         0           0              0   
    01-Feb-2004 05:00:00    01-Feb-2004 06:00:00    {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'}    5.4901      1.8642         0.5659   
    01-Feb-2004 06:00:00    01-Feb-2004 07:00:00    {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'}    175.57      104.59         71.474   
    01-Feb-2004 07:00:00    01-Feb-2004 08:00:00    {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'}    397.79      285.44         228.68   
    01-Feb-2004 08:00:00    01-Feb-2004 09:00:00    {'2004-02-01T08:00:00.0/2004-02-01T09:00:00.0'}    574.58      440.45          372.5   
    01-Feb-2004 09:00:00    01-Feb-2004 10:00:00    {'2004-02-01T09:00:00.0/2004-02-01T10:00:00.0'}    693.92      546.61          472.8   

If you want to remove the x_ObservationPeriod column,

T.x_ObservationPeriod = [];

Add row numbers to the table that correspond to row number in csv file

rowNums = (0:size(T,1)-1) + opts.DataLines(1);
T.Properties.RowNames = compose('%d',rowNums);

View results

>> T(1:10,1:6)
ans =
  10×6 table
            ObservationStart        ObservationStop                     x_ObservationPeriod                   TOA      ClearSkyGHI    ClearSkyBHI
          ____________________    ____________________    _______________________________________________    ______    ___________    ___________
    44    01-Feb-2004 00:00:00    01-Feb-2004 01:00:00    {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'}         0           0              0   
    45    01-Feb-2004 01:00:00    01-Feb-2004 02:00:00    {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'}         0           0              0   
    46    01-Feb-2004 02:00:00    01-Feb-2004 03:00:00    {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'}         0           0              0   
    47    01-Feb-2004 03:00:00    01-Feb-2004 04:00:00    {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'}         0           0              0   
    48    01-Feb-2004 04:00:00    01-Feb-2004 05:00:00    {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'}         0           0              0   
    49    01-Feb-2004 05:00:00    01-Feb-2004 06:00:00    {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'}    5.4901      1.8642         0.5659   
    50    01-Feb-2004 06:00:00    01-Feb-2004 07:00:00    {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'}    175.57      104.59         71.474   
    51    01-Feb-2004 07:00:00    01-Feb-2004 08:00:00    {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'}    397.79      285.44         228.68   
    52    01-Feb-2004 08:00:00    01-Feb-2004 09:00:00    {'2004-02-01T08:00:00.0/2004-02-01T09:00:00.0'}    574.58      440.45          372.5   
    53    01-Feb-2004 09:00:00    01-Feb-2004 10:00:00    {'2004-02-01T09:00:00.0/2004-02-01T10:00:00.0'}    693.92      546.61          472.8   

Now you can see that the first row of data comes from line 44 of the csv file.

댓글 수: 12
이전 댓글 10개 표시이전 댓글 10개 숨기기

Adam Danz 2020년 5월 5일

편집: Adam Danz 2020년 5월 5일

It sounds like retime()is what you're looking for. This will compute means over years, months, days, whatever. But first you need to convert your table to a timetable which is easy.

% Create a timetable 
% Choose which row-time variable to use ObservationStart or ObservationStop
% and inclue any numerical variables you wish to average. 
TT = table2timetable([T(:,'ObservationStart'), T(:,{'TOA','ClearSkyGHI'})]);

If your table T had row names, remove them from the timetable,

TT.Row = [];

Compute monthly, yearly, etc, means,

mu = retime(TT, 'monthly', 'mean');
mu = retime(TT, 'yearly', 'mean');

See all options here: https://www.mathworks.com/help/matlab/ref/timetable.retime.html#bvdhiwr-newTimeStep

Results for monthly means (first few rows)

head(mu)
ans =
  8×2 timetable
    ObservationStart     TOA      ClearSkyGHI
    ________________    ______    ___________
    01-Feb-2004         237.34      173.27   
    01-Mar-2004         322.04      235.79   
    01-Apr-2004         402.99       294.1   
    01-May-2004         459.16      341.42   
    01-Jun-2004         481.67      355.41   
    01-Jul-2004         467.79      346.46   
    01-Aug-2004          419.7      309.82   
    01-Sep-2004         346.55      250.19   

Adam Danz 2020년 5월 6일

편집: Adam Danz 2020년 5월 6일

"Do you think that it could be a better solution"

No, because I don't think groupsummary is better. Retime is more flexible and it's designed to work with timetables. With groupsummary, you would need to create a grouping variable for every type of average (yearly, monthly, etc). Retime does that for you.

You can follow examples for groupsummary in the documentation.

https://www.mathworks.com/help/matlab/ref/double.groupsummary.html

Example

>> T.month = month(T.ObservationStart);
>> groupsummary(T(:,{'TOA','month'}), 'month', 'mean')
% Result (first 3 rows)
% ans =
%   12×3 table
%     month    GroupCount    mean_TOA
%     _____    __________    ________
%       1        11904        175.52 
%       2        10848        237.48 
%       3        11904        321.15 

But this requires you to add an additional column ('month') to an already large table and the data in that column is redundant.

Sofia Fourkioti 2020년 5월 11일

Thank you for your help!!

댓글을 달려면 로그인하십시오.

Answer 2

dpb 2020년 5월 4일

0
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/522864-multiple-selection-of-data-dates-months-days-hours#answer_430153

Attach text that folks can do something with instead of pictures...or just attach the .csv file itself.

Use split to break the two times apart going into datetime

>> datetime(split(TSTR,'/').','InputFormat','uuuu-MM-dd''T''HH:mm:ss.S')
ans = 
  1×2 datetime array
   01-Jan-2004 00:00:00   01-Jan-2004 01:00:00
>> 

where I just copied your first line above into TSTR, split it into the two sections and returned that as the 2-column cell array to pass to datetime.

Once you have datetimes, put those into a table or timetable and use the selection on those datetime values.

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

Sofia Fourkioti 2020년 5월 4일

편집: Sofia Fourkioti 2020년 5월 4일

Thank you!I have attached my .csv in the comments.

I have already split my data, but this is not my problem. Is there a way to locate this data and find their exact position in the .csv?

댓글을 달려면 로그인하십시오.

Multiple selection of data (dates,months,days,hours)

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

채택된 답변

댓글 수: 12
이전 댓글 10개 표시이전 댓글 10개 숨기기

추가 답변 (1개)

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

Multiple selection of data (dates,mon​ths,days,h​ours)

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

채택된 답변

댓글 수: 12 이전 댓글 10개 표시이전 댓글 10개 숨기기

추가 답변 (1개)

댓글 수: 1 이전 댓글 -1개 표시이전 댓글 -1개 숨기기

참고 항목

카테고리

태그

Community Treasure Hunt

Multiple selection of data (dates,months,days,hours)

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기

댓글 수: 12
이전 댓글 10개 표시이전 댓글 10개 숨기기

댓글 수: 1
이전 댓글 -1개 표시이전 댓글 -1개 숨기기