60 years of daily data; need to group separately by month and year
조회 수: 10 (최근 30일)
이전 댓글 표시
I am completely new to MATLAB and have about 60 years of daily data (22246 rows) and four columns of interest.
I've imported the data into MATLAB as a table from a .csv.
I want to start by grouping this into months so that I can then analyse averages, minimums, maximums, general trends, plot graphs etc. but I don't want to add up all monthly data into a single value for each month for each column. Instead, I want to see a single value for each month of every year (so condensing this into about 720 rows), not 12 rows where all January data is aggregated and then all February etc.
I.e.
J F M A M J J A S O N D
2000
2001
2002
...
I know that this cannot go into the original table and that a new one will need to be created.
[Later on, I want to do this by year but once I know how to do it by month, I should be able to amend the code accordingly.]
Another problem is that the data starts mid-year and ends mid-year (i.e. start date is not 1st Jan and end date is not 31st Dec), although the data does run consecutively without a break once it starts.
The original dates are in the form DD-MMM-YYYY.
I already know how to do the analysis (e.g. averages) on the data but my problem is getting to a starting point in terms of the values that I want to analyse.
I've tried searching forums already and can't find the answer or at least one that my limited knowledge of MATLAB can understand. Please help as I'm tearing my hair out!
Thank you!
댓글 수: 3
Stephen23
2023년 2월 26일
편집: Stephen23
2023년 2월 26일
Do NOT solve this with loops, lots of UNIQUE calls, padding arrays with ZEROS and the like.
Learn to use MATLAB instead of fighting it. The most important thing is to get the data design right, which does take some practice. But in many cases, a good place to start is to import data as a table:
You can solve this task in just a few table commands: the more you browse the documentation, the more you will find and know what is available to help you solve your problems:
채택된 답변
Star Strider
2023년 2월 26일
One approach for the mean values —
T1 = readtable('https://www.mathworks.com/matlabcentral/answers/uploaded_files/1307475/Flows.csv')
TT1 = table2timetable(T1)
TT1m = retime(TT1, 'monthly', 'mean') % Aggregate On 'mean' Values
VN = T1.Properties.VariableNames;
T1maDate1 = datetime('01-Jan-1958') + calmonths(0:8).'; % Pad First Nine Months
T1ma1 = zeros(size(T1maDate1,1),4);
T1ma1 = [table(T1maDate1) array2table(T1ma1)];
T1ma1.Properties.VariableNames = VN;
T1maDate2 = TT1m.Date(end) + calmonths(1:4).'; % PAd Last Four Months
T1ma2 = zeros(size(T1maDate2,1),4);
T1ma2 = [table(T1maDate2) array2table(T1ma2)];
T1ma2.Properties.VariableNames = VN;
TT1m = [table2timetable(T1ma1); TT1m; table2timetable(T1ma2)] % Pad Array To Fill Out Months
Monthsc = unique(month(TT1m.Date, 'shortname'),'stable');
Yearsc = unique(year(TT1m.Date),'stable');
for k = 1:size(TT1m,2)
T1var{k,1} = array2table(reshape(TT1m{:,k}, 12, []).', 'RowNames',string(Yearsc), 'VariableNames',Monthsc);
T1var{k,2} = VN{k+1};
end
T1var{1,:}
Repeat this for the other statistics.
.
댓글 수: 2
Star Strider
2023년 2월 26일
As always, my pleasure!
I’ve had some experience with this sort of problem recently with another thread, so I know that there do not appear to be existing functions that can do this sort of operation. (It would be nice if there were!) There might be other ways of preallocating the table to avoid padding both ends of it, however that was not immediately obvious either. The matrix approach seems to work best here.
.
추가 답변 (2개)
Image Analyst
2023년 2월 26일
First see if you can do it yourself using one of these functions: splitapply, grpstats, groupsummary
If you still can't figure it out, let us know if you have the stats toolbox, and someone will do it for you.
Stephen23
2023년 2월 26일
편집: Stephen23
2023년 2월 27일
The simple MATLAB approach using GROUPSUMMARY and UNSTACK:
M = categorical(datetime(1,1,1,"Format","MMM"):calmonths(1):datetime(1,12,31));
T = readtable('Flows.csv');
T.Year = T.Date.Year;
T.Month = M(T.Date.Month).';
S = groupsummary(T,["Year","Month"],"mean","TotalDailyFlowRate_m3s");
U = unstack(S,"mean_TotalDailyFlowRate_m3s","Month", "GroupingVariables","Year")
댓글 수: 4
Stephen23
2023년 2월 27일
편집: Stephen23
2023년 2월 27일
@Star Strider: thank you! For some reason I didn't think of a non-scalar input to CALMONTHS.
M = categorical(datetime(1,1,1,"Format","MMM")+calmonths(0:11))
sort(M) % sorts into category order (not alphabetic order)
참고 항목
카테고리
Help Center 및 File Exchange에서 Logical에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!