Split a target date interval into seasons and find the percentage of days for each season

I have the following seasons:
  • Spring: {01-March, 31-May}
  • Summer: {01-June, 31-August}
  • Autumn: {01-Sep, 30-Nov}
  • Winter: {01-Dec, 28-Feb}
and I also have a reference period as data, e.g. 01-Jan until 15-Sep (254 days).
What I want is to compute the percentage of days in the reference period that belong to each season.
For example, in the present case, we have:
  • Winter: 01-Jan until 28-Feb --> 59 days / 254 days = 23.5%
  • Spring: 01-Mar until 31-May --> 90 days / 254 days = 35%
  • Summer: 01-Jun until 31-Aug --> 90 days / 254 days = 35%
  • Autumn: 01-Sep until 15-Sep --> 15 days /254 days = 6.5%
So the requested values would be {23.5, 35, 35, 6.5}.

Accepted Answer

Scott MacKenzie on 11 Jul 2021
There might be a way to shorten this, but I think it achieves what you are after. You didn't mention the year, so I set this up as a variable (change, as necessary). This script accommodates leap years. BTW, the number of reference days below is 259. Not sure why this differs from your count of 254.
yr = 2020; % change, as necessary
% reference days of interest
r1= datetime(yr,1,1):datetime(yr,2,eomday(yr,2));
r2= datetime(yr,3,1):datetime(yr,5,31);
r3= datetime(yr,6,1):datetime(yr,8,31);
r4= datetime(yr,9,1):datetime(yr,9,15);
n = length([r1 r2 r3 r4]) % number of reference days
n = 259
percentDays = [length(r1), length(r2), length(r3), length(r4)] /n * 100
percentDays = 1×4
23.1660 35.5212 35.5212 5.7915
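The count of 259 follows from 2020 being a leap year; the same reference period in a non-leap year has 258 days, and neither matches 254. A minimal sketch to check this (the variable names here are illustrative, not part of the script above):

```matlab
% Count the days from 1 Jan to 15 Sep (inclusive) in a leap and a non-leap year.
yrs = [2020 2021];
nDays = zeros(size(yrs));
for k = 1:numel(yrs)
    % The colon operator on datetimes steps by one day by default.
    nDays(k) = numel(datetime(yrs(k),1,1):datetime(yrs(k),9,15));
end
disp(nDays)   % 259 for 2020 (leap year), 258 for 2021
```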
  4 Comments
Scott MacKenzie on 11 Jul 2021
@DIMITRIS GEORGIADIS You're welcome. Glad to help. Concerning your first question, you can't really get rid of the yr variable because of leap years. If you don't care about Feb 29, then sure. Just substitute any year (e.g., 1) for yr in the script.
In terms of generalizing this, sure, that can be done. You could write a function that receives as input the month and day of the beginning of each reference period. You'd also need the month and day of the end of the last reference period. The function would return the percentages, as in my example script. To shorten the code and allow for any number of reference periods (in case you sometimes need >4), you could use structures or arrays to hold the month+day inputs and the percentages output.
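One way this generalization might look, as a rough sketch rather than a finished implementation: the period starts are held in an N-by-2 array of [month day] rows, and the end of the last period in a 1-by-2 array. The names startMD and endMD are assumptions made for illustration; the body could be wrapped in a function that takes yr, startMD and endMD as inputs.

```matlab
% Sketch: percentage of reference days per period, for any number of periods.
% The variable names (startMD, endMD) are illustrative, not from the thread.
yr      = 2020;                    % the year matters only because of Feb 29
startMD = [1 1; 3 1; 6 1; 9 1];    % [month day] at the start of each period
endMD   = [9 15];                  % [month day] of the last reference day

starts  = datetime(yr, startMD(:,1), startMD(:,2));   % N-by-1 datetime
lastDay = datetime(yr, endMD(1), endMD(2));

% Each period runs up to the day before the next start; appending the day
% after lastDay turns every period into a half-open interval [start, next).
edges = [starts; lastDay + caldays(1)];
nDays = days(diff(edges));                  % days in each period (N-by-1)
percentDays = 100 * nDays.' / sum(nDays);   % 1-by-N row of percentages
disp(percentDays)
```

With these inputs the period lengths are 60, 92, 92 and 15 days, so the result agrees with the percentages printed by the example script. Adding or removing rows in startMD changes the number of periods without touching the rest of the code.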


More Answers (2)

Seth Furman on 14 Jul 2021
To add to Scott's answer, this kind of grouped calculation on timestamped data lends itself well to timetable and groupsummary.
isWinter = @(dt) dt.Month == 12 | dt.Month <= 2;
isSpring = @(dt) 3 <= dt.Month & dt.Month <= 5;
isSummer = @(dt) 6 <= dt.Month & dt.Month <= 8;
isAutumn = @(dt) 9 <= dt.Month & dt.Month <= 11;
referencePeriod = timetable('RowTimes',datetime(2020,1,1):caldays(1):datetime(2020,9,15));
referencePeriod.Season(isWinter(referencePeriod.Time)) = categorical("Winter");
referencePeriod.Season(isSpring(referencePeriod.Time)) = categorical("Spring");
referencePeriod.Season(isSummer(referencePeriod.Time)) = categorical("Summer");
referencePeriod.Season(isAutumn(referencePeriod.Time)) = categorical("Autumn");
counts = groupsummary(referencePeriod,"Season")
counts = 4×2 table
    Season    GroupCount
    ______    __________

    Winter        60
    Spring        92
    Summer        92
    Autumn        15
counts.Percentages = counts.GroupCount ./ sum(counts.GroupCount)
counts = 4×3 table
    Season    GroupCount    Percentages
    ______    __________    ___________

    Winter        60          0.23166
    Spring        92          0.35521
    Summer        92          0.35521
    Autumn        15         0.057915
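Because the season is derived from the month alone, the same idea carries over to a reference period that spans more than one year. The following is a hedged sketch of that case; the date range and the variable names (t, idx, Season) are chosen for illustration only.

```matlab
% Sketch: season shares for a reference period that crosses year boundaries.
t = (datetime(2019,12,1):caldays(1):datetime(2021,2,28)).';   % daily timestamps
m = month(t);                                                 % month number 1..12

% Map each month to a season index: 1 = Winter, 2 = Spring, 3 = Summer, 4 = Autumn.
idx = zeros(size(m));
idx(m == 12 | m <= 2) = 1;
idx(m >= 3 & m <= 5)  = 2;
idx(m >= 6 & m <= 8)  = 3;
idx(m >= 9 & m <= 11) = 4;

% Turn the indices into a categorical with a fixed category order.
Season = categorical(idx, 1:4, ["Winter" "Spring" "Summer" "Autumn"]);

counts = groupsummary(table(Season), "Season");
counts.Percent = 100 * counts.GroupCount / sum(counts.GroupCount);
disp(counts)
```

Here winter collects December 2019, January and February 2020, December 2020, and January and February 2021 into a single group, with no special handling of the year boundary.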
  2 Comments
DIMITRIS GEORGIADIS on 16 Jul 2021
Dear @Seth Furman thank you very much for your very interesting and elegant answer!
Peter Perkins on 11 Apr 2022
The general idea in Seth's suggestion is a good one, and his solution works for multiple years of data. If the data are always within one year, discretize would make this even simpler. Obviously winter crosses a year boundary, but it's not hard to use discretize in a way that handles that:
dt = datetime(2022,1,1):datetime(2022,9,15);
edges = datetime(2022,[1,3,6,9,12,13],[1,1,1,1,1,1]); % winter at both ends
season = discretize(dt,edges,"categorical",["Winter" "Spring" "Summer" "Autumn" "Winter"])
season = 1×258 categorical array
Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Winter Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Spring Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Summer Autumn Autumn Autumn Autumn Autumn Autumn Autumn Autumn Autumn Autumn Autumn Autumn Autumn Autumn Autumn
summary(season)
     Winter      Spring      Summer      Autumn
     59          92          92          15
Using groupsummary gives you a nice output table; creating season would work as a lead-up to that too.
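As a sketch of that lead-up, the season labels from discretize can be placed in a table and passed to groupsummary. The table variable name Season and the added Percent column are assumptions made for this illustration; the discretize call is the one shown above.

```matlab
% Sketch: label the days with discretize, then tabulate with groupsummary.
dt = (datetime(2022,1,1):datetime(2022,9,15)).';             % column of days
edges = datetime(2022,[1,3,6,9,12,13],[1,1,1,1,1,1]);        % winter at both ends
season = discretize(dt,edges,"categorical", ...
    ["Winter" "Spring" "Summer" "Autumn" "Winter"]);

% One table variable holding the season label of each day.
T = table(season,'VariableNames',{'Season'});

counts = groupsummary(T,"Season");                           % Season, GroupCount
counts.Percent = 100 * counts.GroupCount / sum(counts.GroupCount);
disp(counts)
```

The GroupCount column should match the summary output above (59, 92, 92 and 15 days for 2022).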



Peter Perkins on 27 Jul 2021
Another possibility:
>> yr = 2020;
>> edges = datetime(yr,[1 3 6 9 12,12],[1 1 1 1 1 32])
edges =
1×6 datetime array
01-Jan-2020 01-Mar-2020 01-Jun-2020 01-Sep-2020 01-Dec-2020 01-Jan-2021
>> t = datetime(yr,1,1:259)';
>> tf = isbetween(t,edges(1:end-1),edges(2:end),'openright'); % uses implicit expansion
>> tf(:,1) = tf(:,1) + tf(:,end);
>> tf(:,end) = [];
>> 100*sum(tf,1)/length(t)
ans =
23.166 35.521 35.521 5.7915
