Translated by

이 페이지의 내용은 영어에서 자동 번역되었습니다. 자동 번역 기능을 해제하려면 "<a class="turn_off_mt" href="#">여기</a>"를 클릭하십시오.

Create a table with Revenue by financial year such as 2017-18

Asked by Philip Chindamo

Philip Chindamo (view profile)

on 3 Apr 2019
Latest activity Commented on by dpb

dpb (view profile)

on 12 Apr 2019
Accepted Answer by dpb

dpb (view profile)

From the excel data attached I’m trying to create a table with Revenue by financial year such as 2017-18 (where financial year is the sum of the quarters September 2017 + December 2017 + March 2018 + June 2018).
The table below shows what I need to get to from the raw data I attach. I tried writing the code attached but this sums the Revenue for all periods to give me a total of 8.8 for the Code category ‘abc’.
Do you know how I can do this table as below?
Code 2017-18 2018-19
abc 4 4.8
def 6.2 6.4
ghi 8 12

on 4 Apr 2019
Edited by dpb

dpb (view profile)

on 5 Apr 2019

t=readtable('chindamo.xlsx'); % read the input table data
t.Code=categorical(t.Code); % I just like categorical for such...
t.AcctYear=fix((t.Date-t.Date(1))/duration(24*365,0,0)); % compute index variable of accounting year
g=findgroups(t.AcctYear,t.Code); % get the grouping variable by year, code
fnRev=@(q,p) sum(q.*p); % define revenue summation functional
revenue=splitapply(fnRev,t.Quantity,t.Price,g); % and calculate for each group
revenue=reshape(revenue,numel(categories(t.Code)),[]); % orient by code for each year
returns the array of values
>> revenue=reshape(revenue,numel(categories(t.Code)),[])
revenue =
4.00 4.80
6.20 6.40
8.00 12.00
>>
from which you can build whatever structure you wish...
NB: the "trick" here is to compute the accounting year indicator variable as the integer portion of a duration of days. This relies on the data being ordered such that the first date of the desired year start month is the first element in the date vector as it simply is checking for the multiple of 365 days. It would take some fixup logic to deal with leap years.
The retime function with a timetable could be made to work by generating a specific new time vector; it's unfortunate that its builtin periods are based solely on a Jan1 - Dec31 definition of year start/end; it seems a logical extension to be able to define also the DOY that a fiscal year starts/ends; there certainly are sufficient uses for such. It's possible the Financial Toolbox might have such things; I don't have it and haven't searched...
The general solution for the duration vector is:
t.AcctYear=fix((t.Date-t.Date(1))./duration(24*365+isleapyr(tt),0,0);
where
function is=isleapyr(yr)
% returns T for input year being a leapyear
if isdatetime(yr), yr=year(yr); end
is=eomday(yr,2)==29;

dpb

dpb (view profile)

on 5 Apr 2019
You can submit thru the support link at top of page -- that doesn't imply TMW will write the code for you now, though, that simply gets the idea into their database of requests.
If you want a solution now, that would require hiring consultant services...I didn't see that you attached another data file for folks here to look at...
dpb

dpb (view profile)

on 5 Apr 2019
t.AcctYear=fix((t.Date-t.Date(1))/duration(length(t.Date)*365,0,0));
Works by happenstance as written; it just happens that have the right numerical value for the duration interval--for the non-leap-year case it should just be 8760 hours; just happens that there were 24 elements in the array.
Posted the general solution as addendum...it's actually not as difficult to build as seemed after had a little sleep overnight... :)
dpb

dpb (view profile)

on 5 Apr 2019
Oh! Another way to approach it that is a little less exotic in having to think about how to combine months across calendar years would be with a timetable object and using timerange to pick out the time subsets by range of months by year. These could then be used to build the AccountingYear corollary variable.
Would be a little more actual code and would want to write a little function to do it, but it would let it be written using higher-level abstraction of the dates as recognizable dates rather than the admittedly somewhat arcane conversion by elapsed time above.
Of course, if the above were wrapped in a function, its interface could be more user-friendly as well and the messy stuff, done once, hidden from view.

Peter Perkins (view profile)

Answer by Peter Perkins

Peter Perkins (view profile)

on 9 Apr 2019

I'm a little late to the party, but here's a version using rowfun and unstack. It doesn't require "balanced data", but maybe dpb's final version doesn't either I guess.
>> t = readtable('test.xlsx');
>> t.Code = categorical(t.Code); % probably want to enumerate the codes as cat names
>> edges = datetime(2017,9,1):calyears(1):datetime(2019,9,1); % or over whatever range you want
>> t.FinYear = discretize(t.Date,edges,'categorical',["y2017_2018" "y2018_2019"]);
>> revenue = rowfun(@(q,p) sum(q.*p), t, ...
'InputVariables',["Quantity" "Price"],'GroupingVariables',["FinYear" "Code"])
revenue =
6×4 table
FinYear Code GroupCount Var4
__________ ____ __________ ____
y2017_2018 abc 4 4
y2017_2018 def 4 6.2
y2017_2018 ghi 4 8
y2018_2019 abc 4 4.8
y2018_2019 def 4 6.4
y2018_2019 ghi 4 12
>> revenue = unstack(revenue,"Var4","FinYear",'GroupingVariables',"Code")
revenue =
3×3 table
Code y2017_2018 y2018_2019
____ __________ __________
abc 4 4.8
def 6.2 6.4
ghi 8 12

dpb

dpb (view profile)

on 9 Apr 2019
Nicely done, Peter, as per usual!
I really appreicate your jumping in on these datetime -related topics; we've battled over many a one, haven't we? :)
I'm still coming to grips with the duration -- that's a great use...and it's a little simpler to build the grouping indicator variable--it didn't strke me to build it outside rather than compute from the data--I'll have to remember that alternate route going forward.
As I commented though, it still seems to me that an option in retime for arbitrary start times would be nice; or somewhere there should be examples in the doc that are more like this instead of just syntax with easy inputs.
I don't recall having come across unstack before, though, either...
Steven Lord

Steven Lord (view profile)

on 9 Apr 2019
There's no way to redefine what the NEWTIMESTEP inputs like 'yearly' mean, but you can specify a NEWTIMES vector.
startTime = datetime('1 July 2015', 'InputFormat', 'd MMMM yyyy');
twentyFinancialYears = startTime + calyears(0:19).'
The "Interpolate Timetable Data to Time Vector" example on the retime documentation page shows specifying a specific vector of times, though the spacing there is 30 minutes rather than a year.
I used calyears instead of years in building twentyFinancialYears because otherwise the extra roughly quarter of a day and leap days would prevent the dates in twentyFinancialYears from landing exactly on midnight of July 1st in later years. Compare:
[startTime + years(0:19).', startTime + calyears(0:19).']
dpb

dpb (view profile)

on 9 Apr 2019
"no way to redefine what the NEWTIMESTEP inputs like 'yearly' mean, but you can specify a NEWTIMES vector..."
Yeah, that was one of my earlier comments on alternate approach.
OTOMH, doesn't seem like there would be any reason that one couldn't add that feature, though, of keeping the named divisions but simply provide them a reference start date.
Granted, it would be "syntactic sugar", but that's what the end purpose of Matlab is all about -- rapid development, not figuring out how to workaround things (even though we like ML mostly because it is fun to try to figure those things out, but we're not the majority of the users).

Philip Chindamo (view profile)

Answer by Philip Chindamo

Philip Chindamo (view profile)

on 10 Apr 2019

Hi everyone - thanks for the contributions and I'll take a look if it works for my data. However, more generally, why has it taken your time as 'experts' to help me on a most simple thing as starting a financial year on 1 July? Surely matlab should/could have this as a function? It's taken this long to wrangle the date field and I haven't even got to analysis yet. Is this the state of Matlab and data science in this day and age?

dpb

dpb (view profile)

on 10 Apr 2019
" why has it taken your time as 'experts' to help me on a most simple thing as starting a financial year on 1 July?"
Because nobody at TMW thought about adding it to the builtin retime function (nor as another option in the Financial TB, apparently).
Not that it shouldn't have it, but ML toolboxes in particular are still evolving. Also, ML is still fundamentally a programming language and so there's still the mindset that users have the basic tool and can make it do what they need/want and part of that is coding for special cases.
Again, that's not to say such enhancements shouldn't become available and oftentimes some things that seem fairly simple may have been overlooked...
Not being an econometrics kind of guy, are there alternate tools that have the functionality readily available?
Peter Perkins

Peter Perkins (view profile)

on 10 Apr 2019
To me, the financial calendar aspect of this, while perhaps not immediately obvious, was actually the simpler part of what needed doing. If you had wanted grouped calculations by calendar year, you'd have used t.Date.Year as a grouping variable -- no problem. So to do it with a Sep-Aug financial calendar, it was just a matter of instead binning by Sep-Aug. That sort of calendar calculation would be good in an example. Thanks for pointing that out.
But I hear what you are saying, that financial calendars ought to be more "out of the box".
To me, the more difficult part was the reorientation of the result from the year/code grouped means. unstack is not widely known, but it's similar in spirit to a pivoting operation, and people may be more familiar with that terminology, and we are trying to make stack/unstack more accessible in the doc.
dpb

dpb (view profile)

on 12 Apr 2019
Well, even as much "time in grade" as I have (altho I don't use ML full time nor am I actually consulting any longer, it's been 30 yr or so), the newer features of the time handling are still somewhat of a learning excercise to apply in most efficient manner--for the new user coming in from a non-programming or other background I can understand it being very daunting to come up with the "how" of binning just staring at the doc's for the first time.
While the documentation is good in most cases, a lot of the examples are just trivial to the point of not being particularly instructional; cases that do require some other manipulation that provide insight to how to extend the use would be valuable.
I didn't go to see when unstack was introduced, but I've read the splitapply and findgroups doc and all those examples many times as well as followed links form the table and retime and friends and I don't recall ever having across it before...maybe it's in that generic list of functions with table that I also can't ever find when I'm looking for it directly...