How do I code categorical variables as NARX neural network inputs?
I am working on predicting electricity demand (load), and I have several categorical variables as inputs to a NARX network in the Neural Network Time Series app: months (12 categories, spelled out January through December), days of the week (seven categories, 1 to 7), and hours of the day (1 through 24). When I load my Excel data table and try to assign these variables as "Inputs", MATLAB cannot read or display my categorical variable "Months" because its values are spelled out (January through December). Should I write a simple line of code such as the one below, or is there a different way to flag these variables as categorical for NARX neural networks? I would prefer not to convert the months to 1 to 12, because MATLAB would then assume a scale (month 12 is higher than month 6, etc.). Thank you in advance!
T.HE = categorical(T.HE);
T.MONTH = categorical(T.MONTH);
T.WEEKDAY = categorical(T.WEEKDAY);
awezmm, 3 Jan 2020
What error are you getting when you say "MATLAB is not able to read and display my categorical variable"?
SK, 3 Jan 2020
Walter Roberson, 3 Jan 2020
You will not be able to proceed with the MathWorks tools as they are; you would need to write your own. The MathWorks tools can only work with data that is entirely (orderable) numeric, entirely categorical, or entirely a cell array of character vectors.
Even if you were to switch to all-categorical data, you would have problems: when you concatenate categorical arrays, the individual ranges lose their identity, and a new categorical array is created that combines all of the categories and renumbers the elements. The neural network would have no way of knowing that, for example, the second column could not hold both Tuesday and March.
However, as I touched on in my Answer, I think you are making a mistake in trying to make the entries unordered. When you make them unordered, you are saying that the second day of February has more predictive power for the load on the second day of August than the first day of August has for the second day of August.
SK, 3 Jan 2020
Walter Roberson, 3 Jan 2020 (edited 3 Jan 2020)
"You are recommending that I recode ALL my categorical variables as zeros and ones (binary) by creating 12 additional columns for months, 24 columns for hours, and 7 columns for days of the week."
If you do that, the result is of type double, and it is valid to combine it with numeric data such as temperature and humidity in the same array. It is valid to use
[isJanuary, isFebruary, isMarch, isApril, isMay, isJune, isJuly, isAugust, isSeptember, isOctober, isNovember, isDecember, isMonday, isTuesday, isWednesday, isThursday, isFriday, isSaturday, isSunday, is0000, is0100, is0200, is0300, is0400, is0500, is0600, is0700, is0800, is0900, is1000, is1100, is1200, is1300, is1400, is1500, is1600, is1700, is1800, is1900, is2000, is2100, is2200, is2300, TemperatureC, RelativeHumidity, SolarIntensity]
whereas if you tried to use
[monthCategory, WeekdayCategory, HourCategory, TemperatureC, RelativeHumidity, SolarIntensity]
then that would fail, because you cannot combine categorical and double-precision data in the same array.
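The indicator-column layout described above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not code from the thread: the sample values, the variable names `monthCol`, `weekdayCol`, `hourCol`, and the helper list `monthNames` are hypothetical, and hours are coded 1 to 24 as in the question (the column list above uses is0000 to is2300 instead). It relies on implicit expansion (MATLAB R2016b or later) to compare a column of codes against `1:k`.

```matlab
% Sketch: one-hot (indicator) coding of month, weekday and hour.
% Hypothetical sample data for four time steps; replace with your own columns.
monthNames = {'January'; 'February'; 'March'; 'April'; 'May'; 'June'; ...
              'July'; 'August'; 'September'; 'October'; 'November'; 'December'};
monthCol     = {'January'; 'March'; 'July'; 'December'};  % spelled-out months
weekdayCol   = [1; 3; 7; 2];            % day of week, coded 1-7
hourCol      = [1; 12; 24; 6];          % hour of day, coded 1-24
TemperatureC = [5.2; 9.8; 21.0; -1.5];  % example numeric input

% Map each month name to its position 1-12 (ismember returns 0 if not found).
[~, monthIdx] = ismember(monthCol, monthNames);
assert(all(monthIdx > 0), 'Unrecognized month name');

% Comparing an n-by-1 column with the row 1:k expands to an n-by-k logical
% matrix with exactly one true entry per row (that row's category).
isMonth   = double(monthIdx == 1:12);   % n-by-12
isWeekday = double(weekdayCol == 1:7);  % n-by-7
isHour    = double(hourCol == 1:24);    % n-by-24

% Everything is now double, so it can share one array with numeric inputs.
X = [isMonth, isWeekday, isHour, TemperatureC];  % n-by-44
disp(size(X))
```

Each row has exactly three 1s among the first 43 columns (one month, one weekday, one hour), followed by the ordinary numeric inputs.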
However, I believe you will get further if you code the inputs as
[monthNumber, WeekdayNumber, HourNumber, TemperatureC, RelativeHumidity, SolarIntensity]
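The ordered numeric coding suggested above can be built directly from the spelled-out month names. The following is a minimal sketch, assuming hypothetical column names and sample values (not from the thread), and assuming that the NARX tools accept a time series as a 1-by-N cell array with one column vector per time step. As far as I know, `tonndata` in Deep Learning Toolbox produces the same layout; `num2cell` is used here so the sketch needs only base MATLAB.

```matlab
% Sketch: ordered numeric coding, then a 1-by-N cell array of time steps.
monthNames = {'January'; 'February'; 'March'; 'April'; 'May'; 'June'; ...
              'July'; 'August'; 'September'; 'October'; 'November'; 'December'};
% Hypothetical raw columns, as they might come from a spreadsheet.
monthCol         = {'January'; 'march'; 'March '; 'December'};
WeekdayNumber    = [1; 3; 7; 2];          % 1-7
HourNumber       = [1; 12; 24; 6];        % 1-24
TemperatureC     = [5.2; 9.8; 11.0; -1.5];
RelativeHumidity = [80; 65; 55; 90];
SolarIntensity   = [0; 310; 0; 0];

% Case-insensitive, whitespace-tolerant lookup: month name -> 1..12.
[~, monthNumber] = ismember(lower(strtrim(monthCol)), lower(monthNames));
assert(all(monthNumber > 0), 'Unrecognized month name');

% All-numeric input matrix: one row per time step.
X = [monthNumber, WeekdayNumber, HourNumber, ...
     TemperatureC, RelativeHumidity, SolarIntensity];

% One cell per time step, each holding that step's inputs as a column vector.
Xseq = num2cell(X', 1);
```

Here month 12 is numerically "higher" than month 6, which is exactly the ordering Walter argues is useful: neighbouring months and hours get neighbouring codes.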
SK, 3 Jan 2020
Walter Roberson, 4 Jan 2020
Yes, that makes sense. Version 2 corresponds to using unordered categories, and Version 1 corresponds to using ordered categories.
SK, 4 Jan 2020
Category: Deep Learning Toolbox
