describe
Description
describe(Transformer) prints the description of the features generated by Transformer. Create the FeatureTransformer object Transformer by using the gencfeatures or genrfeatures function.
describe(Transformer,Index) prints the description of the features identified by Index.
Examples
Generate features from a table of predictor data by using gencfeatures. Inspect the generated features by using the describe object function.
Read power outage data into the workspace as a table. Remove observations with missing values, and display the first few rows of the table.
outages = readtable("outages.csv");
Tbl = rmmissing(outages);
head(Tbl)
Region OutageTime Loss Customers RestorationTime Cause
_____________ ________________ ______ __________ ________________ ___________________
{'SouthWest'} 2002-02-01 12:18 458.98 1.8202e+06 2002-02-07 16:50 {'winter storm' }
{'SouthEast'} 2003-02-07 21:15 289.4 1.4294e+05 2003-02-17 08:14 {'winter storm' }
{'West' } 2004-04-06 05:44 434.81 3.4037e+05 2004-04-06 06:10 {'equipment fault'}
{'MidWest' } 2002-03-16 06:18 186.44 2.1275e+05 2002-03-18 23:23 {'severe storm' }
{'West' } 2003-06-18 02:49 0 0 2003-06-18 10:54 {'attack' }
{'NorthEast'} 2003-07-16 16:23 239.93 49434 2003-07-17 01:12 {'fire' }
{'MidWest' } 2004-09-27 11:09 286.72 66104 2004-09-27 16:37 {'equipment fault'}
{'SouthEast'} 2004-09-05 17:48 73.387 36073 2004-09-05 20:46 {'equipment fault'}
Some of the variables, such as OutageTime and RestorationTime, have data types that are not supported by classifier training functions like fitcensemble.
Generate 25 features from the predictors in Tbl that can be used to train a bagged ensemble. Specify the Region table variable as the response.
Transformer = gencfeatures(Tbl,"Region",25,TargetLearner="bag")
Transformer =
FeatureTransformer with properties:
Type: 'classification'
TargetLearner: 'bag'
NumEngineeredFeatures: 22
NumOriginalFeatures: 3
TotalNumFeatures: 25
The Transformer object contains the information about the generated features and the transformations used to create them.
To better understand the generated features, use the describe object function.
Info = describe(Transformer)
Info=25×4 table
Type IsOriginal InputVariables Transformations
___________ __________ ___________________________ _________________________________________________________________________________________________________________
Loss Numeric true Loss ""
Customers Numeric true Customers ""
c(Cause) Categorical true Cause "Variable of type categorical converted from a cell data type"
RestorationTime-OutageTime Numeric false OutageTime, RestorationTime "Elapsed time in seconds between OutageTime and RestorationTime"
sdn(OutageTime) Numeric false OutageTime "Serial date number from 01-Feb-2002 12:18:00"
woe3(c(Cause)) Numeric false Cause "Variable of type categorical converted from a cell data type -> Weight of Evidence (positive class = SouthEast)"
doy(OutageTime) Numeric false OutageTime "Day of the year"
year(OutageTime) Numeric false OutageTime "Year"
kmd1 Numeric false Loss, Customers "Euclidean distance to centroid 1 (kmeans clustering with k = 10)"
kmd5 Numeric false Loss, Customers "Euclidean distance to centroid 5 (kmeans clustering with k = 10)"
quarter(OutageTime) Numeric false OutageTime "Quarter of the year"
woe2(c(Cause)) Numeric false Cause "Variable of type categorical converted from a cell data type -> Weight of Evidence (positive class = NorthEast)"
year(RestorationTime) Numeric false RestorationTime "Year"
month(OutageTime) Numeric false OutageTime "Month of the year"
Loss.*Customers Numeric false Loss, Customers "Loss .* Customers"
tods(OutageTime) Numeric false OutageTime "Time of the day in seconds"
⋮
The Info table indicates the following:
- The first three generated features are original to Tbl, although the software converts the original Cause variable to a categorical variable c(Cause).
- The OutageTime and RestorationTime variables are not included as generated features because they are datetime variables, which cannot be used to train a bagged ensemble model. However, the software derives many of the generated features from these variables, such as the fourth feature RestorationTime-OutageTime.
- Some generated features are a combination of multiple transformations. For example, the software generates the sixth feature woe3(c(Cause)) by converting the Cause variable to a categorical variable and then calculating the Weight of Evidence values for the resulting variable.
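Because describe returns a table, ordinary table indexing can narrow the description to the features of interest. The following is a minimal sketch rather than part of the original example. It assumes that IsOriginal is a logical column (as the true/false display suggests) and that InputVariables can be converted to text with string. The variable names Engineered and FromOutageTime are illustrative only.

```matlab
% Recreate the transformer from the example above.
outages = readtable("outages.csv");
Tbl = rmmissing(outages);
Transformer = gencfeatures(Tbl,"Region",25,TargetLearner="bag");

Info = describe(Transformer);

% Keep only the engineered (non-original) features.
% Assumption: IsOriginal is a logical column.
Engineered = Info(~Info.IsOriginal,:);

% Keep only the features derived, at least in part, from OutageTime.
% Assumption: InputVariables converts cleanly to a string array.
UsesOutageTime = contains(string(Info.InputVariables),"OutageTime");
FromOutageTime = Info(UsesOutageTime,:);
```

If these assumptions hold, the height of Engineered should match the NumEngineeredFeatures property of Transformer.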
Generate features from a table of predictor data by using genrfeatures. Inspect the generated features by using the describe object function.
Read power outage data into the workspace as a table. Remove observations with missing values, and display the first few rows of the table.
outages = readtable("outages.csv");
Tbl = rmmissing(outages);
head(Tbl)
Region OutageTime Loss Customers RestorationTime Cause
_____________ ________________ ______ __________ ________________ ___________________
{'SouthWest'} 2002-02-01 12:18 458.98 1.8202e+06 2002-02-07 16:50 {'winter storm' }
{'SouthEast'} 2003-02-07 21:15 289.4 1.4294e+05 2003-02-17 08:14 {'winter storm' }
{'West' } 2004-04-06 05:44 434.81 3.4037e+05 2004-04-06 06:10 {'equipment fault'}
{'MidWest' } 2002-03-16 06:18 186.44 2.1275e+05 2002-03-18 23:23 {'severe storm' }
{'West' } 2003-06-18 02:49 0 0 2003-06-18 10:54 {'attack' }
{'NorthEast'} 2003-07-16 16:23 239.93 49434 2003-07-17 01:12 {'fire' }
{'MidWest' } 2004-09-27 11:09 286.72 66104 2004-09-27 16:37 {'equipment fault'}
{'SouthEast'} 2004-09-05 17:48 73.387 36073 2004-09-05 20:46 {'equipment fault'}
Some of the variables, such as OutageTime and RestorationTime, have data types that are not supported by regression model training functions like fitrensemble.
Generate 25 features from the predictors in Tbl that can be used to train a bagged ensemble. Specify the Loss table variable as the response.
rng("default") % For reproducibility
Transformer = genrfeatures(Tbl,"Loss",25,TargetLearner="bag")
Transformer =
FeatureTransformer with properties:
Type: 'regression'
TargetLearner: 'bag'
NumEngineeredFeatures: 22
NumOriginalFeatures: 3
TotalNumFeatures: 25
The Transformer object contains the information about the generated features and the transformations used to create them.
To better understand the generated features, use the describe object function.
Info = describe(Transformer)
Info=25×4 table
Type IsOriginal InputVariables Transformations
___________ __________ ___________________________ ___________________________________________________________________
c(Region) Categorical true Region "Variable of type categorical converted from a cell data type"
Customers Numeric true Customers ""
c(Cause) Categorical true Cause "Variable of type categorical converted from a cell data type"
kmd2 Numeric false Customers "Euclidean distance to centroid 2 (kmeans clustering with k = 10)"
kmd1 Numeric false Customers "Euclidean distance to centroid 1 (kmeans clustering with k = 10)"
kmd4 Numeric false Customers "Euclidean distance to centroid 4 (kmeans clustering with k = 10)"
kmd5 Numeric false Customers "Euclidean distance to centroid 5 (kmeans clustering with k = 10)"
kmd9 Numeric false Customers "Euclidean distance to centroid 9 (kmeans clustering with k = 10)"
cos(Customers) Numeric false Customers "cos( )"
RestorationTime-OutageTime Numeric false OutageTime, RestorationTime "Elapsed time in seconds between OutageTime and RestorationTime"
kmd6 Numeric false Customers "Euclidean distance to centroid 6 (kmeans clustering with k = 10)"
kmi Categorical false Customers "Cluster index encoding (kmeans clustering with k = 10)"
kmd7 Numeric false Customers "Euclidean distance to centroid 7 (kmeans clustering with k = 10)"
kmd3 Numeric false Customers "Euclidean distance to centroid 3 (kmeans clustering with k = 10)"
kmd10 Numeric false Customers "Euclidean distance to centroid 10 (kmeans clustering with k = 10)"
hour(RestorationTime) Numeric false RestorationTime "Hour of the day"
⋮
The first three generated features are original to Tbl, although the software converts the original Region and Cause variables to categorical variables.
Info(1:3,:) % describe(Transformer,1:3)
ans=3×4 table
Type IsOriginal InputVariables Transformations
___________ __________ ______________ ______________________________________________________________
c(Region) Categorical true Region "Variable of type categorical converted from a cell data type"
Customers Numeric true Customers ""
c(Cause) Categorical true Cause "Variable of type categorical converted from a cell data type"
The OutageTime and RestorationTime variables are not included as generated features because they are datetime variables, which cannot be used to train a bagged ensemble model. However, the software derives some generated features from these variables, such as the tenth feature RestorationTime-OutageTime.
Info(10,:) % describe(Transformer,10)
ans=1×4 table
Type IsOriginal InputVariables Transformations
_______ __________ ___________________________ ________________________________________________________________
RestorationTime-OutageTime Numeric false OutageTime, RestorationTime "Elapsed time in seconds between OutageTime and RestorationTime"
Some generated features are a combination of multiple transformations. For example, the software generates the nineteenth feature fenc(c(Cause)) by converting the Cause variable to a categorical variable with 10 categories and then calculating the frequency of the categories.
Info(19,:) % describe(Transformer,19)
ans=1×4 table
Type IsOriginal InputVariables Transformations
_______ __________ ______________ ____________________________________________________________________________________________________________
fenc(c(Cause)) Numeric false Cause "Variable of type categorical converted from a cell data type -> Frequency encoding (number of levels = 10)"
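The two steps behind fenc(c(Cause)) can be approximated by hand to see what the feature contains. This is a rough sketch and not the software's implementation. It assumes that "frequency" means the number of observations in each category. The description does not say whether counts or proportions are used, so the values produced by the software might be scaled differently.

```matlab
outages = readtable("outages.csv");
Tbl = rmmissing(outages);

% Step 1: convert the cell array of character vectors to a categorical variable.
Cause = categorical(Tbl.Cause);

% Step 2: count the observations in each category. The counts follow the
% order of categories(Cause).
Counts = countcats(Cause);

% Replace each categorical value with the count of its category.
% double(Cause) returns the category index of each observation.
% Assumption: frequency is taken to be a raw count.
FreqEncoded = Counts(double(Cause));
```

The result is a numeric vector with one value per observation, which matches the Numeric type reported for fenc(c(Cause)) in Info.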
Input Arguments
Transformer: Feature transformer, specified as a FeatureTransformer object.
Index: Features to describe, specified as a numeric or logical vector indicating the position of the features, or a string array or cell array of character vectors indicating the names of the features.
Example: 1:12
Data Types: single | double | logical | string | cell
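The three forms of the features-to-describe argument can be sketched as follows. This example is illustrative only. It assumes that a logical vector needs one element per generated feature, and that the feature names "Customers" and "c(Cause)" exist in the transformer, as they do in the genrfeatures example output shown earlier. Names can differ across releases or random seeds.

```matlab
outages = readtable("outages.csv");
Tbl = rmmissing(outages);
rng("default") % For reproducibility
Transformer = genrfeatures(Tbl,"Loss",25,TargetLearner="bag");

% Numeric positions: the first three features.
ByPosition = describe(Transformer,1:3);

% Logical vector. Assumption: one element per generated feature.
Mask = false(1,Transformer.TotalNumFeatures);
Mask([1 10]) = true;
ByMask = describe(Transformer,Mask);

% Feature names as a string array. Assumption: these names exist.
ByName = describe(Transformer,["Customers" "c(Cause)"]);
```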
Output Arguments
Info: Feature descriptions, returned as a table. Each row corresponds to a generated feature, and each column provides the following information.
| Column Name | Description |
|---|---|
| Type | Indicates the data type of the feature, either numeric or categorical |
| IsOriginal | Indicates whether the feature is an original feature (true) or an engineered feature (false) |
| InputVariables | Indicates the original features used to generate the feature |
| Transformations | Describes the transformations used to generate the feature, in the order they are applied. For more information, see Feature Transformations. |
Algorithms
This table provides additional information on some of the more complex feature
transformation descriptions in Info.Transformations.
| Sample Feature Name | Sample Transformation Description in Info | Additional Information |
|---|---|---|
| eb4(Variable) | Equal-width binning (number of bins = 4) | The software splits the Variable values into 4 bins of equal width. The resulting feature is a categorical variable. |
| fenc(Variable) | Frequency encoding (number of levels = 10) | The software calculates the frequency of the 10 categories (or levels) in Variable. In the resulting feature, the software replaces each categorical value with the corresponding category frequency, creating a numeric variable. |
| kmc1 | Centroid encoding (component #1) (kmeans clustering with k = 10) | The software uses k-means clustering to assign each observation to one of 10 clusters. Each row in the resulting feature corresponds to an observation and is the 1st component of the cluster centroid associated with that observation. The resulting feature is a numeric variable. |
| kmd4 | Euclidean distance to centroid 4 (kmeans clustering with k = 10) | The software uses k-means clustering to assign each observation to one of 10 clusters. Each row in the resulting feature is the Euclidean distance from the corresponding observation to the centroid of the 4th cluster. The resulting feature is a numeric variable. |
| kmi | Cluster index encoding (kmeans clustering with k = 10) | The software uses k-means clustering to assign each observation to one of 10 clusters. Each row in the resulting feature is the cluster index for the corresponding observation. The resulting feature is a categorical variable. |
| q50(Variable) | Equiprobable binning (number of bins = 50) | The software splits the Variable values into 50 bins of equal probability. The resulting feature is a categorical variable. |
| woe5(Variable) | Weight of Evidence (positive class = Class5) | This transformation is available for classification problems only. The software performs the following steps to create the resulting feature: |
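Several of the transformations in this table can be imitated on synthetic data to see what the resulting features look like. The sketch below is an approximation under stated assumptions, not the software's implementation. The data x and X are hypothetical, 4 bins are used for both binning schemes, and bin edges are taken from the sample range and sample quantiles. The software might choose edges, initial centroids, or preprocessing differently.

```matlab
rng("default")      % For reproducibility
x = randn(200,1);   % Hypothetical numeric variable

% Equal-width binning (like eb4): edges evenly spaced over the data range.
ewEdges = linspace(min(x),max(x),5);
ewBin = categorical(discretize(x,ewEdges));

% Equiprobable binning (like q4): edges at the sample quartiles, so each
% bin holds about the same number of observations.
eqEdges = quantile(x,[0 0.25 0.5 0.75 1]);
eqBin = categorical(discretize(x,eqEdges));

% k-means features with k = 10 on hypothetical two-column data.
X = [x randn(200,1)];
k = 10;
[idx,C] = kmeans(X,k);

kmi = categorical(idx);   % Cluster index encoding
kmc1 = C(idx,1);          % 1st component of each observation's centroid
D = pdist2(X,C);          % Euclidean distance to every centroid
kmd4 = D(:,4);            % Distance to the centroid of the 4th cluster
```

Equal-width bins can hold very different numbers of observations when the data is skewed, whereas equiprobable bins stay balanced by construction. Comparing countcats(ewBin) with countcats(eqBin) shows the difference.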
Version History
Introduced in R2021a