screenpredictors
Screen credit scorecard predictors for predictive value
Description
metric_table = screenpredictors(data) returns the output variable, metric_table, a MATLAB® table containing the calculated values for several measures of predictive power for each predictor variable in the data.
Use the screenpredictors function as a preprocessing step in the Credit Scorecard Modeling Workflow to reduce the number of predictor variables before you create the credit scorecard using the creditscorecard function from Financial Toolbox™. In addition, you can use Threshold Predictors from Risk Management Toolbox™ to interactively set credit scorecard predictor thresholds using the output from screenpredictors before you create the credit scorecard using creditscorecard.
metric_table = screenpredictors(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in the previous syntax.
Examples
Reduce the number of predictor variables by screening predictors before you create a credit scorecard.
Use the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).
load CreditCardData.mat
Define 'IDVar' and 'ResponseVar'.
idvar = 'CustID'; responsevar = 'status';
Use screenpredictors to calculate the predictor screening metrics. The function returns a table containing the metric values. Each table row corresponds to a predictor from the input table data.
metric_table = screenpredictors(data,'IDVar', idvar,'ResponseVar', responsevar)
metric_table=9×7 table
                   InfoValue    AccuracyRatio    AUROC      Entropy    Gini       Chi2PValue    PercentMissing
                   _________    _____________    _______    _______    _______    __________    ______________
    CustAge        0.18863      0.17095          0.58547    0.88729    0.42626    0.00074524    0
    TmWBank        0.15719      0.13612          0.56806    0.89167    0.42864    0.0054591     0
    CustIncome     0.15572      0.17758          0.58879    0.891      0.42731    0.0018428     0
    TmAtAddress    0.094574     0.010421         0.50521    0.90089    0.43377    0.182         0
    UtilRate       0.075086     0.035914         0.51796    0.90405    0.43575    0.45546       0
    AMBalance      0.07159      0.087142         0.54357    0.90446    0.43592    0.48528       0
    EmpStatus      0.048038     0.10886          0.55443    0.90814    0.4381     0.00037823    0
    OtherCC        0.014301     0.044459         0.52223    0.91347    0.44132    0.047616      0
    ResStatus      0.0097738    0.05039          0.5252     0.91422    0.44182    0.27875       0
metric_table = sortrows(metric_table,'AccuracyRatio','descend')
metric_table=9×7 table
                   InfoValue    AccuracyRatio    AUROC      Entropy    Gini       Chi2PValue    PercentMissing
                   _________    _____________    _______    _______    _______    __________    ______________
    CustIncome     0.15572      0.17758          0.58879    0.891      0.42731    0.0018428     0
    CustAge        0.18863      0.17095          0.58547    0.88729    0.42626    0.00074524    0
    TmWBank        0.15719      0.13612          0.56806    0.89167    0.42864    0.0054591     0
    EmpStatus      0.048038     0.10886          0.55443    0.90814    0.4381     0.00037823    0
    AMBalance      0.07159      0.087142         0.54357    0.90446    0.43592    0.48528       0
    ResStatus      0.0097738    0.05039          0.5252     0.91422    0.44182    0.27875       0
    OtherCC        0.014301     0.044459         0.52223    0.91347    0.44132    0.047616      0
    UtilRate       0.075086     0.035914         0.51796    0.90405    0.43575    0.45546       0
    TmAtAddress    0.094574     0.010421         0.50521    0.90089    0.43377    0.182         0
Based on the AccuracyRatio metric, select the top predictors to use when you create the creditscorecard object.
varlist = metric_table.Row(metric_table.AccuracyRatio > 0.09)
varlist = 4×1 cell
{'CustIncome'}
{'CustAge' }
{'TmWBank' }
{'EmpStatus' }
Use creditscorecard to create a creditscorecard object based on only the "screened" predictors.
sc = creditscorecard(data,'IDVar', idvar,'ResponseVar', responsevar, 'PredictorVars', varlist)
sc =
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {'CustID' 'CustAge' 'TmAtAddress' 'ResStatus' 'EmpStatus' 'CustIncome' 'TmWBank' 'OtherCC' 'AMBalance' 'UtilRate' 'status'}
        NumericPredictors: {'CustAge' 'CustIncome' 'TmWBank'}
    CategoricalPredictors: {'EmpStatus'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {'CustAge' 'EmpStatus' 'CustIncome' 'TmWBank'}
                     Data: [1200×11 table]
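The selection step in the example above filters on a single metric. As a minimal sketch of combining two metrics, the snippet below rebuilds the relevant columns of the metric table from the values printed above, so it runs without calling screenpredictors, and keeps predictors that pass both an information value cutoff and a chi-square p-value cutoff. The cutoffs 0.1 and 0.05 and the variable names mt and keep are illustrative assumptions, not recommendations from the documentation.

```matlab
% Values copied from the screenpredictors output shown in the example above.
names = {'CustAge';'TmWBank';'CustIncome';'TmAtAddress';'UtilRate'; ...
         'AMBalance';'EmpStatus';'OtherCC';'ResStatus'};
InfoValue  = [0.18863; 0.15719; 0.15572; 0.094574; 0.075086; ...
              0.07159; 0.048038; 0.014301; 0.0097738];
Chi2PValue = [0.00074524; 0.0054591; 0.0018428; 0.182; 0.45546; ...
              0.48528; 0.00037823; 0.047616; 0.27875];

% Hypothetical stand-in for the table returned by screenpredictors.
mt = table(InfoValue, Chi2PValue, 'RowNames', names);

% Keep predictors that satisfy both (illustrative) cutoffs.
keep = mt.InfoValue > 0.1 & mt.Chi2PValue < 0.05;
varlist = mt.Row(keep)
```

With these cutoffs the selection contains CustAge, TmWBank, and CustIncome. The resulting cell array can be passed as 'PredictorVars' to creditscorecard in the same way as in the example.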
Input Arguments
Data for the creditscorecard object, specified as a MATLAB table, tall table, or tall timetable, where each column of data can be any one of the following data types:
Numeric
Logical
Cell array of character vectors
Character array
Categorical
String
Data Types: table
Name-Value Arguments
Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.
Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
Example: metric_table = screenpredictors(data,'IDVar','CustID','ResponseVar','status','PredictorVars',{'CustAge','CustIncome'})
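As a rough illustration of the two calling conventions described above, the following sketch calls screenpredictors once with quoted names and once with the Name=Value form. It assumes that Risk Management Toolbox is installed, that the release is R2021a or later for the second call, and that CreditCardData.mat provides the table data, as in the example earlier on this page. The variable names mt_quoted and mt_named are illustrative.

```matlab
% Load the sample data set used in the example on this page.
load CreditCardData.mat

% Quoted names separated by commas (required before R2021a).
mt_quoted = screenpredictors(data, 'IDVar', 'CustID', 'ResponseVar', 'status');

% Name=Value form (R2021a and later); the values are still character vectors.
mt_named = screenpredictors(data, IDVar='CustID', ResponseVar='status');
```

Both calls pass the same arguments, so either form can be used depending on the release you need to support.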
Name of identifier variable, specified as the comma-separated pair consisting of 'IDVar' and a case-sensitive character vector. The 'IDVar' data can be ordinal numbers or Social Security numbers. By specifying 'IDVar', you can omit the identifier variable from the predictor variables easily.
Data Types: char
Response variable name, specified as the comma-separated pair consisting of 'ResponseVar' and a case-sensitive character vector. The response variable data must be binary, the "Good" or "Bad" indicator.
If not specified, 'ResponseVar' is set to the last column of the input data by default.
Data Types: char
Names of predictor variables, specified as the comma-separated pair consisting of 'PredictorVars' and a case-sensitive cell array of character vectors or string array. By default, when you create a creditscorecard object, all variables are predictors except for IDVar and ResponseVar. Any name you specify using 'PredictorVars' must differ from the IDVar and ResponseVar names.
Data Types: cell | string
Name of weights variable, specified as the comma-separated pair consisting of 'WeightsVar' and a case-sensitive character vector to indicate which column name in the data table contains the row weights.
If you do not specify 'WeightsVar' when you create a creditscorecard object, then the function uses the unit weights as the observation weights.
Data Types: char
Number of (equal frequency) bins for numeric predictors, specified as the comma-separated pair consisting of 'NumBins' and a scalar numeric.
Data Types: double
Small shift in frequency tables that contain zero entries, specified as the comma-separated pair consisting of 'FrequencyShift' and a scalar numeric with a value between 0 and 1.
If the frequency table of a predictor contains any "pure" bins (containing all goods or all bads) after you bin the data using autobinning, then the function adds the 'FrequencyShift' value to all bins in the table. To avoid any perturbation, set 'FrequencyShift' to 0.
Data Types: double
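To see why a shift matters for pure bins, the sketch below computes a textbook information value on a small, made-up frequency table with and without a shift. This is only an illustration under stated assumptions: the counts in freq, the shift value 0.5, and the helper names props and infoValue are hypothetical, and the formula is the standard information value definition rather than a confirmed copy of the internal screenpredictors computation.

```matlab
% Hypothetical frequency table: rows are bins, columns are [Good Bad] counts.
% The second bin is "pure" because it contains no bads.
freq  = [40 10; 25 0; 35 15];
shift = 0.5;   % illustrative shift value added to every cell

% Column-wise proportions: each count divided by its column total.
props = @(f) f ./ sum(f, 1);

% Textbook information value: sum over bins of (pGood - pBad)*log(pGood/pBad).
infoValue = @(p) sum((p(:,1) - p(:,2)) .* log(p(:,1) ./ p(:,2)));

ivRaw     = infoValue(props(freq));          % infinite: log(pGood/0) in the pure bin
ivShifted = infoValue(props(freq + shift));  % finite once every cell is nonzero

fprintf('IV without shift: %g\nIV with shift:    %g\n', ivRaw, ivShifted);
```

The unshifted value is Inf because the pure bin divides by a zero proportion, while the shifted table yields a finite value. This is the kind of degenerate result that a nonzero 'FrequencyShift' avoids, at the cost of slightly perturbing the counts.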
Output Arguments
Calculated values for the predictor screening metrics, returned as a table. Each table row corresponds to a predictor from the input table data. The table columns contain calculated values for the following metrics:
'InfoValue': Information value. This metric measures the strength of a predictor in the fitting model by determining the deviation between the distributions of "Goods" and "Bads".
'AccuracyRatio': Accuracy ratio.
'AUROC': Area under the ROC curve.
'Entropy': Entropy. This metric measures the level of unpredictability in the bins. You can use the entropy metric to validate a risk model.
'Gini': Gini. This metric measures the statistical dispersion or inequality within a sample of data.
'Chi2PValue': Chi-square p-value. This metric is computed from the chi-square metric and is a measure of the statistical difference and independence between groups.
'PercentMissing': Percentage of missing values in the predictor. This metric is expressed in decimal form.
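Two of these metrics are closely related: the accuracy ratio is commonly defined as 2*AUROC - 1. The short check below applies that identity to a few AUROC and AccuracyRatio values copied from the example output on this page. The identity is the standard textbook relationship and is stated here as an assumption about how the two columns relate, not as a description of the function's internals.

```matlab
% Values copied from the example output (CustAge, TmWBank, CustIncome, TmAtAddress).
auroc = [0.58547; 0.56806; 0.58879; 0.50521];
ar    = [0.17095; 0.13612; 0.17758; 0.010421];

% Accuracy ratio implied by the standard identity AR = 2*AUROC - 1.
arFromAuroc = 2*auroc - 1;

% Largest disagreement between the implied and the reported values.
maxDiff = max(abs(arFromAuroc - ar));
fprintf('Largest absolute difference: %g\n', maxDiff);
```

For these rows the two columns agree to within the rounding of the displayed values, so sorting by 'AccuracyRatio' or by 'AUROC' ranks the predictors in the same order.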
Extended Capabilities
This function supports input data that is specified as a tall column vector, a tall table, or a tall timetable. Note that the output for numeric predictors might be slightly different when using a tall array. Categorical predictors return the same results for tables and tall arrays. For more information, see tall and Tall Arrays.
Version History
Introduced in R2019a