# lassoglm

Lasso or elastic net regularization for generalized linear model regression

## Syntax

```
B = lassoglm(X,Y)
[B,FitInfo] = lassoglm(X,Y)
[B,FitInfo] = lassoglm(X,Y,distr)
[B,FitInfo] = lassoglm(X,Y,distr,Name,Value)
```

## Description

`B = lassoglm(X,Y)` returns penalized maximum-likelihood fitted coefficients for a generalized linear model of the response `Y` to the data matrix `X`. The values in `Y` are assumed to have a Gaussian probability distribution.

```[B,FitInfo] = lassoglm(X,Y)``` also returns a structure, `FitInfo`, containing information about the fits.

```[B,FitInfo] = lassoglm(X,Y,distr)``` fits the model using the probability distribution type for `Y` as specified in `distr`.

```[B,FitInfo] = lassoglm(X,Y,distr,Name,Value)``` fits regularized generalized linear regressions with additional options specified by one or more `Name,Value` pair arguments.

## Input Arguments

`X`

Numeric matrix with `n` rows and `p` columns. Each row represents one observation, and each column represents one predictor (variable).

`Y`

When `distr` is not `'binomial'`, `Y` is a numeric vector or categorical array of length `n`, where `n` is the number of rows of `X`. `Y(i)` is the response to row `i` of `X`.

When `distr` is `'binomial'`, `Y` is one of the following:

• Numeric vector of length `n`, where each entry represents success (`1`) or failure (`0`)

• Logical vector of length `n`, where each entry represents success or failure

• Categorical array of length `n`, where each entry represents success or failure

• Two-column numeric matrix, where the first column contains the number of successes for each observation, and the second column contains the total number of trials

`distr`

Distributional family for the nonsystematic variation in the responses, a string. Choices:

• `'normal'`

• `'binomial'`

• `'poisson'`

• `'gamma'`

• `'inverse gaussian'`

By default, `lassoglm` uses the canonical link function corresponding to `distr`. Specify another link function using the `'Link'` name-value pair.
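As a rough illustration of the two-column binomial form of `Y` described above, the following sketch builds synthetic data and fits a lasso-regularized logistic model. The sample size, the number of trials, and the coefficient values are invented for illustration and are not part of the original documentation.

```matlab
rng('default')                           % for reproducibility
n = 200;
X = randn(n,5);                          % five synthetic predictors
trials = 10*ones(n,1);                   % assumed: 10 trials per observation
p = 1./(1 + exp(-(0.8*X(:,2) - 0.5)));   % illustrative success probabilities
successes = binornd(trials,p);           % number of successes per observation

% Two-column response: [number of successes, total number of trials]
Y = [successes trials];
[B,FitInfo] = lassoglm(X,Y,'binomial');
```

Each column of `B` holds the coefficients for one value in `FitInfo.Lambda`.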

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

`'Alpha'`

Scalar value from `0` to `1` (excluding `0`) representing the weight of lasso (L1) versus ridge (L2) optimization. `Alpha = 1` represents lasso regression, and other values represent elastic net optimization. `Alpha` close to `0` approaches ridge regression. See Definitions.

Default: `1`
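A minimal sketch of how `'Alpha'` might be used, reusing the synthetic Poisson data from the Examples section. The value `0.5` is an arbitrary choice made here for illustration.

```matlab
rng('default')                                        % for reproducibility
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

[Blasso,infoLasso] = lassoglm(X,y,'poisson');              % Alpha = 1 (default), lasso
[Benet,infoEnet]   = lassoglm(X,y,'poisson','Alpha',0.5);  % elastic net
```

The `Alpha` field of each returned structure records the value that was used.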

`'CV'`

Method `lassoglm` uses to estimate deviance:

• `K`, a positive integer — `lassoglm` uses `K`-fold cross validation.

• `cvp`, a `cvpartition` object — `lassoglm` uses the cross-validation method expressed in `cvp`. You cannot use a `'leaveout'` partition with `lassoglm`.

• `'resubstitution'`: `lassoglm` uses `X` and `Y` to fit the model and to estimate the deviance, without cross validation.

Default: `'resubstitution'`
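The `cvpartition` form of `'CV'` could look like the following sketch. The 5-fold partition and the synthetic data are assumptions made for illustration.

```matlab
rng('default')                               % for reproducibility
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

cvp = cvpartition(size(X,1),'KFold',5);      % 5-fold partition of the 100 rows
[B,FitInfo] = lassoglm(X,y,'poisson','CV',cvp);

FitInfo.LambdaMinDeviance                    % Lambda with minimum cross-validated deviance
```

Passing a partition object rather than an integer `K` lets the same folds be reused across several fits.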

`'DFmax'`

Maximum number of nonzero coefficients in the model. `lassoglm` returns results for `Lambda` values that satisfy this criterion.

Default: `Inf`

`'Lambda'`

Vector of nonnegative `Lambda` values. See Lasso.

• If you do not supply `Lambda`, `lassoglm` estimates the largest value of `Lambda` that gives a nonnull model. In this case, `LambdaRatio` gives the ratio of the smallest to the largest value of the sequence, and `NumLambda` gives the length of the vector.

• If you supply `Lambda`, `lassoglm` ignores `LambdaRatio` and `NumLambda`.

Default: Geometric sequence of `NumLambda` values, the largest just sufficient to produce `B` = `0`

`'LambdaRatio'`

Positive scalar, the ratio of the smallest to the largest `Lambda` value when you do not explicitly set `Lambda`.

If you set `LambdaRatio = 0`, `lassoglm` generates a default sequence of `Lambda` values, and replaces the smallest one with `0`.

Default: `1e-4`
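To make the relationship between the largest `Lambda`, `LambdaRatio`, and `NumLambda` concrete, the sketch below builds a geometric sequence in plain MATLAB. The value of `lambdaMax` is hypothetical (`lassoglm` estimates it from the data), and the sequence `lassoglm` generates internally may differ in detail.

```matlab
lambdaMax   = 0.5;     % hypothetical largest Lambda
lambdaRatio = 1e-4;    % default LambdaRatio
numLambda   = 100;     % default NumLambda

% Geometric sequence rising from lambdaMax*lambdaRatio to lambdaMax
lambda = lambdaMax * lambdaRatio.^((numLambda-1:-1:0)/(numLambda-1));
```

A vector built this way can also be passed explicitly through the `'Lambda'` name-value pair, in which case `LambdaRatio` and `NumLambda` are ignored.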

`'Link'`

Specify the mapping between the mean µ of the response and the linear predictor Xb.

| Value | Description |
| --- | --- |
| `'comploglog'` | $\log(-\log(1-\mu)) = Xb$ |
| `'identity'`, default for the distribution `'normal'` | $\mu = Xb$ |
| `'log'`, default for the distribution `'poisson'` | $\log(\mu) = Xb$ |
| `'logit'`, default for the distribution `'binomial'` | $\log(\mu/(1-\mu)) = Xb$ |
| `'loglog'` | $\log(-\log(\mu)) = Xb$ |
| `'probit'` | $\Phi^{-1}(\mu) = Xb$, where $\Phi$ is the normal (Gaussian) CDF |
| `'reciprocal'`, default for the distribution `'gamma'` | $\mu^{-1} = Xb$ |
| `p` (a number), default for the distribution `'inverse gaussian'` (with $p = -2$) | $\mu^{p} = Xb$ |

A custom link can also be given as a cell array of the form `{FL FD FI}`, containing three function handles, created using `@`, that define the link (`FL`), the derivative of the link (`FD`), and the inverse link (`FI`). Equivalently, it can be a structure of function handles with field `Link` containing `FL`, field `Derivative` containing `FD`, and field `Inverse` containing `FI`.
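As a sketch of the cell-array form, the three handles below re-express the log link by hand. The handle names follow the `{FL FD FI}` convention above, and the data are synthetic and chosen only for illustration.

```matlab
% Custom link written as three function handles (here, the log link)
FL = @(mu) log(mu);       % link: eta = log(mu)
FD = @(mu) 1./mu;         % derivative of the link with respect to mu
FI = @(eta) exp(eta);     % inverse link: mu = exp(eta)
customLink = {FL FD FI};

rng('default')            % for reproducibility
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));
[B,FitInfo] = lassoglm(X,y,'poisson','Link',customLink);
```

Because `'log'` is already the default link for `'poisson'`, this particular custom link is useful mainly as a template for writing other links.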

`'MCReps'`

Positive integer, the number of Monte Carlo repetitions for cross validation.

• If `CV` is `'resubstitution'` or a `cvpartition` of type `'resubstitution'`, `MCReps` must be `1`.

• If `CV` is a `cvpartition` of type `'holdout'`, `MCReps` must be greater than `1`.

Default: `1`

`'NumLambda'`

Positive integer, the number of `Lambda` values `lassoglm` uses when you do not set `Lambda`. `lassoglm` can return fewer than `NumLambda` fits if the deviance of the fits drops below a threshold fraction of the null deviance (deviance of the fit without any predictors `X`).

Default: `100`

`'Offset'`

Numeric vector with the same number of rows as `X`. `lassoglm` uses `Offset` as an additional predictor variable, but keeps its coefficient value fixed at `1.0`.
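One common use of an offset is a Poisson model in which each observation has a different exposure. The sketch below assumes a hypothetical `exposure` vector and the default log link, so that `log(exposure)` enters the linear predictor with a coefficient fixed at 1.

```matlab
rng('default')                              % for reproducibility
n = 100;
X = randn(n,4);
exposure = 1 + 9*rand(n,1);                 % hypothetical exposure per observation
mu = exposure .* exp(0.5*X(:,1) - 1);       % expected count scales with exposure
y = poissrnd(mu);

% log(mu) = log(exposure) + b0 + X*b, with the log(exposure) term held fixed
[B,FitInfo] = lassoglm(X,y,'poisson','Offset',log(exposure));
```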

`'Options'`

Structure that specifies whether to cross validate in parallel, and specifies the random stream or streams. Create the `Options` structure with `statset`. Option fields:

• `UseParallel` — Set to `true` to compute in parallel. Default is `false`.

• `UseSubstreams` — Set to `true` to compute in parallel in a reproducible fashion. To compute reproducibly, set `Streams` to a type allowing substreams: `'mlfg6331_64'` or `'mrg32k3a'`. Default is `false`.

• `Streams`: a `RandStream` object or cell array consisting of one such object. If you do not specify `Streams`, `lassoglm` uses the default stream.
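The option fields above might be combined as in the following sketch. It assumes a parallel pool is available (parallel execution generally requires Parallel Computing Toolbox), and the data and the choice of 5 folds are illustrative.

```matlab
rng('default')                               % for reproducibility of the data
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));

s = RandStream('mlfg6331_64');               % stream type that allows substreams
opts = statset('UseParallel',true, ...
               'UseSubstreams',true, ...
               'Streams',s);

[B,FitInfo] = lassoglm(X,y,'poisson','CV',5,'Options',opts);
```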

`'PredictorNames'`

Cell array of strings representing names of the predictor variables, in the order in which they appear in `X`.

Default: `{}`

`'RelTol'`

Convergence threshold for the coordinate descent algorithm (see Friedman, Tibshirani, and Hastie [3]). The algorithm terminates when successive estimates of the coefficient vector differ in the L2 norm by a relative amount less than `RelTol`.

Default: `1e-4`

`'Standardize'`

Boolean value specifying whether `lassoglm` scales `X` before fitting the models. This affects whether the regularization is applied to the coefficients on the standardized scale or original scale. The results are always presented on the original scale.

Default: `true`

`'Weights'`

Observation weights, a nonnegative vector of length `n`, where `n` is the number of rows of `X`. At least two values must be positive.

Default: `1/n * ones(n,1)`

## Output Arguments

`B`

Fitted coefficients, a `p`-by-`L` matrix, where `p` is the number of predictors (columns) in `X`, and `L` is the number of `Lambda` values.

`FitInfo`

Structure containing information about the model fits.

| Field in `FitInfo` | Description |
| --- | --- |
| `Alpha` | Value of `Alpha` parameter, a scalar. |
| `Deviance` | Deviance of the fitted model for each value of `Lambda`, a `1`-by-`L` vector. If cross validation was performed, the values for `Deviance` represent the estimated expected deviance of the model applied to new data, as calculated by cross validation. Otherwise, `Deviance` is the deviance of the fitted model applied to the data used to perform the fit. |
| `DF` | Number of nonzero coefficients in `B` for each `Lambda` value, a `1`-by-`L` vector. |
| `Intercept` | Intercept term $\beta_0$ for each linear model, a `1`-by-`L` vector. |
| `Lambda` | `Lambda` parameters in ascending order, a `1`-by-`L` vector. |

If you set the `CV` name-value pair to cross validate, the `FitInfo` structure contains additional fields.

| Field in `FitInfo` | Description |
| --- | --- |
| `IndexMinDeviance` | Index of `Lambda` with value `LambdaMinDeviance`, a scalar. |
| `Index1SE` | Index of `Lambda` with value `Lambda1SE`, a scalar. |
| `LambdaMinDeviance` | `Lambda` value with minimum expected deviance, as calculated by cross validation, a scalar. |
| `Lambda1SE` | Largest `Lambda` such that `Deviance` is within one standard error of the minimum, a scalar. |
| `SE` | Standard error of `Deviance` for each `Lambda`, as calculated during cross validation, a `1`-by-`L` vector. |
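The outputs can be combined to produce predictions. The sketch below picks the column of `B` at `Index1SE`, prepends the matching intercept, and evaluates the fitted means with `glmval`. The data are synthetic, and the choice of the one-standard-error model is one reasonable option rather than a requirement.

```matlab
rng('default')                               % for reproducibility
X = randn(100,20);
y = poissrnd(exp(X(:,[5 10 15])*[.4;.2;.3] + 1));
[B,FitInfo] = lassoglm(X,y,'poisson','CV',10);

idx  = FitInfo.Index1SE;                     % column of B corresponding to Lambda1SE
coef = [FitInfo.Intercept(idx); B(:,idx)];   % intercept first, then predictor coefficients
yhat = glmval(coef,X,'log');                 % fitted Poisson means under the log link
```

The link passed to `glmval` should match the link used in the fit, here the default `'log'` for `'poisson'`.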

## Examples

### Lasso Regularization of a Generalized Linear Model

Construct data from a Poisson model, and identify the important predictors using `lassoglm`.

Create data with 20 predictors, and Poisson responses using just three of the predictors, plus a constant.

```
rng('default') % for reproducibility
X = randn(100,20);
mu = exp(X(:,[5 10 15])*[.4;.2;.3] + 1);
y = poissrnd(mu);
```

Construct a cross-validated lasso regularization of a Poisson regression model of the data.

`[B FitInfo] = lassoglm(X,y,'poisson','CV',10);`

Examine the cross-validation plot to see the effect of the `Lambda` regularization parameter.

`lassoPlot(B,FitInfo,'plottype','CV');`

The green circle and dashed line locate the `Lambda` with minimal cross-validation error. The blue circle and dashed line locate the point with minimal cross-validation error plus one standard error.

Find the nonzero model coefficients corresponding to the two identified points.

`minpts = find(B(:,FitInfo.IndexMinDeviance))`

```
minpts =

     3
     5
     6
    10
    11
    15
    16
```

`min1pts = find(B(:,FitInfo.Index1SE))`

```
min1pts =

     5
    10
    15
```

The coefficients from the minimal plus one standard error point are exactly those coefficients used to create the data.

## Definitions

### Link Function

A link function f(μ) maps a distribution with mean μ to a linear model with data X and coefficient vector b using the formula

f(μ) = Xb.

The formulas for the link functions appear in the `Link` name-value pair description. The table below lists the link functions typically used with each distribution. Braces mark the default link for that distribution.

| Distribution | Typical link functions |
| --- | --- |
| `'normal'` | `{'identity'}` |
| `'binomial'` | `'comploglog'`, `'loglog'`, `'probit'`, `{'logit'}` |
| `'poisson'` | `{'log'}` |
| `'gamma'` | `{'reciprocal'}` |
| `'inverse gaussian'` | `{-2}` |

### Lasso

For a nonnegative value of λ, `lassoglm` solves the problem

$\underset{{\beta }_{0},\beta }{\mathrm{min}}\left(\frac{1}{N}\text{Deviance}\left({\beta }_{0},\beta \right)+\lambda \sum _{j=1}^{p}|{\beta }_{j}|\right),$

where

• Deviance is the deviance of the model fit to the responses using intercept β0 and predictor coefficients β. The formula for Deviance depends on the `distr` parameter you supply to `lassoglm`. Minimizing the λ-penalized deviance is equivalent to maximizing the λ-penalized log likelihood.

• N is the number of observations.

• λ is a nonnegative regularization parameter corresponding to one value of `Lambda`.

• Parameters β0 and β are a scalar and a vector of length p, respectively.

As λ increases, the number of nonzero components of β decreases.

The lasso problem involves the L1 norm of β, as contrasted with the elastic net algorithm.
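To make the objective concrete, the following sketch evaluates it by hand for the Poisson case with a log link, using the standard Poisson deviance $2\sum_i \left(y_i \log(y_i/\mu_i) - (y_i - \mu_i)\right)$. The function names `poissonDeviance` and `lassoObjective` are hypothetical helpers introduced here, not part of `lassoglm`.

```matlab
% Poisson deviance for counts y and fitted means mu.
% max(y,1) avoids log(0) when y == 0; that term is multiplied by y = 0 anyway.
poissonDeviance = @(y,mu) 2*sum(y.*log(max(y,1)./mu) - (y - mu));

% Lasso objective: Deviance/N + lambda * sum(|beta_j|), with mu = exp(b0 + X*beta)
lassoObjective = @(b0,beta,X,y,lambda) ...
    poissonDeviance(y,exp(b0 + X*beta))/numel(y) + lambda*sum(abs(beta));

X = [0; 1; 2];
y = [1; 2; 4];
obj = lassoObjective(0,log(2),X,y,0.1)   % fit is exact, so only the penalty remains
```

In this toy case the fitted means equal the responses, so the deviance term vanishes and the objective reduces to `0.1*log(2)`.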

### Elastic Net

For an α strictly between 0 and 1, and a nonnegative λ, elastic net solves the problem

$\underset{{\beta }_{0},\beta }{\mathrm{min}}\left(\frac{1}{N}\text{Deviance}\left({\beta }_{0},\beta \right)+\lambda {P}_{\alpha }\left(\beta \right)\right),$

where

${P}_{\alpha }\left(\beta \right)=\frac{\left(1-\alpha \right)}{2}{‖\beta ‖}_{2}^{2}+\alpha {‖\beta ‖}_{1}=\sum _{j=1}^{p}\left(\frac{\left(1-\alpha \right)}{2}{\beta }_{j}^{2}+\alpha |{\beta }_{j}|\right).$

Elastic net is the same as lasso when α = 1. For other values of α, the penalty term $P_{\alpha}(\beta)$ interpolates between the L1 norm of β and the squared L2 norm of β. As α shrinks toward 0, elastic net approaches `ridge` regression.
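The penalty term can be checked numerically with a short sketch. The helper name `elasticNetPenalty` and the test vector are invented for illustration.

```matlab
% Elastic net penalty: (1 - alpha)/2 * ||beta||_2^2 + alpha * ||beta||_1
elasticNetPenalty = @(beta,alpha) ...
    (1 - alpha)/2*sum(beta.^2) + alpha*sum(abs(beta));

beta = [3; -4; 0];
pLasso = elasticNetPenalty(beta,1)      % alpha = 1 gives the L1 norm: 7
pMixed = elasticNetPenalty(beta,0.5)    % 0.25*25 + 0.5*7 = 9.75
```

With α = 1 the quadratic term drops out, which matches the statement that elastic net reduces to lasso in that case.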

## References

[1] Tibshirani, R. Regression Shrinkage and Selection via the Lasso. Journal of the Royal Statistical Society, Series B, Vol. 58, No. 1, pp. 267–288, 1996.

[2] Zou, H. and T. Hastie. Regularization and Variable Selection via the Elastic Net. Journal of the Royal Statistical Society, Series B, Vol. 67, No. 2, pp. 301–320, 2005.

[3] Friedman, J., R. Tibshirani, and T. Hastie. Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, Vol. 33, No. 1, 2010. `http://www.jstatsoft.org/v33/i01`

[4] Hastie, T., R. Tibshirani, and J. Friedman. The Elements of Statistical Learning, 2nd edition. Springer, New York, 2008.

[5] Dobson, A. J. An Introduction to Generalized Linear Models, 2nd edition. Chapman & Hall/CRC Press, New York, 2002.

[6] McCullagh, P., and J. A. Nelder. Generalized Linear Models, 2nd edition. Chapman & Hall/CRC Press, New York, 1989.

[7] Collett, D. Modelling Binary Data, 2nd edition. Chapman & Hall/CRC Press, New York, 2003.