# lasso

Lasso or elastic net regularization for linear models

## Description

returns fitted least-squares regression coefficients for linear models of the
predictor data `B`

= lasso(`X`

,`y`

)`X`

and the response `y`

. Each
column of `B`

corresponds to a particular regularization
coefficient in `Lambda`

. By default, `lasso`

performs lasso regularization using a geometric sequence of
`Lambda`

values.

fits regularized regressions with additional options specified by one or more
name-value pair arguments. For example, `B`

= lasso(`X`

,`y`

,`Name,Value`

)`'Alpha',0.5`

sets elastic
net as the regularization method, with the parameter `Alpha`

equal
to 0.5.

## Examples

### Remove Redundant Predictors Using Lasso Regularization

Construct a data set with redundant predictors and identify those predictors by using `lasso`

.

Create a matrix `X`

of 100 five-dimensional normal variables. Create a response vector `y`

from just two components of `X`

, and add a small amount of noise.

rng default % For reproducibility X = randn(100,5); weights = [0;2;0;-3;0]; % Only two nonzero coefficients y = X*weights + randn(100,1)*0.1; % Small added noise

Construct the default lasso fit.

B = lasso(X,y);

Find the coefficient vector for the 25th `Lambda`

value in `B`

.

B(:,25)

`ans = `*5×1*
0
1.6093
0
-2.5865
0

`lasso`

identifies and removes the redundant predictors.

### Create Linear Model Without Intercept Term Using Lasso Regularization

Create sample data with predictor variable `X`

and response variable $$y=0+2X+\epsilon $$.

rng('default') % For reproducibility X = rand(100,1); y = 2*X + randn(100,1)/10;

Specify a regularization value, and find the coefficient of the regression model without an intercept term.

lambda = 1e-03; B = lasso(X,y,'Lambda',lambda,'Intercept',false)

Warning: When the 'Intercept' value is false, the 'Standardize' value is set to false.

B = 1.9825

Plot the real values (points) against the predicted values (line).

scatter(X,y) hold on x = 0:0.1:1; plot(x,x*B) hold off

### Remove Redundant Predictors by Using Cross-Validated Fits

Construct a data set with redundant predictors and identify those predictors by using cross-validated `lasso`

.

Create a matrix `X`

of 100 five-dimensional normal variables. Create a response vector `y`

from two components of `X`

, and add a small amount of noise.

rng default % For reproducibility X = randn(100,5); weights = [0;2;0;-3;0]; % Only two nonzero coefficients y = X*weights + randn(100,1)*0.1; % Small added noise

Construct the lasso fit by using 10-fold cross-validation with labeled predictor variables.

[B,FitInfo] = lasso(X,y,'CV',10,'PredictorNames',{'x1','x2','x3','x4','x5'});

Display the variables in the model that corresponds to the minimum cross-validated mean squared error (MSE).

idxLambdaMinMSE = FitInfo.IndexMinMSE; minMSEModelPredictors = FitInfo.PredictorNames(B(:,idxLambdaMinMSE)~=0)

`minMSEModelPredictors = `*1x2 cell*
{'x2'} {'x4'}

Display the variables in the sparsest model within one standard error of the minimum MSE.

idxLambda1SE = FitInfo.Index1SE; sparseModelPredictors = FitInfo.PredictorNames(B(:,idxLambda1SE)~=0)

`sparseModelPredictors = `*1x2 cell*
{'x2'} {'x4'}

In this example, `lasso`

identifies the same predictors for the two models and removes the redundant predictors.

### Lasso Plot with Cross-Validated Fits

Visually examine the cross-validated error of various levels of regularization.

Load the sample data.

`load acetylene`

Create a design matrix with interactions and no constant term.

X = [x1 x2 x3]; D = x2fx(X,'interaction'); D(:,1) = []; % No constant term

Construct the lasso fit using 10-fold cross-validation. Include the `FitInfo`

output so you can plot the result.

rng default % For reproducibility [B,FitInfo] = lasso(D,y,'CV',10);

Plot the cross-validated fits.

lassoPlot(B,FitInfo,'PlotType','CV'); legend('show') % Show legend

The green circle and dotted line locate the `Lambda`

with minimum cross-validation error. The blue circle and dotted line locate the point with minimum cross-validation error plus one standard error.

### Predict Values Using Elastic Net Regularization

Predict students' exam scores using `lasso`

and the elastic net method.

Load the `examgrades`

data set.

```
load examgrades
X = grades(:,1:4);
y = grades(:,5);
```

Split the data into training and test sets.

```
n = length(y);
c = cvpartition(n,'HoldOut',0.3);
idxTrain = training(c,1);
idxTest = ~idxTrain;
XTrain = X(idxTrain,:);
yTrain = y(idxTrain);
XTest = X(idxTest,:);
yTest = y(idxTest);
```

Find the coefficients of a regularized linear regression model using 10-fold cross-validation and the elastic net method with `Alpha`

= 0.75. Use the largest `Lambda`

value such that the mean squared error (MSE) is within one standard error of the minimum MSE.

[B,FitInfo] = lasso(XTrain,yTrain,'Alpha',0.75,'CV',10); idxLambda1SE = FitInfo.Index1SE; coef = B(:,idxLambda1SE); coef0 = FitInfo.Intercept(idxLambda1SE);

Predict exam scores for the test data. Compare the predicted values to the actual exam grades using a reference line.

yhat = XTest*coef + coef0; hold on scatter(yTest,yhat) plot(yTest,yTest) xlabel('Actual Exam Grades') ylabel('Predicted Exam Grades') hold off

### Use Correlation Matrix for Fitting Lasso

Create a matrix `X`

of `N`

`p`

-dimensional normal variables, where `N`

is large and `p`

= 1000. Create a response vector `y`

from the model `y = beta0 + X*p`

, where `beta0`

is a constant, along with additive noise.

rng default % For reproducibility N = 1e4; % Number of samples p = 1e3; % Number of features X = randn(N,p); beta = randn(p,1); % Multiplicative coefficients beta0 = randn; % Additive term y = beta0 + X*beta + randn(N,1); % Last term is noise

Construct the default lasso fit. Time the creation.

B = lasso(X,y,"UseCovariance",false); % Warm up lasso for reliable timing data tic B = lasso(X,y,"UseCovariance",false); timefalse = toc

timefalse = 4.8730

Construct the lasso fit using the covariance matrix. Time the creation.

B2 = lasso(X,y,"UseCovariance",true); % Warm up lasso for reliable timing data tic B2 = lasso(X,y,"UseCovariance",true); timetrue = toc

timetrue = 0.4347

The fitting time with the covariance matrix is much less than the time without it. View the speedup factor that results from using the covariance matrix.

speedup = timefalse/timetrue

speedup = 11.2105

Check that the returned coefficients `B`

and `B2`

are similar.

norm(B-B2)/norm(B)

ans = 5.2508e-15

The results are virtually identical.

## Input Arguments

`X`

— Predictor data

numeric matrix

Predictor data, specified as a numeric matrix. Each row represents one observation, and each column represents one predictor variable.

**Data Types: **`single`

| `double`

`y`

— Response data

numeric vector

Response data, specified as a numeric vector. `y`

has
length *n*, where *n* is the number of
rows of `X`

. The response `y(i)`

corresponds to the *i*th row of
`X`

.

**Data Types: **`single`

| `double`

### Name-Value Arguments

Specify optional pairs of arguments as
`Name1=Value1,...,NameN=ValueN`

, where `Name`

is
the argument name and `Value`

is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.

*
Before R2021a, use commas to separate each name and value, and enclose*
`Name`

*in quotes.*

**Example: **`lasso(X,y,'Alpha',0.75,'CV',10)`

performs elastic net
regularization with 10-fold cross-validation. The `'Alpha',0.75`

name-value pair argument sets the parameter used in the elastic net
optimization.

`AbsTol`

— Absolute error tolerance

`1e–4`

(default) | positive scalar

Absolute error tolerance used to determine the convergence of the
ADMM Algorithm, specified as the
comma-separated pair consisting of `'AbsTol'`

and a
positive scalar. The algorithm converges when successive estimates of
the coefficient vector differ by an amount less than
`AbsTol`

.

**Note**

This option applies only when you use `lasso`

on tall arrays. See Extended Capabilities for more
information.

**Example: **`'AbsTol',1e–3`

**Data Types: **`single`

| `double`

`Alpha`

— Weight of lasso versus ridge optimization

`1`

(default) | positive scalar

Weight of lasso (*L*^{1})
versus ridge (*L*^{2})
optimization, specified as the comma-separated pair consisting of
`'Alpha'`

and a positive scalar value in the
interval `(0,1]`

. The value
`Alpha = 1`

represents lasso regression,
`Alpha`

close to `0`

approaches
ridge regression, and other
values represent elastic net optimization. See Elastic Net.

**Example: **`'Alpha',0.5`

**Data Types: **`single`

| `double`

`B0`

— Initial values for *x*-coefficients in ADMM Algorithm

vector of zeros (default) | numeric vector

Initial values for *x*-coefficients in ADMM Algorithm, specified as the
comma-separated pair consisting of `'B0'`

and a numeric
vector.

**Note**

This option applies only when you use `lasso`

on tall arrays. See Extended Capabilities for more
information.

**Data Types: **`single`

| `double`

`CacheSize`

— Size of covariance matrix in megabytes

`1000`

(default) | positive scalar | `'maximal'`

Size of the covariance matrix in megabytes, specified as a positive scalar or `'maximal'`

. The `lasso`

function can use a covariance matrix for fitting when the `UseCovariance`

argument is `true`

or `'auto'`

.

If `UseCovariance`

is `true`

or `'auto'`

and `CacheSize`

is `'maximal'`

, `lasso`

can attempt to allocate a covariance matrix that exceeds the available memory. In this case, MATLAB^{®} issues an error.

**Example: **`'CacheSize','maximal'`

**Data Types: **`double`

| `char`

| `string`

`CV`

— Cross-validation specification for estimating mean squared
error

`'resubstitution'`

(default) | positive integer scalar | `cvpartition`

object

Cross-validation specification for estimating the mean squared error
(MSE), specified as the comma-separated pair consisting of
`'CV'`

and one of the following:

`'resubstitution'`

—`lasso`

uses`X`

and`y`

to fit the model and to estimate the MSE without cross-validation.Positive scalar integer

`K`

—`lasso`

uses`K`

-fold cross-validation.`cvpartition`

object`cvp`

—`lasso`

uses the cross-validation method expressed in`cvp`

. You cannot use a`'leaveout'`

partition with`lasso`

.

**Example: **`'CV',3`

`DFmax`

— Maximum number of nonzero coefficients

`Inf`

(default) | positive integer scalar

Maximum number of nonzero coefficients in the model, specified as the
comma-separated pair consisting of `'DFmax'`

and a
positive integer scalar. `lasso`

returns results only
for `Lambda`

values that satisfy this
criterion.

**Example: **`'DFmax',5`

**Data Types: **`single`

| `double`

`Intercept`

— Flag for fitting the model with intercept term

`true`

(default) | `false`

Flag for fitting the model with the intercept term, specified as the
comma-separated pair consisting of `'Intercept'`

and
either `true`

or `false`

. The default
value is `true`

, which indicates to include the
intercept term in the model. If `Intercept`

is
`false`

, then the returned intercept value is
0.

**Example: **`'Intercept',false`

**Data Types: **`logical`

`Lambda`

— Regularization coefficients

nonnegative vector

Regularization coefficients, specified as the comma-separated pair
consisting of `'Lambda'`

and a vector of nonnegative
values. See Lasso.

If you do not supply

`Lambda`

, then`lasso`

calculates the largest value of`Lambda`

that gives a nonnull model. In this case,`LambdaRatio`

gives the ratio of the smallest to the largest value of the sequence, and`NumLambda`

gives the length of the vector.If you supply

`Lambda`

, then`lasso`

ignores`LambdaRatio`

and`NumLambda`

.If

`Standardize`

is`true`

, then`Lambda`

is the set of values used to fit the models with the`X`

data standardized to have zero mean and a variance of one.

The default is a geometric sequence of `NumLambda`

values, with only the largest value able to produce
`B`

= `0`

.

**Example: **`'Lambda',linspace(0,1)`

**Data Types: **`single`

| `double`

`LambdaRatio`

— Ratio of smallest to largest `Lambda`

values

`1e–4`

(default) | positive scalar

Ratio of the smallest to the largest `Lambda`

values when you do not supply `Lambda`

, specified as
the comma-separated pair consisting of `'LambdaRatio'`

and a positive scalar.

If you set `LambdaRatio`

= 0, then
`lasso`

generates a default sequence of
`Lambda`

values and replaces the smallest one
with `0`

.

**Example: **`'LambdaRatio',1e–2`

**Data Types: **`single`

| `double`

`MaxIter`

— Maximum number of iterations allowed

positive integer scalar

Maximum number of iterations allowed, specified as the comma-separated
pair consisting of `'MaxIter'`

and a positive integer
scalar.

If the algorithm executes `MaxIter`

iterations
before reaching the convergence tolerance `RelTol`

,
then the function stops iterating and returns a warning message.

The function can return more than one warning when
`NumLambda`

is greater than
`1`

.

Default values are `1e5`

for standard data and
`1e4`

for tall arrays.

**Example: **`'MaxIter',1e3`

**Data Types: **`single`

| `double`

`MCReps`

— Number of Monte Carlo repetitions for cross-validation

`1`

(default) | positive integer scalar

Number of Monte Carlo repetitions for cross-validation, specified as
the comma-separated pair consisting of `'MCReps'`

and a
positive integer scalar.

If

`CV`

is`'resubstitution'`

or a`cvpartition`

of type`'resubstitution'`

, then`MCReps`

must be`1`

.If

`CV`

is a`cvpartition`

of type`'holdout'`

, then`MCReps`

must be greater than`1`

.

**Example: **`'MCReps',5`

**Data Types: **`single`

| `double`

`NumLambda`

— Number of `Lambda`

values

`100`

(default) | positive integer scalar

Number of `Lambda`

values
`lasso`

uses when you do not supply
`Lambda`

, specified as the comma-separated pair
consisting of `'NumLambda'`

and a positive integer
scalar. `lasso`

can return fewer than
`NumLambda`

fits if the residual error of the
fits drops below a threshold fraction of the variance of
`y`

.

**Example: **`'NumLambda',50`

**Data Types: **`single`

| `double`

`Options`

— Option to cross-validate in parallel and specify random streams

structure

Option to cross-validate in parallel and specify the random streams,
specified as the comma-separated pair consisting of
`'Options'`

and a structure. This option requires
Parallel Computing Toolbox™.

Create the `Options`

structure with `statset`

. The option
fields are:

`UseParallel`

— Set to`true`

to compute in parallel. The default is`false`

.`UseSubstreams`

— Set to`true`

to compute in parallel in a reproducible fashion. For reproducibility, set`Streams`

to a type allowing substreams:`'mlfg6331_64'`

or`'mrg32k3a'`

. The default is`false`

.`Streams`

— A`RandStream`

object or cell array consisting of one such object. If you do not specify`Streams`

, then`lasso`

uses the default stream.

**Example: **`'Options',statset('UseParallel',true)`

**Data Types: **`struct`

`PredictorNames`

— Names of predictor variables

`{}`

(default) | string array | cell array of character vectors

Names of the predictor variables, in the order in which they appear in
`X`

, specified as the comma-separated pair
consisting of `'PredictorNames'`

and a string array or
cell array of character vectors.

**Example: **`'PredictorNames',{'x1','x2','x3','x4'}`

**Data Types: **`string`

| `cell`

`RelTol`

— Convergence threshold for coordinate descent algorithm

`1e–4`

(default) | positive scalar

Convergence threshold for the coordinate descent algorithm [3], specified as the comma-separated pair
consisting of `'RelTol'`

and a positive scalar. The
algorithm terminates when successive estimates of the coefficient vector
differ in the *L*^{2} norm by a
relative amount less than `RelTol`

.

**Example: **`'RelTol',5e–3`

**Data Types: **`single`

| `double`

`Rho`

— Augmented Lagrangian parameter

positive scalar

Augmented Lagrangian parameter *ρ* for the ADMM Algorithm, specified as the
comma-separated pair consisting of `'Rho'`

and a
positive scalar. The default is automatic selection.

**Note**

This option applies only when you use `lasso`

on tall arrays. See Extended Capabilities for more
information.

**Example: **`'Rho',2`

**Data Types: **`single`

| `double`

`Standardize`

— Flag for standardizing predictor data before fitting models

`true`

(default) | `false`

Flag for standardizing the predictor data `X`

before fitting the models, specified as the comma-separated pair
consisting of `'Standardize'`

and either
`true`

or `false`

. If
`Standardize`

is `true`

, then
the `X`

data is scaled to have zero mean and a
variance of one. `Standardize`

affects whether the
regularization is applied to the coefficients on the standardized scale
or the original scale. The results are always presented on the original
data scale.

If `Intercept`

is `false`

, then
the software sets `Standardize`

to
`false`

, regardless of the
`Standardize`

value you specify.

`X`

and `y`

are always centered
when `Intercept`

is `true`

.

**Example: **`'Standardize',false`

**Data Types: **`logical`

`UseCovariance`

— Indication to use covariance matrix for fitting

`'auto'`

(default) | logical scalar

Indication to use a covariance matrix for fitting, specified as
`'auto'`

or a logical scalar.

`'auto'`

causes`lasso`

to attempt to use a covariance matrix for fitting when the number of observations is greater than the number of problem variables. This attempt can fail when memory is insufficient. To find out whether`lasso`

used a covariance matrix for fitting, examine the`UseCovariance`

field of the`FitInfo`

output.`true`

causes`lasso`

to use a covariance matrix for fitting as long as the required size does not exceed`CacheSize`

. If the required covariance matrix size exceeds`CacheSize`

,`lasso`

issues a warning and does not use a covariance matrix for fitting.`false`

causes`lasso`

not to use a covariance matrix for fitting.

Using a covariance matrix for fitting can be faster than not using one, but can require more memory. See Use Correlation Matrix for Fitting Lasso. The speed increase can negatively affect numerical stability. For details, see Coordinate Descent Algorithm.

**Example: **`'UseCovariance',true`

**Data Types: **`logical`

| `char`

| `string`

`U0`

— Initial value of scaled dual variable

vector of zeros (default) | numeric vector

Initial value of the scaled dual variable *u* in the
ADMM Algorithm, specified as the
comma-separated pair consisting of `'U0'`

and a numeric
vector.

**Note**

`lasso`

on tall arrays. See Extended Capabilities for more
information.

**Data Types: **`single`

| `double`

`Weights`

— Observation weights

`1/n*ones(n,1)`

(default) | nonnegative vector

Observation weights, specified as the comma-separated pair consisting
of `'Weights'`

and a nonnegative vector.
`Weights`

has length *n*, where
*n* is the number of rows of
`X`

. The `lasso`

function scales
`Weights`

to sum to `1`

.

**Data Types: **`single`

| `double`

## Output Arguments

`B`

— Fitted coefficients

numeric matrix

Fitted coefficients, returned as a numeric matrix. `B`

is a *p*-by-*L* matrix, where
*p* is the number of predictors (columns) in
`X`

, and *L* is the number of
`Lambda`

values. You can specify the number of
`Lambda`

values using the
`NumLambda`

name-value pair argument.

The coefficient corresponding to the intercept term is a field in
`FitInfo`

.

**Data Types: **`single`

| `double`

`FitInfo`

— Fit information of models

structure

Fit information of the linear models, returned as a structure with the fields described in this table.

Field in
`FitInfo` | Description |
---|---|

`Intercept` | Intercept term
β_{0} for each
linear model, a `1` -by-L
vector |

`Lambda` | Lambda parameters in ascending order, a
`1` -by-L
vector |

`Alpha` | Value of the `Alpha` parameter, a
scalar |

`DF` | Number of nonzero coefficients in `B`
for each value of `Lambda` , a
`1` -by-L
vector |

`MSE` | Mean squared error (MSE), a
`1` -by-L
vector |

`PredictorNames` | Value of the `PredictorNames` parameter,
stored as a cell array of character vectors |

`UseCovariance` | Logical value indicating whether the covariance matrix
was used in fitting. If the covariance was computed and
used, this field is `true` . Otherwise, this
field is `false` . |

If you set the `CV`

name-value pair argument to
cross-validate, the `FitInfo`

structure contains these
additional fields.

Field in
`FitInfo` | Description |
---|---|

`SE` | Standard error of MSE for each `Lambda` ,
as calculated during cross-validation, a
`1` -by-L
vector |

`LambdaMinMSE` | `Lambda` value with the minimum MSE, a
scalar |

`Lambda1SE` | Largest `Lambda` value such that MSE is
within one standard error of the minimum MSE, a
scalar |

`IndexMinMSE` | Index of `Lambda` with the value
`LambdaMinMSE` , a scalar |

`Index1SE` | Index of `Lambda` with the value
`Lambda1SE` , a scalar |

## More About

### Lasso

For a given value of *λ*, a nonnegative parameter,
`lasso`

solves the problem

$$\underset{{\beta}_{0},\beta}{\mathrm{min}}\left(\frac{1}{2N}{\displaystyle \sum _{i=1}^{N}{\left({y}_{i}-{\beta}_{0}-{x}_{i}^{T}\beta \right)}^{2}}+\lambda {\displaystyle \sum _{j=1}^{p}\left|{\beta}_{j}\right|}\right).$$

*N*is the number of observations.*y*is the response at observation_{i}*i*.*x*is data, a vector of length_{i}*p*at observation*i*.*λ*is a nonnegative regularization parameter corresponding to one value of`Lambda`

.The parameters

*β*_{0}and*β*are a scalar and a vector of length*p*, respectively.

As *λ* increases, the number of nonzero components of
*β* decreases.

The lasso problem involves the *L*^{1}
norm of *β*, as contrasted with the elastic net algorithm.

### Elastic Net

For *α* strictly between 0 and 1, and nonnegative
*λ*, elastic net solves the problem

$$\underset{{\beta}_{0},\beta}{\mathrm{min}}\left(\frac{1}{2N}{\displaystyle \sum _{i=1}^{N}{\left({y}_{i}-{\beta}_{0}-{x}_{i}^{T}\beta \right)}^{2}}+\lambda {P}_{\alpha}\left(\beta \right)\right),$$

where

$${P}_{\alpha}\left(\beta \right)=\frac{(1-\alpha )}{2}{\Vert \beta \Vert}_{2}^{2}+\alpha {\Vert \beta \Vert}_{1}={\displaystyle \sum _{j=1}^{p}\left(\frac{(1-\alpha )}{2}{\beta}_{j}^{2}+\alpha \left|{\beta}_{j}\right|\right)}.$$

Elastic net is the same as lasso when *α* = 1. For
other values of *α*, the penalty term
*P _{α}*(

*β*) interpolates between the

*L*

^{1}norm of

*β*and the squared

*L*

^{2}norm of

*β*. As

*α*shrinks toward 0, elastic net approaches

`ridge`

regression.## Algorithms

### Coordinate Descent Algorithm

`lasso`

fits many values of *λ*
simultaneously by an efficient procedure named *coordinate
descent*, based on Friedman, Tibshirani, and Hastie [3]. The procedure has two main code paths depending on whether the fitting uses a
covariance matrix. You can affect this choice with the
`UseCovariance`

name-value argument.

When `lasso`

uses a covariance matrix to fit
`N`

data points and `D`

predictors, the
fitting has a rough computational complexity of `D*D`

. Without a
covariance matrix, the computational complexity is roughly `N*D`

.
So, typically, using a covariance matrix can be faster when ```
N >
D
```

, and the default `'auto'`

setting of the
`UseCovariance`

argument makes this choice. Using a covariance
matrix causes `lasso`

to subtract larger numbers than
otherwise, which can be less numerically stable. For details of the algorithmic
differences, see [3]. For one comparison of timing and accuracy
differences, see Use Correlation Matrix for Fitting Lasso.

### ADMM Algorithm

When operating on tall arrays, `lasso`

uses an algorithm based
on the Alternating Direction Method of Multipliers (ADMM) [5]. The notation used here is the same as in the reference paper. This method solves
problems of the form

*Minimize*
$$l\left(x\right)+g\left(z\right)$$

*Subject to*
$$Ax+Bz=c$$

Using this notation, the lasso regression problem is

*Minimize*
$$l\left(x\right)+g\left(z\right)=\frac{1}{2}{\Vert Ax-b\Vert}_{2}^{2}+\lambda {\Vert z\Vert}_{1}$$

*Subject to*
$$x-z=0$$

Because the loss function $$l\left(x\right)=\frac{1}{2}{\Vert Ax-b\Vert}_{2}^{2}$$ is quadratic, the iterative updates performed by the algorithm amount to solving a linear system of equations with a single coefficient matrix but several right-hand sides. The updates performed by the algorithm during each iteration are

$$\begin{array}{l}{x}^{k+1}={\left({A}^{T}A+\rho I\right)}^{-1}\left({A}^{T}b+\rho \left({z}^{k}-{u}^{k}\right)\right)\\ {z}^{k+1}={S}_{\lambda /\rho}\left({x}^{k+1}+{u}^{k}\right)\\ {u}^{k+1}={u}^{k}+{x}^{k+1}-{z}^{k+1}\end{array}$$

*A* is the dataset (a tall array), *x* contains
the coefficients, *ρ* is the penalty parameter (augmented
Lagrangian parameter), *b* is the response (a tall array), and
*S* is the soft thresholding operator.

$${S}_{\kappa}\left(a\right)=\{\begin{array}{c}\begin{array}{cc}a-\kappa ,\text{\hspace{0.17em}}& a>\kappa \end{array}\\ \begin{array}{cc}0,\text{\hspace{0.17em}}& \left|a\right|\text{\hspace{0.17em}}\le \kappa \text{\hspace{0.17em}}\end{array}\\ \begin{array}{cc}a+\kappa ,\text{\hspace{0.17em}}& a<\kappa \text{\hspace{0.17em}}\end{array}\end{array}.$$

`lasso`

solves the linear system using Cholesky factorization
because the coefficient matrix $${A}^{T}A+\rho I$$ is symmetric and positive definite. Because $$\rho $$ does not change between iterations, the Cholesky factorization is
cached between iterations.

Even though *A* and *b* are tall arrays, they
appear only in the terms $${A}^{T}A$$ and $${A}^{T}b$$. The results of these two matrix multiplications are small enough
to fit in memory, so they are precomputed and the iterative updates between
iterations are performed entirely within memory.

## References

[1] Tibshirani, R. “Regression Shrinkage and Selection via
the Lasso.” *Journal of the Royal Statistical Society.*
Series B, Vol. 58, No. 1, 1996, pp. 267–288.

[2] Zou, H., and T. Hastie. “Regularization and Variable
Selection via the Elastic Net.” *Journal of the Royal Statistical
Society.* Series B, Vol. 67, No. 2, 2005, pp. 301–320.

[3] Friedman, J., R. Tibshirani, and T. Hastie.
“Regularization Paths for Generalized Linear Models via Coordinate
Descent.” *Journal of Statistical Software.* Vol. 33, No. 1,
2010. `https://www.jstatsoft.org/v33/i01`

[4] Hastie, T., R. Tibshirani, and J. Friedman. *The
Elements of Statistical Learning.* 2nd edition. New York: Springer,
2008.

[5] Boyd, S. “Distributed Optimization and Statistical
Learning via the Alternating Direction Method of Multipliers.”
*Foundations and Trends in Machine Learning.* Vol. 3, No. 1,
2010, pp. 1–122.

## Extended Capabilities

### Tall Arrays

Calculate with arrays that have more rows than fit in memory.

This function supports tall arrays for out-of-memory data with some limitations.

With tall arrays,

`lasso`

uses an algorithm based on ADMM (Alternating Direction Method of Multipliers).No elastic net support. The

`'Alpha'`

parameter is always 1.No cross-validation (

`'CV'`

parameter) support, which includes the related parameter`'MCReps'`

.The output

`FitInfo`

does not contain the additional fields`'SE'`

,`'LambdaMinMSE'`

,`'Lambda1SE'`

,`'IndexMinMSE'`

, and`'Index1SE'`

.The

`'Options'`

parameter is not supported because it does not contain options that apply to the ADMM algorithm. You can tune the ADMM algorithm using name-value pair arguments.Supported name-value pair arguments are:

`'Lambda'`

`'LambdaRatio'`

`'NumLambda'`

`'Standardize'`

`'PredictorNames'`

`'RelTol'`

`'Weights'`

Additional name-value pair arguments to control the ADMM algorithm are:

`'Rho'`

— Augmented Lagrangian parameter,*ρ*. The default value is automatic selection.`'AbsTol'`

— Absolute tolerance used to determine convergence. The default value is`1e–4`

.`'MaxIter'`

— Maximum number of iterations. The default value is`1e4`

.`'B0'`

— Initial values for the coefficients*x*. The default value is a vector of zeros.`'U0'`

— Initial values of the scaled dual variable*u*. The default value is a vector of zeros.

For more information, see Tall Arrays.

### Automatic Parallel Support

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, specify the `'Options'`

name-value argument in the call
to this function and set the `'UseParallel'`

field of the options
structure to `true`

using `statset`

.

For example: `'Options',statset('UseParallel',true)`

For more information about parallel computing, see Run MATLAB Functions with Automatic Parallel Support (Parallel Computing Toolbox).

## Version History

**Introduced in R2011b**

## Open Example

You have a modified version of this example. Do you want to open this example with your edits?

## MATLAB Command

You clicked a link that corresponds to this MATLAB command:

Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.

# Select a Web Site

Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .

You can also select a web site from the following list:

## How to Get Best Site Performance

Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.

### Americas

- América Latina (Español)
- Canada (English)
- United States (English)

### Europe

- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)

- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)