# geluLayer

Gaussian error linear unit (GELU) layer

## Description

A Gaussian error linear unit (GELU) layer weights the input by its probability under a Gaussian distribution.

This operation is given by

$\text{GELU}\left(x\right)=\frac{x}{2}\left(1+\text{erf}\left(\frac{x}{\sqrt{2}}\right)\right),$

where erf denotes the error function.
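The definition above can be checked numerically. This is a minimal illustrative sketch in Python (not MATLAB code) using the standard-library error function:

```python
import math

def gelu(x):
    # Exact GELU: x/2 * (1 + erf(x / sqrt(2)))
    # Equivalently, x weighted by the standard Gaussian CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

print(gelu(0.0))             # 0.0
print(round(gelu(1.0), 4))   # 0.8413
```

For large positive inputs the Gaussian CDF approaches 1, so GELU behaves like the identity; for large negative inputs it approaches 0.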

## Creation

### Description

layer = geluLayer returns a GELU layer.

layer = geluLayer(Name=Value) sets the optional Approximation and Name properties using name-value arguments. For example, geluLayer(Name="gelu") creates a GELU layer with the name "gelu".

## Properties

### GELU

Approximation - Approximation method for the GELU operation, specified as one of these values:

• 'none' — Do not use approximation.

• 'tanh' — Approximate the underlying error function using

$\text{erf}\left(\frac{x}{\sqrt{2}}\right)\approx \text{tanh}\left(\sqrt{\frac{2}{\pi }}\left(x+0.044715{x}^{3}\right)\right).$

Tip

In MATLAB®, computing the tanh approximation is typically less accurate and, for large input sizes, slower than computing the GELU activation without an approximation. Use the tanh approximation when you want to reproduce models that use it, such as BERT and GPT-2.
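To see how close the two methods are, here is a short illustrative sketch in Python (not MATLAB code) comparing the exact GELU with the tanh approximation:

```python
import math

def gelu_exact(x):
    # Exact GELU: x/2 * (1 + erf(x / sqrt(2)))
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation:
    # erf(x/sqrt(2)) ~ tanh(sqrt(2/pi) * (x + 0.044715 * x^3))
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))

for x in [-2.0, -0.5, 0.0, 0.5, 2.0]:
    diff = abs(gelu_exact(x) - gelu_tanh(x))
    print(f"x={x:+.1f}  exact={gelu_exact(x):+.6f}  "
          f"tanh={gelu_tanh(x):+.6f}  |diff|={diff:.2e}")
```

Over this range the two curves agree to within roughly 1e-3, which is why the approximation is mainly useful for reproducing models trained with it rather than for accuracy or speed.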

### Layer

Name - Layer name, specified as a character vector or a string scalar. For Layer array input, the trainNetwork, assembleNetwork, layerGraph, and dlnetwork functions automatically assign names to layers with the name ''.

Data Types: char | string

NumInputs - Number of inputs of the layer. This layer accepts a single input only.

Data Types: double

InputNames - Input names of the layer. This layer accepts a single input only.

Data Types: cell

NumOutputs - Number of outputs of the layer. This layer has a single output only.

Data Types: double

OutputNames - Output names of the layer. This layer has a single output only.

Data Types: cell

## Examples

Create a GELU layer.

layer = geluLayer
layer =
GELULayer with properties:

Name: ''

Hyperparameters
Approximation: 'none'

Include a GELU layer in a Layer array.

layers = [
imageInputLayer([28 28 1])
convolution2dLayer(5,20)
geluLayer
maxPooling2dLayer(2,Stride=2)
fullyConnectedLayer(10)
softmaxLayer
classificationLayer]
layers =
7×1 Layer array with layers:

1   ''   Image Input             28×28×1 images with 'zerocenter' normalization
2   ''   Convolution             20 5×5 convolutions with stride [1  1] and padding [0  0  0  0]
3   ''   GELU                    GELU
4   ''   Max Pooling             2×2 max pooling with stride [2  2] and padding [0  0  0  0]
5   ''   Fully Connected         10 fully connected layer
6   ''   Softmax                 softmax
7   ''   Classification Output   crossentropyex

## References

[1] Hendrycks, Dan, and Kevin Gimpel. "Gaussian error linear units (GELUs)." Preprint, submitted June 27, 2016. https://arxiv.org/abs/1606.08415

## Version History

Introduced in R2022b