# Avoid Numerical Issues When Fitting Custom Distributions

This example shows how to use advanced techniques with the `mle`

function to avoid numerical issues when fitting a custom distribution. Specifically, you learn how to:

Specify adequate initial parameter values.

Specify

`logpdf`

(logarithm of probability density function) and`logsf`

(logarithm of survival function).Specify

`nloglf`

(negative loglikelihood function) and supply the gradient vector of the negative loglikelihood to the optimization function`fmincon`

(requires Optimization Toolbox™).

In this example, you fit an extreme value distribution to right-censored data. An extreme value distribution is often used to model failure times of mechanical parts. These types of experiments typically run for a fixed length of time only. If not all of the experimental units fail within that time, then the data values are right-censored, meaning some failure time values are not known exactly, but are known to be larger than a certain value.

Both the `evfit`

function and `mle`

function fit an extreme value distribution to data, including data with censoring. However, for the purposes of this example, use `mle`

and custom distributions to fit a model to censored data, using the extreme value distribution.

### Specify Adequate Initial Parameter Values

Because the values for the censored data are not known exactly, maximum likelihood estimation requires more information. In particular, the probability density function (pdf), the cumulative distribution function (cdf), and adequate initial parameter values are necessary to compute the loglikelihood. You can use the `evpdf`

and `evcdf`

functions to specify the pdf and cdf.

First, generate some uncensored extreme value data.

```
rng(0,'twister');
n = 50;
mu = 5;
sigma = 2.5;
x = evrnd(mu,sigma,n,1);
```

Next, censor any values that are larger than a predetermined cutoff by replacing those values with the cutoff value. This type of censoring is known as Type II censoring.

c = (x > 7); x(c) = 7;

Check the percentage of censored observations.

sum(c)/length(c)

ans = 0.1200

Twelve percent of the original data is right-censored with the cutoff at 7.

Plot a histogram of the data, including a stacked bar to represent the censored observations.

[uncensCnts,Edges] = histcounts(x(~c),10); censCnts = histcounts(x(c),Edges); bar(Edges(1:end-1)+diff(Edges)/2,[uncensCnts' censCnts'],'stacked') legend('Fully observed data','Censored data','Location','northwest')

Although the data includes censored observations, the fraction of censored observations is relatively small. Therefore, the method of moments can provide reasonable starting points for the parameter estimates. Compute the initial parameter values of `mu`

and `sigma`

that correspond to the observed mean and standard deviation of the uncensored data.

sigma0 = sqrt(6)*std(x(~c))./pi

sigma0 = 2.3495

mu0 = mean(x(~c))-psi(1).*sigma0

mu0 = 3.5629

Find the maximum likelihood estimates of the two extreme value distribution parameters, as well as the approximate 95% confidence intervals. Specify the censoring vector, pdf, cdf, and initial parameter values. Because `sigma`

(scale parameter) must be positive, you also need to specify lower parameter bounds.

[paramEsts,paramCIs] = mle(x,'Censoring',c, ... 'pdf',@evpdf,'cdf',@evcdf, ... 'Start',[mu0 sigma0],'LowerBound',[-Inf,0])

`paramEsts = `*1×2*
4.5530 3.0215

`paramCIs = `*2×2*
3.6455 2.2937
5.4605 3.7494

### Specify `logpdf`

and `logsf`

Fitting a custom distribution requires an initial guess for the parameters, and determining how good or bad a starting point is a priori can be difficult. If you specify a starting point that is farther away from the maximum likelihood estimates, some observations can be located far out in the tails of the extreme value distribution corresponding to the starting point. In a such case, one of these conditions can occur:

One of the pdf values becomes so small that it underflows to zero in double precision arithmetic.

One of the cdf values becomes so close to 1 that it rounds up in double precision.

A cdf value might also become so small as that it underflows, but this condition does not pose a problem.

Either condition can cause problems when `mle`

computes the loglikelihood, because each leads to loglikelihood values of `—Inf`

, which the optimization algorithm in `mle`

cannot handle.

Examine what happens with a different starting point.

start = [1 1]; try [paramEsts,paramCIs] = mle(x,'Censoring',c, ... 'pdf',@evpdf,'cdf',@evcdf, ... 'Start',start,'LowerBound',[-Inf,0]) catch ME disp(ME.message) end

Custom cumulative distribution function returned values greater than or equal to 1.

In this case, the second problem condition occurs. Some of the cdf values at the initial parameter guess are exactly 1, so the loglikelihood is infinite. You can try setting the `FunValCheck`

control parameter to `off`

by using the Options name-value argument. The `off`

option disables checking for nonfinite likelihood values. However, the best way to solve this numerical problem is at its root.

The extreme value cdf has the form

p = 1 - exp(-exp((x-mu)./sigma))

The contribution of the censored observations to the loglikelihood is the log of their survival function (SF) values, or `log(1-cdf)`

. For the extreme value distribution, the log of the SF is `-exp((x-mu)./sigma)`

. If you compute the loglikelihood using the log SF directly, instead of computing `log(1-(1-exp(logSF)))`

, you can avoid the rounding issues with the cdf. Observations whose cdf values are not distinguishable from `1`

in double precision have log SF values that are easily representable as nonzero values. For example, a cdf value of `(1-1e-20)`

rounds to `1`

in double precision, because double precision `eps`

is about `2e-16`

.

SFval = 1e-20; cdfval = 1 - SFval

cdfval = 1

The software can easily represent the log of the corresponding SF value.

log(SFval)

ans = -46.0517

The same situation is true for the log pdf; the contribution of uncensored observations to the loglikelihood is the log of their pdf values. You can use the log pdf directly, instead of computing `log(exp(logpdf))`

, to avoid underflow problems where the pdf is not distinguishable from zero in double precision. The software can easily represent the log pdf as a finite negative number. For example, a pdf value of `1e-400`

underflows in double precision, because double precision `realmin`

is about `2e-308`

.

logpdfval = -921; pdfval = exp(logpdfval)

pdfval = 0

Using the `mle`

function, you can specify a custom distribution with the log pdf and the log SF (rather than the pdf and cdf) by setting the `logpdf`

and `logsf`

name-value arguments. Unlike the pdf and cdf functions, log pdf and log SF do not have built-in functions. Therefore, you need to create anonymous functions that compute these values.

evlogpdf = @(x,mu,sigma) ((x-mu)./sigma - exp((x-mu)./sigma)) - log(sigma); evlogsf = @(x,mu,sigma) -exp((x-mu)./sigma);

Using the same starting point, the alternate log pdf and log SF specification of the extreme value distribution makes the problem solvable.

start = [1 1]; [paramEsts,paramCIs] = mle(x,'Censoring',c, ... 'logpdf',evlogpdf,'logsf',evlogsf, ... 'Start',start,'LowerBound',[-Inf,0])

`paramEsts = `*1×2*
4.5530 3.0215

`paramCIs = `*2×2*
3.6455 2.2937
5.4605 3.7494

This process does not always fix the problem of a poor starting point, so choosing the starting point carefully is recommended.

### Supply Gradient to Optimization Function `fmincon`

By default, `mle`

uses the function `fminsearch`

to find parameter values that maximize the loglikelihood for a custom distribution. `fminsearch`

uses an optimization algorithm that is derivative free, making it a good choice for this type of problem. However, for some problems, choosing an optimization algorithm that uses the derivatives of the loglikelihood function can make the difference between converging to the maximum likelihood estimates or not, especially when the starting point is far away from the final answer. Providing the derivatives can also speed up the convergence.

You can specify the `OptimFun`

name-value argument in `mle`

as `fmincon`

to use the `fmincon`

function (requires Optimization Toolbox). The `fmincon`

function includes optimization algorithms that can use derivative information. To take advantage of the algorithms in `fmincon`

, specify a custom distribution using a loglikelihood function, written to return not only the loglikelihood, but its gradient as well. The gradient of the loglikelihood function is the vector of its partial derivatives with respect to its parameters.

This strategy requires extra preparation to write code that computes both the loglikelihood and its gradient. Define a function named `helper_evnegloglike`

in a separate file.

function [nll,ngrad] = helper_evnegloglike(params,x,cens,freq) %HELPER_EVNEGLOGLIKE Negative log-likelihood for the extreme value % distribution. % This function supports only the example Avoid Numerical Issues When % Fitting Custom Distributions (customdist2demo.mlx) and might change in % a future release. if numel(params)~=2 error(message('stats:probdists:WrongParameterLength',2)); end mu = params(1); sigma = params(2); nunc = sum(1-cens); z = (x - mu) ./ sigma; expz = exp(z); nll = sum(expz) - sum(z(~cens)) + nunc.*log(sigma); if nargout > 1 ngrad = [-sum(expz)./sigma + nunc./sigma, ... -sum(z.*expz)./sigma + sum(z(~cens))./sigma + nunc./sigma]; end

The function `helper_evnegloglike`

returns the negative of both the loglikelihood values and the gradient values because `mle`

minimizes the negative loglikelihood.

To compute the maximum likelihood estimates using a gradient-based optimization algorithm, specify the `nloglf`

, `OptimFun`

, and `Options`

name-value arguments. `nloglf`

specifies the custom function that computes the negative loglikelihood, `OptimFun`

specifies `fmincon`

as the optimization function, and `Options`

specifies that `fmincon`

uses the second output of the custom function for the gradient.

start = [1 1]; [paramEsts,paramCIs] = mle(x,'Censoring',c,'nloglf',@helper_evnegloglike, ... 'Start',start,'LowerBound',[-Inf,0], ... 'OptimFun','fmincon','Options',statset('GradObj','on'))

`paramEsts = `*1×2*
4.5530 3.0215

`paramCIs = `*2×2*
3.6455 2.2937
5.4605 3.7493