# multialign

Align multiple sequences using progressive method

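## Syntax

`SeqsMultiAligned = multialign(Seqs)`

`SeqsMultiAligned = multialign(Seqs,Tree)`

`SeqsMultiAligned = multialign(___,Name=Value)`

## Description

`SeqsMultiAligned = multialign(Seqs)` performs a progressive multiple alignment for a set of sequences.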

Pairwise distances between sequences are computed by first aligning each pair with the Gonnet scoring matrix and then counting the proportion of sites at which the two sequences differ (ignoring gaps). The guide tree is calculated by the neighbor-joining method, assuming equal variance and independence of evolutionary distance estimates.
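The distance step can be sketched in plain MATLAB. This minimal sketch assumes the two sequences are already pairwise aligned (equal length, gaps written as `-`) and that "ignoring gaps" means skipping every column where either sequence has a gap. The helper `pDistanceNoGaps` is hypothetical and is not how `multialign` computes distances internally. Save it as a script; local functions in scripts require R2016b or later.

```matlab
% Two sequences that are already pairwise aligned (same length)
a = 'ACG-TA';
b = 'ACGTTT';
d = pDistanceNoGaps(a, b);   % 1 difference over 5 gap-free columns

function d = pDistanceNoGaps(a, b)
% Proportion of differing sites, skipping columns that contain a gap.
% Hypothetical helper for illustration only.
    if numel(a) ~= numel(b)
        error('Sequences must be aligned to the same length.');
    end
    keep = (a ~= '-') & (b ~= '-');         % gap-free columns
    d = sum(a(keep) ~= b(keep)) / nnz(keep);
end
```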

`SeqsMultiAligned = multialign(Seqs,Tree)` uses a tree as a guide for the progressive alignment. The sequences should have the same order as the leaves in the tree, or use a field (`"Header"` or `"Name"`) to identify the sequences.

`SeqsMultiAligned = multialign(___,Name=Value)` uses additional options specified by one or more name-value arguments.

## Examples

### Align multiple sequences

This example shows how to align multiple protein sequences.

Use the `fastaread` function to read `p53samples.txt`, a FASTA-formatted file included with Bioinformatics Toolbox™ that contains p53 protein sequences from seven species.

`p53 = fastaread('p53samples.txt')`

`p53=`*7×1 struct array with fields:*
Header
Sequence

Compute the pairwise distances between the sequences using the `'GONNET'` scoring matrix.

`dist = seqpdist(p53,'ScoringMatrix','GONNET');`

Build a phylogenetic tree using the unweighted average distance (UPGMA) method. The next step uses this tree as the guide tree for the progressive alignment.

`tree = seqlinkage(dist,'average',p53)`

Phylogenetic tree object with 7 leaves (6 branches)

Perform the progressive alignment using a series of PAM scoring matrices.

`ma = multialign(p53,tree,'ScoringMatrix',{'pam150','pam200','pam250'})`

`ma=`*7×1 struct array with fields:*
Header
Sequence

### Align Nucleotide Sequences

Create a cell array of nucleotide sequences.

`seqs = {'CACGTAACATCTC','ACGACGTAACATCTTCT','AAACGTAACATCTCGC'};`

Promote gaps at the ends of the sequences by setting `TerminalGapAdjust` to `true`.

`multialign(seqs,'terminalGapAdjust',true)`

`ans = `*3x17 char array*
'--CACGTAACATCTC--'
'ACGACGTAACATCTTCT'
'-AAACGTAACATCTCGC'

For comparison, align the sequences without terminal gap adjustment.

`multialign(seqs)`

`ans = `*3x17 char array*
'CA--CGTAACATCT--C'
'ACGACGTAACATCTTCT'
'AA-ACGTAACATCTCGC'

## Input Arguments

`Seqs` — Nucleotide or amino acid sequences

cell array of character vectors | vector of strings | matrix of characters | vector of structures

Nucleotide or amino acid sequences, specified as a cell array of character vectors, vector of strings, matrix of characters, or vector of structures.

You can specify:

- A cell array of character vectors or a vector of strings containing nucleotide or amino acid sequences.
- A matrix of characters, in which each row corresponds to a nucleotide or amino acid sequence.
- A vector of structures containing a `Sequence` field for the residues and a `Header` or `Name` field for the labels.

`Tree` — Phylogenetic tree

`phytree` object

Phylogenetic tree, specified as a `phytree` object. You can calculate the tree using the `seqlinkage` or `seqneighjoin` function.

### Name-Value Arguments

Specify optional pairs of arguments as `Name1=Value1,...,NameN=ValueN`, where `Name` is the argument name and `Value` is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

**Example:** `SeqsMultiAligned = multialign(Seqs,Weights="equal")` assigns the same weight to every sequence.

*Before R2021a, use commas to separate each name and value, and enclose* `Name` *in quotes.*

**Example:**

```
SeqsMultiAligned = multialign(Seqs,"Weights","equal")
```

`Weights` — Sequence weighting method

`"THG"` (default) | `"equal"`

Sequence weighting method, specified as `"THG"` or `"equal"`. Weights emphasize highly divergent sequences by scaling the scoring matrix and gap penalties; closer sequences receive smaller weights.

- `"THG"` — Thompson-Higgins-Gibson method, which uses the branch distances of the phylogenetic tree weighted by their thickness.
- `"equal"` — Assigns the same weight to every sequence.
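The idea behind `"THG"` can be sketched on a hand-built tree, assuming the classic Thompson-Higgins-Gibson formulation: each leaf sums, along its path to the root, every branch length divided by the number of leaves that share that branch. The parent-vector tree encoding and the helper `thgWeights` are hypothetical, and `multialign` may scale or normalize its weights differently. Save as a script (R2016b or later).

```matlab
% Toy rooted tree: leaves 1-3; internal node 4 joins leaves 1 and 2;
% node 5 is the root, joining node 4 and leaf 3.
parent = [4 4 5 5 0];      % parent(i) = parent of node i (0 = root)
blen   = [1 1 2 1 0];      % length of the branch above node i
w = thgWeights(parent, blen);   % close leaves 1 and 2 get smaller weights

function w = thgWeights(parent, blen)
% Sum, along each leaf's path to the root, every branch length divided
% by the number of leaves below that branch (its "thickness").
    n = numel(parent);
    leaves = find(~ismember(1:n, parent));   % nodes that are no one's parent
    nBelow = zeros(1, n);                    % leaves below each node
    for leaf = leaves
        node = leaf;
        while node ~= 0
            nBelow(node) = nBelow(node) + 1;
            node = parent(node);
        end
    end
    w = zeros(1, numel(leaves));
    for k = 1:numel(leaves)
        node = leaves(k);
        while parent(node) ~= 0              % stop at the root
            w(k) = w(k) + blen(node) / nBelow(node);
            node = parent(node);
        end
    end
end
```

Leaves 1 and 2 share the branch above node 4, so each receives only half of it, which is why closer sequences end up with smaller weights.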

`ScoringMatrix` — Scoring matrix for progressive alignment

character vector | string scalar | cell array of character vectors | array of strings | numeric matrix | numeric array

Scoring matrix for the progressive alignment, specified as a character vector, string scalar, or numeric matrix. You can specify a series of scoring matrices as a cell array of character vectors, array of strings, or numeric array.

Match and mismatch scores are interpolated from the series of scoring matrices by considering the distances between the two profiles or sequences being aligned. The first matrix corresponds to the smallest distance, and the last matrix to the largest distance. Intermediate distances are calculated using linear interpolation.
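A minimal sketch of the interpolation rule, assuming the profile distance has already been normalized to the range [0,1] and that the matrices in the series sit at evenly spaced distances. The section does not specify that mapping, so treat it as an assumption; `interpScoringMatrix` is a hypothetical helper. Save as a script (R2016b or later).

```matlab
% Toy series of three 2-by-2 "scoring matrices", closest to farthest
SM = cat(3, zeros(2), ones(2), 3*ones(2));
S1 = interpScoringMatrix(SM, 0.25);   % halfway between slices 1 and 2
S2 = interpScoringMatrix(SM, 1);      % exactly the last slice

function S = interpScoringMatrix(SM, d)
% Interpolate an M-by-M matrix from the M-by-M-by-N series SM at the
% normalized distance d in [0,1]. Assumption: slices are evenly spaced,
% with SM(:,:,1) at d = 0 and SM(:,:,N) at d = 1.
    N = size(SM, 3);
    d = min(max(d, 0), 1);           % clamp to the covered range
    pos = 1 + d * (N - 1);           % fractional slice index
    lo = floor(pos);
    hi = min(lo + 1, N);
    t = pos - lo;                    % weight of the upper slice
    S = (1 - t) * SM(:,:,lo) + t * SM(:,:,hi);
end
```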

You can specify scoring matrix names. Valid choices are:

- `"BLOSUM62"`
- `"BLOSUM30"` increasing by `5` up to `"BLOSUM90"` (the default for amino acid sequences is the `"BLOSUM80"` to `"BLOSUM30"` series)
- `"BLOSUM100"`
- `"PAM10"` increasing by `10` up to `"PAM500"`
- `"DAYHOFF"`
- `"GONNET"`
- `"NUC44"` (default for nucleotide sequences). This choice is not supported for amino acid sequences.

**Note**

The above scoring matrices, provided with the software, also include a scale factor that converts the units of the output score to bits.

You can also specify a numeric matrix of size *M*-by-*M*, such as the one returned by the `blosum`, `pam`, `dayhoff`, `gonnet`, or `nuc44` function. You can also specify a numeric array of size *M*-by-*M*-by-*N* for a series of *N* user-defined scoring matrices.

**Note**

If you use a scoring matrix that you created, or one created by one of the above functions, the matrix does not include a scale factor, and the output score is returned in the same units as the scoring matrix. When passing your own series of scoring matrices, ensure that they share the same scale.
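One way to build an *M*-by-*M*-by-*N* series from toolbox functions is sketched below. It assumes that multiplying each matrix by the `Scale` field of the second (`MatrixInfo`) output of `pam` is an acceptable way to bring every slice to common units, as the Note above requires; check the `pam` reference page before relying on this.

```matlab
% Series of three PAM matrices, closest (PAM150) to most divergent (PAM250)
levels = [150 200 250];
SM = [];
for k = 1:numel(levels)
    [M, info] = pam(levels(k));          % numeric matrix and MatrixInfo
    % Rescale each slice by its own scale factor so that every slice
    % of the series is expressed in the same units.
    SM = cat(3, SM, double(M) * info.Scale);
end

p53 = fastaread('p53samples.txt');
ma = multialign(p53, 'ScoringMatrix', SM);
```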

If you need to compile `multialign` into a standalone application or software component using MATLAB® Compiler™, use a numeric matrix instead of the scoring matrix name.

**Example:** `"BLOSUM62"` or `'BLOSUM62'` specifies a BLOSUM scoring matrix with a percent identity level of 62 and includes a scale factor.

**Example:** `["pam150","pam200","pam250"]` or `{'pam150','pam200','pam250'}` specifies a series of three PAM scoring matrices.

**Example:** `blosum(62)` specifies the numeric matrix returned by the `blosum` function, which does not include a scale factor.

`SMInterp` — Use linear interpolation of scoring matrices

`true` or `1` (default) | `false` or `0`

Use linear interpolation of the scoring matrices, specified as a numeric or logical `true` (`1`) or `false` (`0`). When `SMInterp` is `false`, each scoring matrix is assigned to a fixed range depending on the distances between the two profiles or sequences being aligned.
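A sketch of the non-interpolating mode, assuming the normalized distance range [0,1] is split into *N* equal bins and each bin selects one matrix of the series. The actual ranges `multialign` uses are not given in this section, and `pickScoringMatrix` is a hypothetical helper. Save as a script (R2016b or later).

```matlab
N = 3;                                  % number of matrices in the series
d = [0.1 0.5 0.9 1];                    % normalized distances
k = arrayfun(@(x) pickScoringMatrix(N, x), d);   % matrix used for each

function k = pickScoringMatrix(N, d)
% Assumption: [0,1] is split into N equal bins; d selects its bin.
    d = min(max(d, 0), 1);
    k = min(floor(d * N) + 1, N);       % d = 1 falls in the last bin
end
```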

`GapOpen` — Initial penalty for opening gap

positive scalar | function handle

Initial penalty for opening a gap, specified as a positive scalar or a function handle.

If you specify a function handle, `multialign` passes four values to the function: the average score for two matched residues (`sm`), the average score for two mismatched residues (`sx`), and the lengths of the two profiles or sequences (`len1`, `len2`). By default, `multialign` uses the function handle `@(sm,sx,len1,len2) 5*sm`, which sets the initial penalty for opening a gap to five times the average score for two matched residues. Although the default function does not depend on `sx`, `len1`, or `len2`, your custom function can use these values.

**Data Types:** `double`
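A custom penalty function must accept the four arguments described above, even if it ignores some of them. The formula below (the default `5*sm` plus a term that grows with the shorter length) is purely hypothetical and only illustrates the calling convention.

```matlab
seqs = {'CACGTAACATCTC','ACGACGTAACATCTTCT','AAACGTAACATCTCGC'};

% Hypothetical custom penalty: the default 5*sm plus a small term that
% grows with the length of the shorter profile or sequence.
gapOpenFcn = @(sm,sx,len1,len2) 5*sm + log(min(len1,len2));

aligned = multialign(seqs, 'GapOpen', gapOpenFcn);
```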

`ExtendGap` — Initial penalty for extending gap

positive scalar | function handle

Initial penalty for extending a gap, specified as a positive scalar or a function handle. If you specify this value, the function uses an affine gap penalty scheme: it scores the first position of a gap using the `GapOpen` value and each subsequent position using the `ExtendGap` value. If you do not specify this value, the function scores all gap positions equally, using the `GapOpen` penalty.

If you specify a function handle, `multialign` passes four values to the function: the average score for two matched residues (`sm`), the average score for two mismatched residues (`sx`), and the lengths of the two profiles or sequences (`len1`, `len2`). By default, `multialign` uses the function handle `@(sm,sx,len1,len2) sm/4`, which sets the initial penalty for extending a gap to one-fourth of the average score for two matched residues. Although the default function does not depend on `sx`, `len1`, or `len2`, your custom function can use these values.

**Data Types:** `double`
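The arithmetic of the two schemes, for a single gap of length 4 with hypothetical penalties `GapOpen = 10` and `ExtendGap = 0.5`, is sketched below. It only illustrates the scoring rule and does not call `multialign`.

```matlab
gapOpen   = 10;    % penalty for the first position of a gap
extendGap = 0.5;   % penalty for each later position
k = 4;             % gap length

% Affine scheme (ExtendGap specified)
affineCost = gapOpen + (k - 1) * extendGap;   % 10 + 3*0.5 = 11.5

% Without ExtendGap: every gap position costs GapOpen
linearCost = k * gapOpen;                     % 4*10 = 40
```

The affine scheme makes one long gap much cheaper than several short ones, which is usually the desired behavior for biological insertions and deletions.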

`DelayCutoff` — Threshold delay of divergent sequences

numeric scalar

Threshold delay of divergent sequences, specified as a numeric scalar. The `multialign` function delays the alignment of divergent sequences whose closest neighbor is farther than:

`DelayCutoff * (median patristic distance between sequences)`

The default value is `1`, so sequences whose closest neighbor is farther than the median distance are delayed.
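The threshold test can be sketched on a small, hypothetical matrix of patristic distances. This assumes the median is taken over all distinct sequence pairs; how `multialign` derives these distances from the tree is not shown here.

```matlab
% Hypothetical patristic distances for four sequences (symmetric,
% zero diagonal); sequence 4 is far from all the others.
D = [0    0.1  0.2  0.9
     0.1  0    0.2  0.8
     0.2  0.2  0    0.85
     0.9  0.8  0.85 0];
delayCutoff = 1;                            % default value

pairs = D(triu(true(size(D)), 1));          % each pair counted once
threshold = delayCutoff * median(pairs);    % 1 * 0.5 here

Dself = D + diag(inf(size(D, 1), 1));       % mask self-distances
closest = min(Dself, [], 2);                % nearest-neighbor distance
delayed = find(closest > threshold).';      % sequences aligned last
```

Raising `delayCutoff` above `1` delays fewer sequences; lowering it delays more.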

`UseParallel` — Use parallel computation

`false` or `0` (default) | `true` or `1`

Use parallel computation of the pairwise alignments, specified as a numeric or logical `false` (`0`) or `true` (`1`).

- If `true` and Parallel Computing Toolbox™ is installed, then computation occurs using `parfor`-loops.
  - If a `parpool` is open, then the computation uses the open `parpool` and occurs in parallel.
  - If there is no open `parpool`, but automatic creation is enabled in the Parallel Preferences, then the default pool is automatically opened and computation occurs in parallel.
  - If there is no open `parpool` and automatic creation is disabled, then computation uses `parfor`-loops in serial mode.
- If `true` and Parallel Computing Toolbox is not installed, then computation uses `parfor`-loops in serial mode.
- If `false`, then the computation uses `for`-loops in serial mode.

`Verbose` — Display sequences with sequence information

`false` or `0` (default) | `true` or `1`

Display the sequences with sequence information, specified as a numeric or logical `false` (`0`) or `true` (`1`).

`ExistingGapAdjust` — Control automatic adjustment based on existing gaps

`true` or `1` (default) | `false` or `0`

Control automatic adjustment based on existing gaps, specified as a numeric or logical `true` (`1`) or `false` (`0`).

When `true`, for every profile position, `multialign` proportionally lowers the penalty for opening a gap toward the penalty for extending a gap, based on the proportion of gaps found in the contiguous symbols and on the weight of the input profile.

When `false`, `multialign` turns off this automatic adjustment of the position-specific gap-opening penalties based on existing gaps.

This argument is analogous to the corresponding option of the `profalign` function and is applied at every step of the progressive alignment of profiles.
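A simplified sketch of the adjustment, assuming a linear rule that moves the opening penalty toward the extension penalty by the weighted fraction of sequences already gapped at each column. The exact formula `multialign` uses, including how contiguous symbols are taken into account, is not documented here; the profile, weights, and penalties below are hypothetical.

```matlab
% Toy profile: three aligned sequences (rows) over five columns
profile = ['AC-GT'
           'A--GT'
           'ACTGT'];
w = [1 1 2];                 % hypothetical sequence weights
gapOpen   = 10;
extendGap = 0.5;

isGap   = double(profile == '-');       % 1 where a sequence has a gap
gapFrac = (w * isGap) / sum(w);         % weighted gap fraction per column
% Move the opening penalty toward the extension penalty in proportion
% to the gaps already present at each position.
adjOpen = gapOpen - (gapOpen - extendGap) * gapFrac;
```

Columns that already contain gaps become cheaper places to open new ones, which tends to keep gaps from different sequences aligned in the same columns.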

`TerminalGapAdjust` — Adjust penalty for opening gap at ends of sequence

`false` or `0` (default) | `true` or `1`

Adjust the penalty for opening a gap at the ends of the sequence, specified as a numeric or logical `false` (`0`) or `true` (`1`). When `true`, the `multialign` function sets the penalty for opening a gap at the ends of the sequence equal to the penalty for extending a gap.

This argument is analogous to the corresponding option of the `profalign` function and is applied at every step of the progressive alignment of profiles.

## Output Arguments

`SeqsMultiAligned` — Aligned sequences

cell array of character vectors | vector of strings | matrix of characters | vector of structures

Aligned sequences, returned as a cell array of character vectors, vector of strings, matrix of characters, or vector of structures. The format of `SeqsMultiAligned` matches the format of the input sequences to align, `Seqs`.

- When `Seqs` is a cell array of character vectors, vector of strings, or matrix of characters, the output alignment in `SeqsMultiAligned` follows the same order as the input.
- When `Seqs` is a vector of structures, the `Sequence` field of `SeqsMultiAligned` is updated with the alignment. Other fields of `SeqsMultiAligned` match the fields of `Seqs`.

## Extended Capabilities

### Automatic Parallel Support

Accelerate code by automatically running computation in parallel using Parallel Computing Toolbox™.

To run in parallel, set `'UseParallel'` to `true`.

For more information, see the `'UseParallel'` name-value argument.

## Version History

**Introduced before R2006a**

## See Also

`align2cigar` | `hmmprofalign` | `multialignread` | `multialignwrite` | `nwalign` | `profalign` | `seqprofile` | `seqconsensus` | `seqneighjoin`
