nt2aa
Convert nucleotide sequence to amino acid sequence
Syntax
SeqAA = nt2aa(SeqNT)
SeqAA = nt2aa(..., 'Frame', FrameValue, ...)
SeqAA = nt2aa(..., 'GeneticCode', GeneticCodeValue, ...)
SeqAA = nt2aa(..., 'AlternativeStartCodons', AlternativeStartCodonsValue, ...)
SeqAA = nt2aa(..., 'ACGTOnly', ACGTOnlyValue, ...)
Input Arguments
SeqNT | Nucleotide sequence to convert. Note: Hyphens are valid only if the codon to which they belong represents a gap, that is, the codon consists entirely of hyphens. Tip: Do not use a sequence with hyphens if you specify … |
FrameValue | Integer, character vector, or string specifying a reading frame in the nucleotide sequence. Choices are 1, 2, 3, or 'all'. Default is 1. If FrameValue is 'all', then SeqAA is a 3-by-1 cell array. |
GeneticCodeValue | Integer, character vector, or string specifying a genetic code number or code name from the table Genetic Code. Default is 1 or 'Standard'. Tip: If you use a code name, you can truncate the name to the first two letters of the name. |
AlternativeStartCodonsValue | Controls the translation of alternative start codons. Choices are true (default) or false. |
ACGTOnlyValue | Controls the behavior of ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, and N) and unknown characters. Choices are true (default) or false. |
Output Arguments
SeqAA | Amino acid sequence specified by a character vector of single-letter codes. |
Description
SeqAA = nt2aa(SeqNT) converts a nucleotide sequence, specified by SeqNT, to an amino acid sequence, returned in SeqAA, using the standard genetic code.
SeqAA = nt2aa(SeqNT, ...'PropertyName', PropertyValue, ...) calls nt2aa with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:
SeqAA = nt2aa(..., 'Frame', FrameValue, ...) converts a nucleotide sequence for a specific reading frame to an amino acid sequence. Choices are 1, 2, 3, or 'all'. Default is 1. If FrameValue is 'all', then output SeqAA is a 3-by-1 cell array.
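As a minimal sketch of the 'Frame' property, the snippet below translates a short sequence in the default frame and in all three frames at once. The sequence is a hypothetical one made up for illustration, and the variable names are assumptions, not part of the original page.

```matlab
% Hypothetical 21-nt sequence, chosen only for illustration.
seq = 'ATGGCCATTGTAATGGGCCGC';

% Default reading frame (1), standard genetic code:
% ATG GCC ATT GTA ATG GGC CGC
frame1 = nt2aa(seq);

% All three reading frames; the result is a 3-by-1 cell array
% with one translation per frame.
allFrames = nt2aa(seq, 'Frame', 'all');

disp(frame1)         % MAIVMGR
disp(allFrames{1})   % same translation as frame1
```

Frames 2 and 3 start one and two nucleotides later, so leftover nucleotides at the end of the sequence do not form a complete codon.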
SeqAA = nt2aa(..., 'GeneticCode', GeneticCodeValue, ...) specifies a genetic code to use when converting a nucleotide sequence to an amino acid sequence. GeneticCodeValue can be an integer, character vector, or string specifying a code number or code name from the table Genetic Code. Default is 1 or 'Standard'. The amino acid to nucleotide codon mapping for the Standard genetic code is shown in the table Standard Genetic Code.
Tip
If you use a code name, you can truncate the name to the first two letters of the name.
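The effect of the genetic code can be seen on a codon whose meaning differs between tables. The sketch below uses a hypothetical 9-nt sequence ending in TGA, which is a stop codon in the Standard code and tryptophan in the Vertebrate Mitochondrial code. It selects the code by number and by a name truncated to two letters, as the tip above allows.

```matlab
% Hypothetical sequence, chosen only for illustration: ATG GCC TGA
seq = 'ATGGCCTGA';

standardAA = nt2aa(seq);                        % Standard code (default)
mitoByNumber = nt2aa(seq, 'GeneticCode', 2);    % Vertebrate Mitochondrial
mitoByName = nt2aa(seq, 'GeneticCode', 'Ve');   % same code, truncated name

disp(standardAA)     % MA*
disp(mitoByNumber)   % MAW
```

Selecting the code by number avoids any dependence on how a name is spelled or truncated.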
SeqAA = nt2aa(..., 'AlternativeStartCodons', AlternativeStartCodonsValue, ...) controls the translation of alternative start codons.
When this option is true and the first codon of a sequence corresponds to a known alternative start codon, the function translates the codon to methionine. If this option is false, the function translates an alternative start codon at the start of a sequence to its corresponding amino acid in the genetic code that you specify, which might not be methionine. For example, in the human mitochondrial genetic code, AUA and AUU are known to be alternative start codons.
For more information on alternative start codons, visit https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=t#SG1.
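A short sketch of this behavior, assuming a hypothetical 9-nt sequence that begins and ends with ATT. Under the Vertebrate Mitochondrial code, ATT is an alternative start codon, so only the leading ATT should be affected by the option. The expected results follow the behavior this page describes for the ND2 gene.

```matlab
% Hypothetical sequence, chosen only for illustration: ATT GCC ATT
seq = 'ATTGCCATT';

% Leading ATT is treated as a start codon and translated to M.
withAltStart = nt2aa(seq, 'GeneticCode', 2);

% Leading ATT is translated like any other ATT codon, that is, to I.
withoutAltStart = nt2aa(seq, 'GeneticCode', 2, ...
    'AlternativeStartCodons', false);

disp(withAltStart)      % MAI
disp(withoutAltStart)   % IAI
```

Setting the option to false can be useful when the input is a fragment rather than a complete coding sequence, because the first codon of a fragment is not a true start.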
Genetic Code
Code Number | Code Name |
---|---|
1 | Standard |
2 | Vertebrate Mitochondrial |
3 | Yeast Mitochondrial |
4 | Mold, Protozoan, Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma |
5 | Invertebrate Mitochondrial |
6 | Ciliate, Dasycladacean, and Hexamita Nuclear |
9 | Echinoderm Mitochondrial |
10 | Euplotid Nuclear |
11 | Bacterial and Plant Plastid |
12 | Alternative Yeast Nuclear |
13 | Ascidian Mitochondrial |
14 | Flatworm Mitochondrial |
15 | Blepharisma Nuclear |
16 | Chlorophycean Mitochondrial |
21 | Trematode Mitochondrial |
22 | Scenedesmus Obliquus Mitochondrial |
23 | Thraustochytrium Mitochondrial |
Standard Genetic Code
Amino Acid Name | Amino Acid Code | Nucleotide Codon |
---|---|---|
Alanine | A | GCT GCC GCA GCG |
Arginine | R | CGT CGC CGA CGG AGA AGG |
Asparagine | N | AAT AAC |
Aspartic acid (Aspartate) | D | GAT GAC |
Cysteine | C | TGT TGC |
Glutamine | Q | CAA CAG |
Glutamic acid (Glutamate) | E | GAA GAG |
Glycine | G | GGT GGC GGA GGG |
Histidine | H | CAT CAC |
Isoleucine | I | ATT ATC ATA |
Leucine | L | CTT CTC CTA CTG† TTA TTG† († indicates an alternative start codon for the Standard Genetic Code as defined here.) |
Lysine | K | AAA AAG |
Methionine | M | ATG |
Phenylalanine | F | TTT TTC |
Proline | P | CCT CCC CCA CCG |
Serine | S | TCT TCC TCA TCG AGT AGC |
Threonine | T | ACT ACC ACA ACG |
Tryptophan | W | TGG |
Tyrosine | Y | TAT TAC |
Valine | V | GTT GTC GTA GTG |
Asparagine or Aspartic acid (Aspartate) | B | Random codon from D and N |
Glutamine or Glutamic acid (Glutamate) | Z | Random codon from E and Q |
Unknown amino acid (any amino acid) | X | Random codon |
Translation stop | * | TAA TAG TGA |
Gap of indeterminate length | - | --- |
Unknown character (any character or symbol not in table) | ? | ??? |
SeqAA = nt2aa(..., 'ACGTOnly', ACGTOnlyValue, ...) controls the behavior of ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, and N) and unknown characters. ACGTOnlyValue can be true (default) or false. If true, then the function errors if any of these characters are present. If false, then the function tries to resolve ambiguities. If it cannot, it returns X for the affected codon.
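The two settings can be compared side by side. The sketch below reuses the ambiguous sequence from the Examples section and wraps the default (strict) call in try/catch so that the script continues after the error. The variable names are assumptions made for illustration.

```matlab
% Sequence with the ambiguous characters n and r (from the Examples section).
seq = 'agttgccgacgcgcncar';

% Default behavior ('ACGTOnly' is true): ambiguous characters cause an error.
strictFailed = false;
try
    nt2aa(seq);
catch err
    strictFailed = true;
    disp(err.message)
end

% With 'ACGTOnly' set to false, nt2aa tries to resolve each ambiguous codon.
% Here gcn and car each map to a single amino acid.
relaxed = nt2aa(seq, 'ACGTOnly', false);
disp(relaxed)   % SCRRAQ
```

A codon that cannot be resolved to a single amino acid is returned as X, so checking the output for X is one way to find positions that remained ambiguous.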
Examples
Use the getgenbank function to retrieve genomic information for the human mitochondrion from the GenBank® database and store it in a MATLAB structure.
mitochondria = getgenbank('NC_012920')
mitochondria =
    LocusName: 'NC_012920'
    LocusSequenceLength: '16569'
    LocusNumberofStrands: ''
    LocusTopology: 'circular'
    LocusMoleculeType: 'DNA'
    LocusGenBankDivision: 'PRI'
    LocusModificationDate: '05-MAR-2010'
    Definition: 'Homo sapiens mitochondrion, complete genome.'
    Accession: 'NC_012920 AC_000021'
    Version: 'NC_012920.1'
    GI: '251831106'
    Project: []
    DBLink: 'Project:30353'
    Keywords: []
    Segment: []
    Source: 'mitochondrion Homo sapiens (human)'
    SourceOrganism: [4x65 char]
    Reference: {1x7 cell}
    Comment: [24x67 char]
    Features: [933x74 char]
    CDS: [1x13 struct]
    Sequence: [1x16569 char]
    SearchURL: [1x70 char]
    RetrieveURL: [1x104 char]
Determine the name and location of the first gene in the human mitochondrion.
mitochondria.CDS(1).gene
ans = ND1
mitochondria.CDS(1).location
ans = 3307..4262
Extract the sequence for the ND1 gene from the nucleotide sequence.
ND1gene = mitochondria.Sequence(3307:4262);
Convert the ND1 gene on the human mitochondrial genome to an amino acid sequence using the Vertebrate Mitochondrial genetic code.
protein1 = nt2aa(ND1gene,'GeneticCode', 2);
Use the getgenpept function to retrieve the same amino acid sequence from the GenPept database.
protein2 = getgenpept('YP_003024026', 'SequenceOnly', true);
Use the isequal function to compare the two amino acid sequences.
isequal(protein1, protein2)
ans = 1
Use the getgenbank function to retrieve the nucleotide sequence for the human mitochondrion from the GenBank database.
mitochondria = getgenbank('NC_012920');
Determine the name and location of the second gene in the human mitochondrion.
mitochondria.CDS(2).gene
ans = ND2
mitochondria.CDS(2).location
ans = 4470..5511
Extract the sequence for the ND2 gene from the nucleotide sequence.
ND2gene = mitochondria.Sequence(4470:5511);
Convert the ND2 gene on the human mitochondrial genome to an amino acid sequence using the Vertebrate Mitochondrial genetic code.
protein1 = nt2aa(ND2gene,'GeneticCode', 2);
Note
In the ND2gene nucleotide sequence, the first codon is ATT, which is translated to M, while the subsequent ATT codons are translated to I. If you set 'AlternativeStartCodons' to false, then the first ATT codon is translated to I, the corresponding amino acid in the Vertebrate Mitochondrial genetic code.
Use the getgenpept function to retrieve the same amino acid sequence from the GenPept database.
protein2 = getgenpept('YP_003024027', 'SequenceOnly', true);
Use the isequal function to compare the two amino acid sequences.
isequal(protein1, protein2)
ans = 1
If you have a sequence with ambiguous or unknown nucleotide characters, you can set the 'ACGTOnly' property to false to have the nt2aa function try to resolve them:
nt2aa('agttgccgacgcgcncar', 'ACGTOnly', false)
ans = SCRRAQ
Version History
Introduced before R2006a
See Also
aa2nt | aminolookup | baselookup | codonbias | dnds | dndsml | geneticcode | isotopicdist | revgeneticcode | seqviewer