nt2aa

Convert nucleotide sequence to amino acid sequence

Description

SeqAA = nt2aa(SeqNT) converts a nucleotide sequence to an amino acid sequence using the standard genetic code.

SeqAA = nt2aa(SeqNT,Name=Value) uses additional options specified by one or more name-value arguments.

Examples

Generate a random DNA sequence.

ntSeq = randseq(30)
ntSeq = 
'TTATGACGTTATTCTACTTTGATTGTGCGA'

Convert the DNA sequence to an amino acid sequence using the standard genetic code.

aaSeq = nt2aa(ntSeq)
aaSeq = 
'L*RYSTLIVR'

Generate amino acid sequences for all three reading frames using the yeast mitochondrial genetic code.

aaSeq = nt2aa(ntSeq,Frame="all",GeneticCode=3)
aaSeq = 3x1 cell
    {'LWRYSTLIVR'}
    {'YDVITTWLC' }
    {'MTLFYFDCA' }

Input Arguments

Nucleotide sequence (SeqNT), specified as a character vector, a string scalar, a row vector of integers, or a structure containing a Sequence field.

Note

  • A hyphen is valid only if the codon to which it belongs represents a gap, that is, the codon consists entirely of hyphens. For example, ACT---TGA.

  • Do not use a sequence with hyphens if you specify "all" for Frame.

Example: SeqAA = nt2aa("CGACTT") converts the nucleotide sequence to the amino acid sequence 'RL'.

Data Types: double | char | string | struct
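
The input forms listed above can be compared side by side. The following is a minimal sketch that assumes Bioinformatics Toolbox is installed. The helper nt2int is a toolbox function that this section does not mention, and the variable names are illustrative only.

```matlab
% Character vector input returns a character vector.
aaChar = nt2aa('CGACTT');

% Integer input (built here with nt2int) returns a row vector of integers.
aaInt = nt2aa(nt2int('CGACTT'));

% Structure input: the sequence is read from the Sequence field.
s = struct('Sequence', 'CGACTT');
aaStruct = nt2aa(s);

% A codon made up entirely of hyphens is treated as a gap.
aaGap = nt2aa('ACT---TGA');
```

The integer result can be mapped back to letters with int2aa, another toolbox helper, if a character vector is more convenient.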

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: SeqAA = nt2aa("CGACTT",Frame=2)

Reading frame, specified as 1, 2, 3, or "all". If you specify "all", the function outputs a 3-by-1 cell array containing the amino acid sequences for all three reading frames.

Example: SeqAA = nt2aa("AAGACT",Frame=3) converts the nucleotide sequence to an amino acid sequence using the third reading frame.

Data Types: double | char | string
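
To make the effect of Frame concrete, the sketch below translates the same six-nucleotide sequence in each frame. It assumes Bioinformatics Toolbox and a release that accepts the Name=Value syntax. The variable names are illustrative, and the comments show how the codons are grouped under the standard genetic code.

```matlab
seq = 'AAGACT';

aa1 = nt2aa(seq, Frame=1);        % codons AAG ACT
aa2 = nt2aa(seq, Frame=2);        % codon AGA; the trailing CT does not form a codon
aa3 = nt2aa(seq, Frame=3);        % codon GAC; the trailing T does not form a codon

% "all" returns the three translations together as a 3-by-1 cell array.
aaAll = nt2aa(seq, Frame="all");
```

Frames 2 and 3 start one and two nucleotides into the sequence, so they yield shorter results than frame 1 when the leftover nucleotides do not fill a codon.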

Genetic code number or name, specified as an integer, character vector, or string scalar. This table lists valid genetic code numbers and names.

Genetic Code Number    Genetic Code Name
1                      Standard
2                      Vertebrate Mitochondrial
3                      Yeast Mitochondrial
4                      Mold, Protozoan, Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma
5                      Invertebrate Mitochondrial
6                      Ciliate, Dasycladacean, and Hexamita Nuclear
9                      Echinoderm Mitochondrial
10                     Euplotid Nuclear
11                     Bacterial and Plant Plastid
12                     Alternative Yeast Nuclear
13                     Ascidian Mitochondrial
14                     Flatworm Mitochondrial
15                     Blepharisma Nuclear
16                     Chlorophycean Mitochondrial
21                     Trematode Mitochondrial
22                     Scenedesmus Obliquus Mitochondrial
23                     Thraustochytrium Mitochondrial

Tip

If you use a code name, you can truncate it to its first two letters.

This table shows the nucleotide codon to amino acid mapping for the standard genetic code.

Amino Acid Name                                           Amino Acid Code  Nucleotide Codon
Alanine                                                   A                GCT GCC GCA GCG
Arginine                                                  R                CGT CGC CGA CGG AGA AGG
Asparagine                                                N                AAT AAC
Aspartic acid (Aspartate)                                 D                GAT GAC
Cysteine                                                  C                TGT TGC
Glutamine                                                 Q                CAA CAG
Glutamic acid (Glutamate)                                 E                GAA GAG
Glycine                                                   G                GGT GGC GGA GGG
Histidine                                                 H                CAT CAC
Isoleucine                                                I                ATT ATC ATA
Leucine                                                   L                TTA TTG† CTT CTC CTA CTG†
Lysine                                                    K                AAA AAG
Methionine                                                M                ATG
Phenylalanine                                             F                TTT TTC
Proline                                                   P                CCT CCC CCA CCG
Serine                                                    S                TCT TCC TCA TCG AGT AGC
Threonine                                                 T                ACT ACC ACA ACG
Tryptophan                                                W                TGG
Tyrosine                                                  Y                TAT TAC
Valine                                                    V                GTT GTC GTA GTG
Asparagine or Aspartic acid (Aspartate)                   B                Random codon from D and N
Glutamine or Glutamic acid (Glutamate)                    Z                Random codon from E and Q
Unknown amino acid (any amino acid)                       X                Random codon
Translation stop                                          *                TAA TAG TGA
Gap of indeterminate length                               -                ---
Unknown character (any character or symbol not in table)  ?                ???

† indicates an alternative start codon for the standard genetic code as defined here. If you are using nt2aa, alternative start codons are converted to methionine (M) when one of these codons is the first codon of a sequence and AlternativeStartCodons is set to true.

Example: SeqAA = nt2aa("ACGTTA",GeneticCode=2) converts the nucleotide sequence using the vertebrate mitochondrial genetic code.

Data Types: double | char | string
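
The sketch below shows the three equivalent ways of selecting a genetic code: by number, by full name, and by a two-letter truncation. It assumes Bioinformatics Toolbox and the Name=Value syntax. The test sequence is an illustrative choice. It relies on the fact, taken from the NCBI genetic code tables rather than from this page, that the vertebrate mitochondrial code reads ATA as methionine and TGA as tryptophan.

```matlab
seq = 'ATATGA';

% Standard genetic code (the default): ATA is isoleucine, TGA is a stop.
aaStd = nt2aa(seq);

% Vertebrate mitochondrial code, selected three equivalent ways.
aaNum   = nt2aa(seq, GeneticCode=2);
aaName  = nt2aa(seq, GeneticCode="Vertebrate Mitochondrial");
aaShort = nt2aa(seq, GeneticCode="Ve");
```

Selecting by number is the least error-prone in scripts, since it does not depend on spelling or on how the name is truncated.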

Flag to translate alternative start codons, specified as true or false. When true, if the first codon of a sequence is a known alternative start codon, the function translates the codon to methionine (M). When false, the function translates the alternative start codon to its corresponding amino acid.

For more information on alternative start codons, visit https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi?mode=t#SG1.

Example: SeqAA = nt2aa("TTGATC",AlternativeStartCodons=true) converts the first codon to methionine (M) instead of leucine (L).

Data Types: logical
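
As a brief illustration of the flag, the sketch below translates a sequence that begins with TTG, one of the alternative start codons marked in the standard genetic code table. It assumes Bioinformatics Toolbox and the Name=Value syntax. The variable names are illustrative.

```matlab
% TTG is the first codon, so it is translated to methionine (M).
aaStart = nt2aa('TTGATC', AlternativeStartCodons=true);

% With the flag off, TTG is translated to leucine (L) as usual.
aaPlain = nt2aa('TTGATC', AlternativeStartCodons=false);

% The flag affects only the first codon: TTG in second position stays leucine.
aaInner = nt2aa('ATCTTG', AlternativeStartCodons=true);
```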

Flag to control the behavior of ambiguous nucleotides (R, Y, K, M, S, W, B, D, H, V, and N), specified as true or false. If you specify true, the function produces an error if any ambiguous nucleotides are present. If you specify false, the function tries to resolve any ambiguities. If it cannot, the function returns X for the affected codon.

Data Types: logical
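
The sketch below illustrates both settings of this flag. This section does not give the argument name. The code assumes it is ACGTOnly, as in the Bioinformatics Toolbox reference for nt2aa, so treat that name as an assumption and check it against your release. The sequences are illustrative: GCN and CAR each map to a single amino acid whichever nucleotide the ambiguity code stands for, while ANT does not.

```matlab
% Resolvable ambiguities: GCN is always alanine, CAR is always glutamine.
aaResolved = nt2aa('GCNCAR', ACGTOnly=false);

% Unresolvable ambiguity: ANT could encode several amino acids, so the codon becomes X.
aaUnknown = nt2aa('ANT', ACGTOnly=false);

% With the flag set to true, any ambiguous nucleotide produces an error.
try
    nt2aa('GCNCAR', ACGTOnly=true);
    didError = false;
catch
    didError = true;
end
```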

Output Arguments

Amino acid sequence (SeqAA), returned as one of the following.

  • If SeqNT is a character vector or string scalar, then the function returns a character vector.

  • If SeqNT is a row vector of integers, then the function returns a row vector of integers. For information on valid integers, see Mapping Amino Acid Letter Codes to Integers.

  • If SeqNT is a structure, then the function returns SeqAA with the same data type as the Sequence field, either a character vector or a row vector of integers.

Setting Frame to "all" directs the function to return a 3-by-1 cell array.

Version History

Introduced before R2006a