
cpgisland

Locate CpG islands in DNA sequence

Syntax

cpgStruct = cpgisland(SeqDNA)
cpgStruct = cpgisland(SeqDNA, ...'Window', WindowValue, ...)
cpgStruct = cpgisland(SeqDNA, ...'MinIsland', MinIslandValue, ...)
cpgStruct = cpgisland(SeqDNA, ...'GCmin', GCminValue, ...)
cpgStruct = cpgisland(SeqDNA, ...'CpGoe', CpGoeValue, ...)
cpgStruct = cpgisland(SeqDNA, ...'Plot', PlotValue, ...)

Input Arguments

SeqDNA

One of the following:

  • Character vector or string specifying a nucleotide sequence

  • Row vector of integers specifying a nucleotide sequence

  • MATLAB® structure containing a Sequence field that contains a DNA nucleotide sequence, such as returned by fastaread, fastqread, emblread, getembl, genbankread, or getgenbank

Valid characters include A, C, G, and T.

cpgisland does not count ambiguous nucleotides or gaps.

WindowValue

Integer specifying the window size for calculating GC content and CpGobserved/CpGexpected ratios. Default is 100 bases. A smaller window size increases the noise in a plot.

MinIslandValue

Integer specifying the minimum number of consecutive marked bases to report as a CpG island. Default is 200 bases.

GCminValue

Value specifying the minimum GC content in a window needed to mark a base, expressed as a fraction between 0 and 1. Default is 0.5.

CpGoeValue

Value specifying the minimum CpGobserved/CpGexpected ratio in each window needed to mark a base. Specify a value between 0 and 1. Default is 0.6. This ratio is defined as:

CpGobs/CpGexp = (NumCpGs*Length)/(NumGs*NumCs)

PlotValue

Controls the plotting of GC content, CpGoe content, CpG islands greater than the minimum island size, and all potential CpG islands for the specified criteria. Choices are true or false (default).

Output Arguments

cpgStruct

MATLAB structure containing the starting and ending bases (fields Starts and Stops) of the CpG islands greater than the minimum island size.

Description

cpgStruct = cpgisland(SeqDNA) searches SeqDNA, a DNA nucleotide sequence, for CpG islands with a GC content greater than 50% and a CpGobserved/CpGexpected ratio greater than 0.6. It marks bases meeting these criteria within a moving window of 100 DNA bases and then returns the results in cpgStruct, a MATLAB structure containing the starting and ending bases of the CpG islands longer than the minimum island size of 200 bases.
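The marking step described above can be pictured with a simplified sketch in base MATLAB. This is not the toolbox implementation: the toy sequence, the small window size, and in particular the choice to mark the base at the center of each window are assumptions made for illustration only.

```matlab
% Simplified sketch of window-based marking; NOT the cpgisland source code.
seq   = 'ATATCGCGCGGCGCCGATAT';   % hypothetical toy sequence
w     = 8;                        % toy window size (the documented default is 100)
gcMin = 0.5;                      % documented default for 'GCmin'
oeMin = 0.6;                      % documented default for 'CpGoe'

isC  = seq == 'C';
isG  = seq == 'G';
isCG = [isC(1:end-1) & isG(2:end), false];   % true at the C of each CG dinucleotide

n      = numel(seq);
marked = false(1, n);
for k = 1:(n - w + 1)
    idx   = k:(k + w - 1);
    numC  = sum(isC(idx));
    numG  = sum(isG(idx));
    numCG = sum(isCG(idx(1:end-1)));             % CpGs lying fully inside the window
    gc    = (numC + numG) / w;                   % GC fraction of the window
    oe    = (numCG * w) / max(numC * numG, 1);   % CpGobs/CpGexp, guarded against 0/0
    center = k + floor(w/2);                     % assumption: mark the window's center base
    marked(center) = gc > gcMin && oe > oeMin;
end

disp(find(marked))   % positions of marked bases in the toy sequence
```

Runs of consecutive marked bases at least as long as the minimum island size would then be the reported islands. How the actual function treats the sequence ends and which base in a window it marks are not stated on this page.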

cpgStruct = cpgisland(SeqDNA, ...'PropertyName', PropertyValue, ...) calls cpgisland with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

cpgStruct = cpgisland(SeqDNA, ...'Window', WindowValue, ...) specifies the window size for calculating GC content and CpGobserved/CpGexpected ratios. Default is 100 bases. A smaller window size increases the noise in a plot.

cpgStruct = cpgisland(SeqDNA, ...'MinIsland', MinIslandValue, ...) specifies the minimum number of consecutive marked bases to report as a CpG island. Default is 200 bases.

cpgStruct = cpgisland(SeqDNA, ...'GCmin', GCminValue, ...) specifies the minimum GC content in a window needed to mark a base, expressed as a fraction between 0 and 1. Default is 0.5.

cpgStruct = cpgisland(SeqDNA, ...'CpGoe', CpGoeValue, ...) specifies the minimum CpGobserved/CpGexpected ratio in each window needed to mark a base. Choices are a value between 0 and 1. Default is 0.6. This ratio is defined as:

CpGobs/CpGexp = (NumCpGs*Length)/(NumGs*NumCs)
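As a worked instance of this ratio, the following sketch evaluates it for a single hypothetical 10-base window using only base MATLAB functions. The variable names are illustrative and are not part of cpgisland.

```matlab
% Evaluate CpGobs/CpGexp for one hypothetical window.
win     = 'ACGTCGCGGA';                 % made-up 10-base window
len     = numel(win);                   % Length  = 10
numC    = sum(win == 'C');              % NumCs   = 3
numG    = sum(win == 'G');              % NumGs   = 4
numCpG  = numel(strfind(win, 'CG'));    % NumCpGs = 3 (CG dinucleotides)

cpgOE = (numCpG * len) / (numG * numC)  % 30/12 = 2.5
gcFrac = (numC + numG) / len            % 0.7
```

With the default thresholds (GCmin of 0.5 and CpGoe of 0.6), this window would satisfy both criteria.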

cpgStruct = cpgisland(SeqDNA, ...'Plot', PlotValue, ...) controls the plotting of GC content, CpGoe content, CpG islands greater than the minimum island size, and all potential CpG islands for the specified criteria. Choices are true or false (default).
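Several properties can be combined in one call. The sketch below uses the accession from the Examples section and threshold values chosen arbitrarily for illustration. It assumes Bioinformatics Toolbox is installed and that the GenBank database is reachable over the network.

```matlab
% Combine several name/value pairs in one call (values are arbitrary).
S = getgenbank('AC156455');                 % same accession as in Examples
islands = cpgisland(S.Sequence, ...
    'Window',    200, ...                   % wider window, smoother plot
    'MinIsland', 300, ...                   % report only islands of 300+ marked bases
    'GCmin',     0.55, ...                  % stricter GC threshold
    'CpGoe',     0.65, ...                  % stricter CpGobs/CpGexp threshold
    'Plot',      true);
```

Because the property names are case insensitive and order does not matter, 'plot', true could equally appear first.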

Examples

  1. Import a nucleotide sequence from the GenBank® database. For example, retrieve a sequence from Homo sapiens chromosome 12.

    S = getgenbank('AC156455');

  2. Calculate the CpG islands in the sequence and plot the results.

    cpgisland(S.Sequence,'PLOT',true)
    
    ans = 
       
        Starts: [4510 29359]
         Stops: [5468 29604]
    

    The CpG islands longer than 200 bases are listed, and a plot is displayed.
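The returned structure can be post-processed like any other MATLAB structure. As a small sketch, the code below rebuilds the output shown above as a literal (so it runs without the toolbox or network access) and computes each island's length, assuming that Starts and Stops are inclusive base positions.

```matlab
% Island lengths from the Starts/Stops values shown in the example output.
cpg = struct('Starts', [4510 29359], 'Stops', [5468 29604]);

% Assumption: both endpoints are inclusive, hence the +1.
islandLengths = cpg.Stops - cpg.Starts + 1;   % [959 246]

for k = 1:numel(cpg.Starts)
    fprintf('Island %d: bases %d-%d (%d bases)\n', ...
        k, cpg.Starts(k), cpg.Stops(k), islandLengths(k));
end
```

Both lengths exceed the default minimum island size of 200 bases, which is consistent with their being reported.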

Version History

Introduced before R2006a