goannotread

Read annotations from Gene Ontology annotated file

Syntax

Annotation = goannotread(File)
Annotation = goannotread(File, ...'Fields', FieldsValue, ...)
Annotation = goannotread(File, ...'Aspect', AspectValue, ...)

Input Arguments

File

Character vector or string specifying a file name of a Gene Ontology (GO) annotated format (GAF) file.

FieldsValue

Character vector, string, string vector, or cell array of character vectors specifying one or more fields to read from the Gene Ontology annotated file. Default is to read all fields. Valid fields are listed below.

AspectValue

Character vector or string specifying one or more aspects to read, each identified by a single character. Valid aspects are:

  • P — Biological process

  • F — Molecular function

  • C — Cellular component

Default is 'CFP', which specifies to read all aspects.

Output Arguments

Annotation

MATLAB® array of structures containing annotations from a Gene Ontology annotated file.

Description

Note

The goannotread function supports GAF 1.0 and 2.0 file formats.

Annotation = goannotread(File) converts the contents of File, a Gene Ontology annotated file, into Annotation, an array of structures. Files should have the structure specified by the Gene Ontology consortium, available at:

Annotation = goannotread(File, ...'PropertyName', PropertyValue, ...) calls goannotread with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

Annotation = goannotread(File, ...'Fields', FieldsValue, ...) specifies the fields to read from the Gene Ontology annotated file. FieldsValue is a character vector, string, string vector, or cell array of character vectors specifying one or more fields. Default is to read all fields. Valid fields are:

  • Database

  • DB_Object_ID

  • DB_Object_Symbol

  • Qualifier

  • GOid

  • DBReference

  • Evidence

  • WithFrom

  • Aspect

  • DB_Object_Name

  • Synonym

  • DB_Object_Type

  • Taxon

  • Date

  • Assigned_by

Annotation = goannotread(File, ...'Aspect', AspectValue, ...) specifies the aspects to read from the Gene Ontology annotated file. AspectValue is a character vector or string specifying one or more aspects, each identified by a single character. Valid aspects are:

  • P — Biological process

  • F — Molecular function

  • C — Cellular component

Default is 'CFP', which specifies to read all aspects.
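
As a minimal sketch of combining both name-value pairs, the following call reads only the biological process and molecular function aspects and returns two fields. The file name my_annotations.gaf is a hypothetical placeholder, not a file shipped with the toolbox. The isfile guard (available in R2017b and later) is added only so the snippet does nothing when that file is absent.

```matlab
% Hypothetical file name; substitute the path to your own GAF file.
gafFile = 'my_annotations.gaf';

if isfile(gafFile)
    % Property names are case insensitive, so 'aspect' and 'fields' are accepted.
    % 'PF' selects two aspects; the string vector selects two fields.
    ann = goannotread(gafFile, 'aspect', 'PF', ...
        'fields', ["DB_Object_Symbol", "GOid"]);

    % List the fields that were read.
    disp(fieldnames(ann))
end
```
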

Examples


  1. Download gene_association.sgd.gz, the file containing GO annotations for the gene products of Saccharomyces cerevisiae, from the yeast genome website to your MATLAB current folder.

  2. Uncompress the file using the gunzip function.

    gunzip('gene_association.sgd.gz')
  3. Load the file.

    SGDGenes = goannotread('gene_association.sgd');
  4. Convert the structure array of GO annotations to a cell array and display the gene symbols (the third field, DB_Object_Symbol) of the first five annotations.

    S = struct2cell(SGDGenes);
    genes = S(3,1:5)'
    
    genes = 
    
        '15S_RRNA'
        '15S_RRNA'
        '15S_RRNA'
        '15S_RRNA'
        '21S_RRNA'
  5. You can limit the annotations to the molecular function aspect (F) and read only the fields for the gene symbol and the associated GO ID, that is, DB_Object_Symbol and GOid.

    sgdSelect = goannotread('gene_association.sgd','Aspect','F','Fields',{'DB_Object_Symbol','GOid'})
    sgdSelect = 
    
      30701×1 struct array with fields:
    
        DB_Object_Symbol
        GOid
  6. Create a list of genes and the associated GO terms.

    selectGenes = {sgdSelect.DB_Object_Symbol};
    selectGO = [sgdSelect.GOid];
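
One possible next step, shown here only as a sketch, is to count how many annotations each gene has. So that the snippet runs on its own, it uses a small stand-in structure array with made-up gene symbols and placeholder GO IDs instead of the sgdSelect structure from step 5. The variable names toySelect, uniqueGenes, and counts are illustrative.

```matlab
% Toy stand-in for sgdSelect; the symbols and GO IDs are placeholders, not real data.
toySelect = struct('DB_Object_Symbol', {'GENE_A'; 'GENE_A'; 'GENE_B'}, ...
                   'GOid', {1; 2; 3});

% Build the gene and GO term lists as in step 6.
selectGenes = {toySelect.DB_Object_Symbol};
selectGO = [toySelect.GOid];

% Find the distinct gene symbols and map each annotation to its gene.
[uniqueGenes, ~, idx] = unique(selectGenes);

% Count the annotations per distinct gene.
counts = accumarray(idx(:), 1);
```

With the real data, replace toySelect with sgdSelect; the rest of the code is unchanged.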
Introduced before R2006a