Working with Objects for Microarray Experiment Data
This example shows how to create and manipulate MATLAB® containers designed for storing data from a microarray experiment.
Containers for Gene Expression Experiment Data
Microarray experimental data are very complex, usually consisting of data and information from a number of different sources. Storing and managing the large and complex data sets in a coherent manner is a challenge. Bioinformatics Toolbox™ provides a set of objects to represent the different pieces of data from a microarray experiment.
The ExpressionSet class is a single, convenient data structure for storing and managing different types of data from a microarray gene expression experiment.
An ExpressionSet object consists of these four components that are common to all microarray gene expression experiments:
Experiment data: Expression values from microarray experiments. These data are stored as an instance of the ExptData class.
Sample information: The metadata describing the samples in the experiment. The sample metadata are stored as an instance of the MetaData class.
Array feature annotations: The annotations about the features or probes on the array used in the experiment. The annotations can be stored as an instance of the MetaData class.
Experiment descriptions: Information to describe the experiment methods and conditions. The information can be stored as an instance of the MIAME class.
The ExpressionSet class coordinates and validates these data components. The class provides methods for retrieving and setting the data stored in an ExpressionSet object. An ExpressionSet object also behaves like many other MATLAB data structures that can be subsetted and copied.
Experiment Data
In a microarray gene expression experiment, the measured expression values for each feature per sample can be represented as a two-dimensional matrix. The matrix has F rows and S columns, where F is the number of features on the array, and S is the number of samples on which the expression values were measured. A DataMatrix object is a two-dimensional matrix that you can index by row and column numbers, logical vectors, or row and column names.
Create a DataMatrix object with row and column names.
dm = bioma.data.DataMatrix(rand(5,4), 'RowNames','Feature', 'ColNames', 'Sample')
dm =
                Sample1    Sample2    Sample3    Sample4
    Feature1    0.81472    0.09754    0.15761    0.14189
    Feature2    0.90579     0.2785    0.97059    0.42176
    Feature3    0.12699    0.54688    0.95717    0.91574
    Feature4    0.91338    0.95751    0.48538    0.79221
    Feature5    0.63236    0.96489    0.80028    0.95949
The function size returns the number of rows and columns in a DataMatrix object.
size(dm)
ans = 5 4
You can index into a DataMatrix object like other MATLAB numeric arrays by using row and column numbers. For example, you can access the elements at rows 1 and 2, column 3.
dm(1:2, 3)
ans =
                Sample3
    Feature1    0.15761
    Feature2    0.97059
You can also index into a DataMatrix object by using its row and column names. Reassign the elements in rows 2 and 3, columns 1 and 4 to different values.
dm({'Feature2', 'Feature3'}, {'Sample1', 'Sample4'}) = [2, 3; 4, 5]
dm =
                Sample1    Sample2    Sample3    Sample4
    Feature1    0.81472    0.09754    0.15761    0.14189
    Feature2          2     0.2785    0.97059          3
    Feature3          4    0.54688    0.95717          5
    Feature4    0.91338    0.95751    0.48538    0.79221
    Feature5    0.63236    0.96489    0.80028    0.95949
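Indexing with logical vectors is mentioned above but not demonstrated. The following is a minimal sketch of that case, assuming Bioinformatics Toolbox is installed. The variable names dmDemo, keepRows, and subDemo are illustrative only, and magic(4) is used in place of random data so the result is reproducible.

```matlab
% Small deterministic DataMatrix; row and column names are generated
% from the 'Feature' and 'Sample' prefixes, as in the example above.
dmDemo = bioma.data.DataMatrix(magic(4), 'RowNames', 'Feature', 'ColNames', 'Sample');

% Logical vector with one entry per row: keep rows 1 and 3.
keepRows = [true false true false];
subDemo = dmDemo(keepRows, :);

% The row names travel with the selected rows.
rownames(subDemo)
size(subDemo)
```

A logical vector built from a comparison on the data or on the names can be used in the same position as keepRows.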
The gene expression data used in this example is a small data set from a microarray experiment profiling adult mouse gene expression patterns in common strains on the Affymetrix® MG-U74Av2 array [1].
Read the expression values from the tab-delimited file mouseExprsData.txt into the MATLAB Workspace as a DataMatrix object.
exprsData = bioma.data.DataMatrix('file', 'mouseExprsData.txt');
class(exprsData)
ans = 'bioma.data.DataMatrix'
Get the properties of the DataMatrix object, exprsData.
get(exprsData)
            Name: 'mouseExprsData'
        RowNames: {500x1 cell}
        ColNames: {1x26 cell}
           NRows: 500
           NCols: 26
           NDims: 2
    ElementClass: 'double'
Check the sample names.
colnames(exprsData)
ans = 1x26 cell array
  Columns 1 through 8
    {'A'}    {'B'}    {'C'}    {'D'}    {'E'}    {'F'}    {'G'}    {'H'}
  Columns 9 through 16
    {'I'}    {'J'}    {'K'}    {'L'}    {'M'}    {'N'}    {'O'}    {'P'}
  Columns 17 through 24
    {'Q'}    {'R'}    {'S'}    {'T'}    {'U'}    {'V'}    {'W'}    {'X'}
  Columns 25 through 26
    {'Y'}    {'Z'}
View the first 10 rows and 5 columns.
exprsData(1:10, 1:5)
ans =
                        A         B         C         D         E
    100001_at        2.26     20.14     31.66     14.58     16.04
    100002_at      158.86    236.25    206.27    388.71    388.09
    100003_at       68.11    105.45     82.92      82.9     60.38
    100004_at       74.32     96.68     84.87     72.26     98.38
    100005_at       75.05     53.17     57.94     60.06     63.91
    100006_at       80.36     42.89     77.21     77.24     40.31
    100007_at      216.64    191.32    219.48    237.28    298.18
    100009_r_at    3806.7      1425    2468.5    2172.7    2237.2
    100010_at         NaN       NaN       NaN      7.18     22.37
    100011_at       81.72     72.27    127.61     91.01     98.13
Perform a log2 transformation of the expression values.
exprsData_log2 = log2(exprsData);
exprsData_log2(1:10, 1:5)
ans =
                        A         B         C         D         E
    100001_at      1.1763     4.332    4.9846    3.8659    4.0036
    100002_at      7.3116    7.8842    7.6884    8.6026    8.6002
    100003_at      6.0898    6.7204    6.3736    6.3733     5.916
    100004_at      6.2157    6.5951    6.4072    6.1751    6.6203
    100005_at      6.2298    5.7325    5.8565    5.9083     5.998
    100006_at      6.3284    5.4226    6.2707    6.2713    5.3331
    100007_at      7.7592    7.5798    7.7779    7.8904      8.22
    100009_r_at    11.894    10.477    11.269    11.085    11.127
    100010_at         NaN       NaN       NaN     2.844    4.4835
    100011_at      6.3526    6.1753    6.9956     6.508    6.6166
Change the Name property to be more descriptive.
exprsData_log2 = set(exprsData_log2, 'Name', 'Log2 Based mouseExprsData');
get(exprsData_log2)
            Name: 'Log2 Based mouseExprsData'
        RowNames: {500x1 cell}
        ColNames: {1x26 cell}
           NRows: 500
           NCols: 26
           NDims: 2
    ElementClass: 'double'
In a microarray experiment, the data set often contains one or more matrices that have the same number of rows and columns and identical row names and column names. The ExptData class is designed to contain and coordinate one or more data matrices that have the same dimensions and identical row and column names. The data values are stored as DataMatrix objects. Each DataMatrix object is an element of an ExptData object. The ExptData class is responsible for data validation and coordination between these DataMatrix objects.
Store the natural-scale and log2-transformed expression values as separate elements in an ExptData object.
mouseExptData = bioma.data.ExptData(exprsData, exprsData_log2,...
                'ElementNames', {'naturalExprs', 'log2Exprs'})
mouseExptData =
Experiment Data:
  500 features, 26 samples
  2 elements
  Element names: naturalExprs, log2Exprs
Access a DataMatrix element in mouseExptData using the element name.
exprsData2 = mouseExptData('log2Exprs');
get(exprsData2)
            Name: 'Log2 Based mouseExprsData'
        RowNames: {500x1 cell}
        ColNames: {1x26 cell}
           NRows: 500
           NCols: 26
           NDims: 2
    ElementClass: 'double'
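The relationship between the elements of an ExptData object can be checked on a toy data set. The sketch below assumes Bioinformatics Toolbox is installed. The names raw, logged, edDemo, and fromEd, and the feature and sample labels, are made up for illustration and do not come from the mouse data set.

```matlab
% Two matrices with identical dimensions and identical row/column names.
raw = bioma.data.DataMatrix([1 2; 4 8; 16 32], ...
    'RowNames', {'f1', 'f2', 'f3'}, 'ColNames', {'s1', 's2'});
logged = log2(raw);

% Store both as named elements of one ExptData object.
edDemo = bioma.data.ExptData(raw, logged, ...
    'ElementNames', {'naturalExprs', 'log2Exprs'});

% Retrieve one element by name and compare it with a direct computation
% on the underlying numeric values.
fromEd = edDemo('log2Exprs');
isequal(double(fromEd), log2(double(raw)))
```

The values were chosen as powers of two so that the log2 results are exact integers and the comparison does not depend on floating-point rounding.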
Sample Metadata
The metadata about the samples in a microarray experiment can be represented as a table with S rows and V columns, where S is the number of samples, and V is the number of variables. The contents of the table are the values of each variable for each sample. For example, the file mouseSampleData.txt contains such a table. The description of each sample variable is marked by a # symbol.
The MetaData class is designed for storing and manipulating variable values and their metadata in a coordinated fashion. You can read the mouseSampleData.txt file into MATLAB as a MetaData object.
sData = bioma.data.MetaData('file', 'mouseSampleData.txt', 'vardescchar', '#')
sData =
Sample Names:
    A, B, ...,Z (26 total)
Variable Names and Meta Information:
              VariableDescription
    Gender    {' Gender of the mouse in study'         }
    Age       {' The number of weeks since mouse birth'}
    Type      {' Genetic characters'                   }
    Strain    {' The mouse strain'                     }
    Source    {' The tissue source for RNA collection' }
The properties of the MetaData class provide information about the samples and variables.
numSamples = sData.NSamples
numVariables = sData.NVariables
numSamples = 26
numVariables = 5
The variable values and the variable descriptions for the samples are stored as two dataset arrays in a MetaData object. The MetaData class provides access methods to the variable values and the meta information describing the variables.
Access the sample metadata using the variableValues method.
sData.variableValues
ans =
         Gender      Age    Type             Strain
    A    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    B    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    C    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    D    {'Male'}    8      {'Wild type'}    {'A/J '         }
    E    {'Male'}    8      {'Wild type'}    {'A/J '         }
    F    {'Male'}    8      {'Wild type'}    {'C57BL/6J '    }
    G    {'Male'}    8      {'Wild type'}    {'C57BL/6J'     }
    H    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    I    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    J    {'Male'}    8      {'Wild type'}    {'A/J'          }
    K    {'Male'}    8      {'Wild type'}    {'A/J'          }
    L    {'Male'}    8      {'Wild type'}    {'A/J'          }
    M    {'Male'}    8      {'Wild type'}    {'C57BL/6J'     }
    N    {'Male'}    8      {'Wild type'}    {'C57BL/6J'     }
    O    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    P    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    Q    {'Male'}    8      {'Wild type'}    {'A/J'          }
    R    {'Male'}    8      {'Wild type'}    {'A/J'          }
    S    {'Male'}    8      {'Wild type'}    {'C57BL/6J'     }
    T    {'Male'}    8      {'Wild type'}    {'C57BL/6J4'    }
    U    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    V    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    W    {'Male'}    8      {'Wild type'}    {'A/J'          }
    X    {'Male'}    8      {'Wild type'}    {'A/J'          }
    Y    {'Male'}    8      {'Wild type'}    {'C57BL/6J'     }
    Z    {'Male'}    8      {'Wild type'}    {'C57BL/6J'     }

         Source
    A    {'amygdala'        }
    B    {'amygdala'        }
    C    {'amygdala'        }
    D    {'amygdala'        }
    E    {'amygdala'        }
    F    {'amygdala'        }
    G    {'amygdala'        }
    H    {'cingulate cortex'}
    I    {'cingulate cortex'}
    J    {'cingulate cortex'}
    K    {'cingulate cortex'}
    L    {'cingulate cortex'}
    M    {'cingulate cortex'}
    N    {'cingulate cortex'}
    O    {'hippocampus'     }
    P    {'hippocampus'     }
    Q    {'hippocampus'     }
    R    {'hippocampus'     }
    S    {'hippocampus'     }
    T    {'hippocampus'     }
    U    {'hypothalamus'    }
    V    {'hypothalamus'    }
    W    {'hypothalamus'    }
    X    {'hypothalamus'    }
    Y    {'hypothalamus'    }
    Z    {'hypothalamus'    }
View a summary of the sample metadata.
summary(sData.variableValues)
Gender: [26x1 cell array of character vectors]
Age: [26x1 double]
    min    1st quartile    median    3rd quartile    max
    8      8               8         8               8
Type: [26x1 cell array of character vectors]
Strain: [26x1 cell array of character vectors]
Source: [26x1 cell array of character vectors]
The sampleNames and variableNames methods are convenient ways to access the names of samples and variables. Retrieve the variable names of the sData object.
variableNames(sData)
ans = 1x5 cell array {'Gender'} {'Age'} {'Type'} {'Strain'} {'Source'}
You can retrieve the meta information about the variables describing the samples using the variableDesc method. In this example, it contains only the descriptions about the variables.
variableDesc(sData)
ans =
              VariableDescription
    Gender    {' Gender of the mouse in study'         }
    Age       {' The number of weeks since mouse birth'}
    Type      {' Genetic characters'                   }
    Strain    {' The mouse strain'                     }
    Source    {' The tissue source for RNA collection' }
You can subset the sample data sData object using numerical indexing.
sData(3:6, :)
ans =
Sample Names:
    C, D, ...,F (4 total)
Variable Names and Meta Information:
              VariableDescription
    Gender    {' Gender of the mouse in study'         }
    Age       {' The number of weeks since mouse birth'}
    Type      {' Genetic characters'                   }
    Strain    {' The mouse strain'                     }
    Source    {' The tissue source for RNA collection' }
You can display the mouse strain of specific samples by using numerical indexing.
sData.Strain([2 14])
ans = 2x1 cell array {'129S6/SvEvTac'} {'C57BL/6J' }
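The variableValues output shown earlier includes some strain labels with trailing spaces (for example 'A/J ' next to 'A/J'), and exact string comparisons treat those as different values. The following is a minimal sketch of one way to normalize such labels before grouping or selecting by strain. It uses a small hypothetical cell array rather than the MetaData object, and only base MATLAB functions.

```matlab
% Hypothetical strain labels with inconsistent trailing whitespace.
strain = {'A/J  '; 'A/J'; 'C57BL/6J '; 'C57BL/6J'};

% Remove leading and trailing whitespace from every label.
cleaned = strtrim(strain);

% Before cleaning, each padded label counts as its own category.
numel(unique(strain))    % 4
numel(unique(cleaned))   % 2

% Exact matching now finds both A/J samples.
isAJ = strcmp(cleaned, 'A/J')
```

Trimming only addresses whitespace. A label that differs by an extra character would still need to be reviewed and corrected by hand.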
Note that the row names in sData and the column names in exprsData are the same. This correspondence is an important relationship between the expression data and the sample data in the same experiment.
all(ismember(sampleNames(sData), colnames(exprsData)))
ans = logical 1
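The ismember check above confirms that every sample name appears among the column names, but it does not confirm that the two lists are in the same order. A stricter comparison can be sketched with plain cell arrays. The names sampleIds and columnIds are hypothetical stand-ins for the outputs of sampleNames and colnames.

```matlab
% Same three names, but in a different order.
sampleIds = {'A', 'B', 'C'};
columnIds = {'C', 'A', 'B'};

% Membership test: passes even though the order differs.
membersOnly = all(ismember(sampleIds, columnIds))   % true

% Order-sensitive test: reshape both to columns so that a row vector
% and a column vector with the same contents still compare equal.
sameOrder = isequal(sampleIds(:), columnIds(:))     % false
```

Whether the order-sensitive test is needed depends on how the data are combined afterwards, so this is offered as an optional extra check rather than a required step.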
Feature Annotation Metadata
The metadata about the features or probe sets on an array can be very large and diverse. The chip manufacturers usually provide a specific annotation file for the features of each type of array. The metadata can be stored as a MetaData object for a specific experiment. In this example, the annotation file for the MG-U74Av2 array can be downloaded from the Affymetrix web site. You will need to convert the file from CSV to XLSX format using a spreadsheet software application.
Read the entire file into MATLAB as a dataset array. Alternatively, you can use the Range option in the dataset constructor to read only part of the spreadsheet. Any blank spaces in the variable names are removed to make them valid MATLAB variable names. A warning is displayed each time this happens.
mgU74Av2 = table2dataset(readtable('MG_U74Av2_annot.xlsx'));
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property. Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
Inspect the properties of this dataset array.
get(mgU74Av2)
       Description: ''
    VarDescription: {1x43 cell}
             Units: {}
          DimNames: {'Row'  'Variables'}
          UserData: []
          ObsNames: {}
          VarNames: {1x43 cell}
Determine the number of probe set IDs in the annotation file.
numel(mgU74Av2.ProbeSetID)
ans = 12488
Retrieve the names of variables describing the features on the array and view the first 20 variable names.
fDataVariables = get(mgU74Av2, 'VarNames');
fDataVariables(1:20)'
ans = 20x1 cell array
    {'ProbeSetID'               }
    {'GeneChipArray'            }
    {'SpeciesScientificName'    }
    {'AnnotationDate'           }
    {'SequenceType'             }
    {'SequenceSource'           }
    {'TranscriptID_ArrayDesign_'}
    {'TargetDescription'        }
    {'RepresentativePublicID'   }
    {'ArchivalUniGeneCluster'   }
    {'UniGeneID'                }
    {'GenomeVersion'            }
    {'Alignments'               }
    {'GeneTitle'                }
    {'GeneSymbol'               }
    {'ChromosomalLocation'      }
    {'UnigeneClusterType'       }
    {'Ensembl'                  }
    {'EntrezGene'               }
    {'SwissProt'                }
Set the ObsNames property to the probe set IDs, so that you can access individual gene annotations by indexing with probe set IDs.
mgU74Av2 = set(mgU74Av2,'ObsNames',mgU74Av2.ProbeSetID);
mgU74Av2('100709_at',{'GeneSymbol','ChromosomalLocation'})
ans =
                 GeneSymbol    ChromosomalLocation
    100709_at    {'Tpbpa'}     {'chr13 B2|13 36.0 cM'}
In some cases, it is useful to extract specific annotations that are relevant to the analysis. Extract annotations for GeneTitle, GeneSymbol, ChromosomalLocation, and Pathway relative to the features in exprsData.
mgU74Av2 = mgU74Av2(:,{'GeneTitle',...
                       'GeneSymbol',...
                       'ChromosomalLocation',...
                       'Pathway'});
mgU74Av2 = mgU74Av2(rownames(exprsData),:);
get(mgU74Av2)
       Description: ''
    VarDescription: {1x4 cell}
             Units: {}
          DimNames: {'Row'  'Variables'}
          UserData: []
          ObsNames: {500x1 cell}
          VarNames: {1x4 cell}
You can store the feature annotation dataset array as an instance of the MetaData class.
fData = bioma.data.MetaData(mgU74Av2)
fData =
Sample Names:
    100001_at, 100002_at, ...,100717_at (500 total)
Variable Names and Meta Information:
                           VariableDescription
    GeneTitle              {'NA'}
    GeneSymbol             {'NA'}
    ChromosomalLocation    {'NA'}
    Pathway                {'NA'}
Notice that there are no descriptions for the feature variables in the fData MetaData object. You can add descriptions about the variables in fData using the variableDesc method.
fData = variableDesc(fData, {'Gene title of a probe set',...
                             'Probe set gene symbol',...
                             'Probe set chromosomal locations',...
                             'The pathway the genes involved in'})
fData =
Sample Names:
    100001_at, 100002_at, ...,100717_at (500 total)
Variable Names and Meta Information:
                           VariableDescription
    GeneTitle              {'Gene title of a probe set'        }
    GeneSymbol             {'Probe set gene symbol'            }
    ChromosomalLocation    {'Probe set chromosomal locations'  }
    Pathway                {'The pathway the genes involved in'}
Experiment Information
The MIAME class is a flexible data container designed for a collection of basic descriptions about a microarray experiment, such as investigators, laboratories, and array designs. The MIAME class loosely follows the Minimum Information About a Microarray Experiment (MIAME) specification [2].
Create a MIAME object by providing some basic information.
expDesc = bioma.data.MIAME('investigator', 'Jane OneName',...
                           'lab', 'Bioinformatics Laboratory',...
                           'title', 'Example Gene Expression Experiment',...
                           'abstract', 'An example of using microarray objects.',...
                           'other', {'Notes: Created from text files.'})
expDesc =
Experiment Description:
  Author name: Jane OneName
  Laboratory: Bioinformatics Laboratory
  Contact information:
  URL:
  PubMedIDs:
  Abstract: A 5 word abstract is available. Use the Abstract property.
  No experiment design summary available.
  Other notes:
    {'Notes: Created from text files.'}
Another way to create a MIAME object is from GEO series data. The MIAME class populates the corresponding properties from the GEO series structure. The information associated with the gene expression profiling experiment in this example is available from the GEO database under the accession number GSE3327 [1]. Retrieve the GEO Series data using the getgeodata function.
getgeodata('GSE3327', 'ToFile', 'GSE3327.txt');
Read the data into a structure.
geoSeries = geoseriesread('GSE3327.txt')
geoSeries = struct with fields:
    Header: [1x1 struct]
      Data: [12488x87 bioma.data.DataMatrix]
Create a MIAME object.
exptGSE3327 = bioma.data.MIAME(geoSeries)
exptGSE3327 =
Experiment Description:
  Author name: Iiris,,Hovatta
               David,J,Lockhart
               Carrolee,,Barlow
  Laboratory: The Salk Institute for Biological Studies
  Contact information: Carrolee,,Barlow
  URL:
  PubMedIDs: 16244648
  Abstract: A 14 word abstract is available. Use the Abstract property.
  Experiment Design: A 8 word summary is available. Use the ExptDesign property.
  Other notes:
    {'ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/supplementary/series/GSE3327/GSE3327_RAW.tar'}
View the abstract of the experiment and its PubMed IDs.
abstract = exptGSE3327.Abstract
pubmedID = exptGSE3327.PubMedID
abstract = 'Adult mouse gene expression patterns in common strains Keywords: mouse strain and brain region comparison'
pubmedID = '16244648'
Creating an ExpressionSet Object
The ExpressionSet class is designed specifically for microarray gene expression experiment data. Assemble an ExpressionSet object for the example mouse gene expression experiment from the different data objects you just created.
exptSet = bioma.ExpressionSet(exprsData, 'SData', sData,...
                                         'FData', fData,...
                                         'Einfo', exptGSE3327)
exptSet =
ExpressionSet
Experiment Data: 500 features, 26 samples
  Element names: Expressions
Sample Data:
    Sample names: A, B, ...,Z (26 total)
    Sample variable names and meta information:
        Gender: Gender of the mouse in study
        Age: The number of weeks since mouse birth
        Type: Genetic characters
        Strain: The mouse strain
        Source: The tissue source for RNA collection
Feature Data:
    Feature names: 100001_at, 100002_at, ...,100717_at (500 total)
    Feature variable names and meta information:
        GeneTitle: Gene title of a probe set
        GeneSymbol: Probe set gene symbol
        ChromosomalLocation: Probe set chromosomal locations
        Pathway: The pathway the genes involved in
Experiment Information: use 'exptInfo(obj)'
You can also create an ExpressionSet object with only the expression values in a DataMatrix or a numeric matrix.
miniExprSet = bioma.ExpressionSet(exprsData)
miniExprSet =
ExpressionSet
Experiment Data: 500 features, 26 samples
  Element names: Expressions
Sample Data: none
Feature Data: none
Experiment Information: none
Saving and Loading an ExpressionSet Object
The data objects for a microarray experiment can be saved as MAT files. Save the ExpressionSet object exptSet to a MAT file named mouseExpressionSet.mat.
save mouseExpressionSet exptSet
Clear variables from the MATLAB Workspace.
clear dm exprs* mouseExptData sData
Load the MAT file mouseExpressionSet into the MATLAB Workspace.
load mouseExpressionSet
Inspect the loaded ExpressionSet object.
exptSet.elementNames
ans = 1x1 cell array {'Expressions'}
exptSet.NSamples
ans = 26
exptSet.NFeatures
ans = 500
Accessing Data Components of an ExpressionSet Object
A number of methods are available to access and update data stored in an ExpressionSet object.
You can access the columns of the sample data using dot notation.
exptSet.Strain(1:5)
ans = 5x1 cell array
    {'129S6/SvEvTac'}
    {'129S6/SvEvTac'}
    {'129S6/SvEvTac'}
    {'A/J '         }
    {'A/J '         }
Retrieve the feature names using the featureNames method. In this example, the feature names are the probe set identifiers on the array.
featureNames(exptSet, 1:5)
ans = 5x1 cell array
    {'100001_at'}
    {'100002_at'}
    {'100003_at'}
    {'100004_at'}
    {'100005_at'}
The unique identifiers of the samples can be accessed via the sampleNames method.
exptSet.sampleNames(1:5)
ans = 1x5 cell array
    {'A'}    {'B'}    {'C'}    {'D'}    {'E'}
The sampleVarNames method lists the variable names in the sample data.
exptSet.sampleVarNames
ans = 1x5 cell array
    {'Gender'}    {'Age'}    {'Type'}    {'Strain'}    {'Source'}
Extract the dataset array containing sample information.
sDataset = sampleVarValues(exptSet)
sDataset =
         Gender      Age    Type             Strain
    A    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    B    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    C    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    D    {'Male'}    8      {'Wild type'}    {'A/J '         }
    E    {'Male'}    8      {'Wild type'}    {'A/J '         }
    F    {'Male'}    8      {'Wild type'}    {'C57BL/6J '    }
    G    {'Male'}    8      {'Wild type'}    {'C57BL/6J'     }
    H    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    I    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    J    {'Male'}    8      {'Wild type'}    {'A/J'          }
    K    {'Male'}    8      {'Wild type'}    {'A/J'          }
    L    {'Male'}    8      {'Wild type'}    {'A/J'          }
    M    {'Male'}    8      {'Wild type'}    {'C57BL/6J'     }
    N    {'Male'}    8      {'Wild type'}    {'C57BL/6J'     }
    O    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    P    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    Q    {'Male'}    8      {'Wild type'}    {'A/J'          }
    R    {'Male'}    8      {'Wild type'}    {'A/J'          }
    S    {'Male'}    8      {'Wild type'}    {'C57BL/6J'     }
    T    {'Male'}    8      {'Wild type'}    {'C57BL/6J4'    }
    U    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    V    {'Male'}    8      {'Wild type'}    {'129S6/SvEvTac'}
    W    {'Male'}    8      {'Wild type'}    {'A/J'          }
    X    {'Male'}    8      {'Wild type'}    {'A/J'          }
    Y    {'Male'}    8      {'Wild type'}    {'C57BL/6J'     }
    Z    {'Male'}    8      {'Wild type'}    {'C57BL/6J'     }

         Source
    A    {'amygdala'        }
    B    {'amygdala'        }
    C    {'amygdala'        }
    D    {'amygdala'        }
    E    {'amygdala'        }
    F    {'amygdala'        }
    G    {'amygdala'        }
    H    {'cingulate cortex'}
    I    {'cingulate cortex'}
    J    {'cingulate cortex'}
    K    {'cingulate cortex'}
    L    {'cingulate cortex'}
    M    {'cingulate cortex'}
    N    {'cingulate cortex'}
    O    {'hippocampus'     }
    P    {'hippocampus'     }
    Q    {'hippocampus'     }
    R    {'hippocampus'     }
    S    {'hippocampus'     }
    T    {'hippocampus'     }
    U    {'hypothalamus'    }
    V    {'hypothalamus'    }
    W    {'hypothalamus'    }
    X    {'hypothalamus'    }
    Y    {'hypothalamus'    }
    Z    {'hypothalamus'    }
Retrieve the ExptData object containing expression values. There may be more than one DataMatrix object with identical dimensions in an ExptData object. In an ExpressionSet object, there is always an element DataMatrix object named Expressions containing the expression matrix.
exptDS = exptData(exptSet)
exptDS =
Experiment Data:
  500 features, 26 samples
  1 elements
  Element names: Expressions
Extract only the expression DataMatrix instance.
dMatrix = expressions(exptSet);
The returned expression DataMatrix should be identical to the exprsData DataMatrix object that you created earlier.
get(dMatrix)
            Name: 'mouseExprsData'
        RowNames: {500x1 cell}
        ColNames: {1x26 cell}
           NRows: 500
           NCols: 26
           NDims: 2
    ElementClass: 'double'
Get PubMed IDs for the experiment stored in exptSet.
exptSet.pubMedID
ans = '16244648'
Subsetting an ExpressionSet Object
You can subset an ExpressionSet object so that you can focus on the samples and features of interest. The first indexing argument subsets the features and the second argument subsets the samples.
Create a new ExpressionSet object consisting of the first five features and the samples named A, B, and C.
mySet = exptSet(1:5, {'A', 'B', 'C'})
mySet =
ExpressionSet
Experiment Data: 5 features, 3 samples
  Element names: Expressions
Sample Data:
    Sample names: A, B, C
    Sample variable names and meta information:
        Gender: Gender of the mouse in study
        Age: The number of weeks since mouse birth
        Type: Genetic characters
        Strain: The mouse strain
        Source: The tissue source for RNA collection
Feature Data:
    Feature names: 100001_at, 100002_at, ...,100005_at (5 total)
    Feature variable names and meta information:
        GeneTitle: Gene title of a probe set
        GeneSymbol: Probe set gene symbol
        ChromosomalLocation: Probe set chromosomal locations
        Pathway: The pathway the genes involved in
Experiment Information: use 'exptInfo(obj)'
size(mySet)
ans = 5 3
featureNames(mySet)
ans = 5x1 cell array {'100001_at'} {'100002_at'} {'100003_at'} {'100004_at'} {'100005_at'}
sampleNames(mySet)
ans = 1x3 cell array {'A'} {'B'} {'C'}
You can also create a subset consisting of only the samples from hippocampus tissues.
hippocampusSet = exptSet(:, nominal(exptSet.Source)== 'hippocampus')
hippocampusSet =
ExpressionSet
Experiment Data: 500 features, 6 samples
  Element names: Expressions
Sample Data:
    Sample names: O, P, ...,T (6 total)
    Sample variable names and meta information:
        Gender: Gender of the mouse in study
        Age: The number of weeks since mouse birth
        Type: Genetic characters
        Strain: The mouse strain
        Source: The tissue source for RNA collection
Feature Data:
    Feature names: 100001_at, 100002_at, ...,100717_at (500 total)
    Feature variable names and meta information:
        GeneTitle: Gene title of a probe set
        GeneSymbol: Probe set gene symbol
        ChromosomalLocation: Probe set chromosomal locations
        Pathway: The pathway the genes involved in
Experiment Information: use 'exptInfo(obj)'
hippocampusSet.Source
ans = 6x1 cell array
    {'hippocampus'}
    {'hippocampus'}
    {'hippocampus'}
    {'hippocampus'}
    {'hippocampus'}
    {'hippocampus'}
hippocampusExprs = expressions(hippocampusSet);
get(hippocampusExprs)
            Name: 'mouseExprsData'
        RowNames: {500x1 cell}
        ColNames: {'O'  'P'  'Q'  'R'  'S'  'T'}
           NRows: 500
           NCols: 6
           NDims: 2
    ElementClass: 'double'
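The hippocampus subset above is selected with a logical vector in the sample position. A mask that combines more than one sample variable can be built the same way. The sketch below uses two small hypothetical cell arrays in place of exptSet.Source and exptSet.Strain and uses strcmp instead of nominal, so it runs in base MATLAB without the example data.

```matlab
% Hypothetical per-sample annotations (one entry per sample).
source = {'amygdala'; 'hippocampus'; 'hippocampus'; 'hypothalamus'};
strain = {'A/J'; 'A/J'; 'C57BL/6J'; 'A/J'};

% Element-wise AND of two conditions gives one logical entry per sample.
mask = strcmp(source, 'hippocampus') & strcmp(strain, 'A/J')

% Only the second sample satisfies both conditions.
find(mask)
```

Assuming the objects from this example are in the workspace, a mask built from exptSet.Source and exptSet.Strain in this way could be passed as the second index, in the same position as the nominal comparison above.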
References
[1] Hovatta, I., et al., "Glyoxalase 1 and glutathione reductase 1 regulate anxiety in mice", Nature 438(7068):662-666, 2005.
[2] Brazma, A., et al., "Minimum information about a microarray experiment (MIAME) - toward standards for microarray data", Nature Genetics 29(4):365-371, 2001.