
affyrma

Perform Robust Multi-array Average (RMA) procedure on Affymetrix microarray probe-level data

Syntax

Expression = affyrma(CELFiles, CDFFile)
Expression = affyrma(ProbeStructure)
Expression = affyrma(CELFiles, CDFFile, ..., 'CELPath', CELPathValue, ...)
Expression = affyrma(CELFiles, CDFFile, ..., 'CDFPath', CDFPathValue, ...)
Expression = affyrma(..., 'Method', MethodValue, ...)
Expression = affyrma(..., 'Truncate', TruncateValue, ...)
Expression = affyrma(..., 'Median', MedianValue, ...)
Expression = affyrma(..., 'Output', OutputValue, ...)
Expression = affyrma(..., 'Showplot', ShowplotValue, ...)
Expression = affyrma(..., 'Verbose', VerboseValue, ...)

Input Arguments

CELFiles

Any of the following:

  • Character vector or string specifying a single CEL file name.

  • '*', which reads all CEL files in the current folder.

  • ' ', which opens the Select CEL Files dialog box from which you select the CEL files. From this dialog box, you can press and hold Ctrl or Shift while clicking to select multiple CEL files.

  • Cell array of character vectors or string vector containing CEL file names.

CDFFile

Either of the following:

  • Character vector or string specifying a CDF file name.

  • ' ', which opens the Select CDF File dialog box from which you select the CDF file.

ProbeStructure

MATLAB® structure containing information from the CEL files, including probe intensities, probe indices, and probe set IDs, returned by the celintensityread function.

CELPathValue

Character vector or string specifying the path and folder where the files specified in CELFiles are stored.

CDFPathValue

Character vector or string specifying the path and folder where the file specified in CDFFile is stored.

MethodValue

Specifies the estimation method for the background adjustment model parameters. Choices are 'RMA' (use the estimation method described by Bolstad, 2005 [4]) or 'MLE' (estimate the parameters using maximum likelihood). Default is 'RMA'.

TruncateValue

Specifies the background noise model. Choices are true (use a truncated Gaussian distribution) or false (use a nontruncated Gaussian distribution). Default is true.

MedianValue

Specifies whether quantile normalization uses the median of the ranked values instead of the mean. Choices are true or false (default).

OutputValue

Specifies the scale of the returned gene expression values. Choices are:

  • 'log'

  • 'log2'

  • 'log10'

  • 'linear'

  • @functionname

If you specify @functionname, the data is transformed by the function functionname. Default is 'log2'.

ShowplotValue

Controls the plotting of a histogram showing the distribution of PM probe intensity values (blue) and the convolved probability density function (red), with estimated parameters mu, sigma, and alpha. Enter either 'all' (plot a histogram for each column or chip) or a subset of columns (chips) given as a column number, a list of numbers, or a range of numbers.

For example:

  • (..., 'Showplot', 3, ...) plots the intensity values in column 3.

  • (..., 'Showplot', [3,5,7], ...) plots the intensity values in columns 3, 5, and 7.

  • (..., 'Showplot', 3:9, ...) plots the intensity values in columns 3 to 9.

VerboseValue

Controls the display of the status of the reading of files and RMA processing. Choices are true (default) or false.

Output Arguments

Expression

DataMatrix object containing gene expression values (log2-based by default; see OutputValue) that have been background adjusted, normalized, and summarized using the Robust Multi-array Average (RMA) procedure.

Each row in Expression corresponds to a gene (probe set), and each column corresponds to an Affymetrix® CEL file.

Description

Expression = affyrma(CELFiles, CDFFile) reads the specified Affymetrix CEL files and the associated CDF library file (created from Affymetrix GeneChip® arrays for expression or genotyping assays). It processes the probe intensity values using RMA background adjustment, quantile normalization, and summarization procedures. It then returns Expression, a DataMatrix object that contains the log2-based gene expression values in a matrix, with the probe set IDs as row names and the CEL file names as column names. Each row in Expression corresponds to a gene (probe set), and each column corresponds to an Affymetrix CEL file. (Each CEL file is generated from a separate chip. All chips should be of the same type.)

CELFiles is a character vector, string, string vector, or cell array of character vectors containing CEL file names. CDFFile is a character vector or string specifying a CDF file name. If you set CELFiles to '*', affyrma reads all CEL files in the current folder. If you set CELFiles to ' ', affyrma opens the Select CEL Files dialog box, from which you select the CEL files.

Note

For details on the reading of files and RMA processing, see celintensityread, rmabackadj, quantilenorm, and rmasummary.
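RMA summarization is commonly described as Tukey's median polish (see [1] and [2]), applied to the log2-scale, normalized PM intensities of one probe set at a time. The following is a minimal Python sketch of median polish for illustration only. It is not taken from rmasummary, whose details may differ. The function name median_polish and its max_iter and eps parameters are hypothetical. The sketch fits y[i][j] = overall + row[i] + col[j] + residual, where rows are probes and columns are chips, and returns overall + col[j] as each chip's expression value.

```python
import statistics


def median_polish(matrix, max_iter=10, eps=0.01):
    """Tukey median polish of one probe set (rows = probes, columns = chips).

    Fits y[i][j] = overall + row[i] + col[j] + residual[i][j] and returns
    overall + col[j] for each chip j (hypothetical helper, not rmasummary).
    """
    z = [[float(v) for v in row] for row in matrix]  # residuals, updated in place
    n_rows, n_cols = len(z), len(z[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    old_sum = 0.0
    for _ in range(max_iter):
        # Row sweep: move each row's median from the residuals into row_eff.
        for i in range(n_rows):
            d = statistics.median(z[i])
            z[i] = [v - d for v in z[i]]
            row_eff[i] += d
        # Recenter the column effects, moving their median into overall.
        d = statistics.median(col_eff)
        col_eff = [c - d for c in col_eff]
        overall += d
        # Column sweep: move each column's median from the residuals into col_eff.
        for j in range(n_cols):
            d = statistics.median([z[i][j] for i in range(n_rows)])
            for i in range(n_rows):
                z[i][j] -= d
            col_eff[j] += d
        # Recenter the row effects, moving their median into overall.
        d = statistics.median(row_eff)
        row_eff = [r - d for r in row_eff]
        overall += d
        # Stop when the total absolute residual no longer changes much.
        new_sum = sum(abs(v) for row in z for v in row)
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break
        old_sum = new_sum
    return [overall + c for c in col_eff]
```

Medians ignore extreme values. As a result, one outlying probe intensity in a probe set leaves the chip estimates largely unchanged. A mean-based fit does not have this property.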

Expression = affyrma(ProbeStructure) uses RMA background adjustment, quantile normalization, and summarization procedures to process the probe intensity values in ProbeStructure, and returns Expression. ProbeStructure is a MATLAB structure returned by the celintensityread function that contains information from the CEL files, including probe intensities, probe indices, and probe set IDs.

Expression = affyrma(..., 'PropertyName', PropertyValue, ...) calls affyrma with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

Expression = affyrma(CELFiles, CDFFile, ..., 'CELPath', CELPathValue, ...) specifies a path and folder where the files specified by CELFiles are stored.

Expression = affyrma(CELFiles, CDFFile, ..., 'CDFPath', CDFPathValue, ...) specifies a path and folder where the file specified by CDFFile is stored.

Expression = affyrma(..., 'Method', MethodValue, ...) specifies the estimation method for the background adjustment model parameters. When MethodValue is 'RMA', affyrma implements the estimation method described by Bolstad, 2005 [4]. When MethodValue is 'MLE', affyrma estimates the parameters using maximum likelihood. Default is 'RMA'.

Expression = affyrma(..., 'Truncate', TruncateValue, ...) specifies the background noise model. When TruncateValue is true, affyrma uses a truncated Gaussian distribution as the background noise model. When it is false, affyrma uses a nontruncated Gaussian distribution. Default is true.
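The background adjustment is usually described by a convolution model (see [1] and [4]). An observed PM intensity is O = S + N, where the signal S follows an exponential distribution with rate alpha and the background noise N follows a normal distribution with mean mu and standard deviation sigma. Each intensity o is replaced by the expected signal E[S | O = o]. The following is a minimal Python sketch of the commonly cited closed form of that expectation. It takes mu, sigma, and alpha as given: it does not estimate them (the 'Method' option) and does not model the 'Truncate' option. The helper names are hypothetical, and rmabackadj may differ in detail.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def rma_background_adjust(pm, mu, sigma, alpha):
    """Replace each observed PM intensity o by E[S | O = o] (hypothetical helper).

    Assumed model: O = S + N, S ~ Exponential(alpha), N ~ Normal(mu, sigma^2).
    Parameter estimation and the 'Truncate' option are not modeled here.
    """
    adjusted = []
    for o in pm:
        a = o - mu - sigma * sigma * alpha
        # Mean of Normal(a, sigma^2) restricted to positive values. Extremely low
        # intensities can make normal_cdf underflow to 0; this sketch does not guard that.
        adjusted.append(a + sigma * normal_pdf(a / sigma) / normal_cdf(a / sigma))
    return adjusted
```

The adjusted values are always positive and increase with the observed intensity. For bright probes they approach o - mu - sigma^2 * alpha.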

Expression = affyrma(..., 'Median', MedianValue, ...) specifies the use of the median of the ranked values instead of the mean for normalization. Choices are true or false (default).
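Quantile normalization gives every chip the same intensity distribution. The values of each chip are ranked. Each value is then replaced by the mean (or, with 'Median' set to true, the median) of the values that share its rank across all chips. The following is a minimal Python sketch for illustration. The name quantile_normalize and its use_median flag are hypothetical, and ties are broken by position, which quantilenorm may handle differently.

```python
import statistics


def quantile_normalize(columns, use_median=False):
    """Quantile-normalize equal-length columns (one list of intensities per chip)."""
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all columns must have the same length")
    sorted_cols = [sorted(col) for col in columns]
    combine = statistics.median if use_median else statistics.mean
    # Reference distribution: one combined value per rank across all chips.
    reference = [combine([col[i] for col in sorted_cols]) for i in range(n)]
    result = []
    for col in columns:
        # Positions of this column's values in ascending order (ties by position).
        order = sorted(range(n), key=lambda i: col[i])
        normalized = [0.0] * n
        for rank, idx in enumerate(order):
            normalized[idx] = reference[rank]
        result.append(normalized)
    return result
```

After normalization, all columns contain the same set of values. Only the order in which those values appear differs between chips.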

Expression = affyrma(..., 'Output', OutputValue, ...) specifies the scale of the returned gene expression values. OutputValue can be:

  • 'log'

  • 'log2'

  • 'log10'

  • 'linear'

  • @functionname

If you specify @functionname, the data is transformed by the function functionname. Default is 'log2'.
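The default output scale is log2, so the other built-in scales are a change of base or an exponentiation away from it. The following minimal Python sketch illustrates that relationship under two assumptions. First, it assumes the summarized values are on the log2 scale. Second, it assumes that a function handle is applied to the linear-scale values. The helper rescale_log2_expression is hypothetical and is not part of affyrma.

```python
import math


def rescale_log2_expression(values, output="log2"):
    """Convert log2 expression values to another scale (hypothetical helper).

    output is 'log', 'log2', 'log10', 'linear', or a callable; the callable is
    assumed (not confirmed) to receive linear-scale values.
    """
    linear = [2.0 ** v for v in values]
    if callable(output):
        return [output(x) for x in linear]
    if output == "log2":
        return list(values)
    if output == "linear":
        return linear
    if output == "log":
        # Natural log: log(2**v) = v * log(2)
        return [v * math.log(2.0) for v in values]
    if output == "log10":
        # Base-10 log: log10(2**v) = v * log10(2)
        return [v * math.log10(2.0) for v in values]
    raise ValueError(f"unknown output scale: {output!r}")
```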

Expression = affyrma(..., 'Showplot', ShowplotValue, ...) plots a histogram showing the distribution of PM probe intensity values (blue) and the convolved probability density function (red), with estimated parameters mu, sigma, and alpha. When ShowplotValue is 'all', rmabackadj plots a histogram for each column (chip). When ShowplotValue is a number, list of numbers, or range of numbers, rmabackadj plots a histogram for each indicated column (chip).

For example:

  • (..., 'Showplot', 3, ...) plots the intensity values in column 3.

  • (..., 'Showplot', [3,5,7], ...) plots the intensity values in columns 3, 5, and 7.

  • (..., 'Showplot', 3:9, ...) plots the intensity values in columns 3 to 9.
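The red curve in these plots is a convolved density. If the observed intensity is modeled as O = S + N, with S exponential (rate alpha) and N normal (mean mu, standard deviation sigma), then O has the exponentially modified Gaussian density

f(o) = alpha * exp(-alpha * (o - mu) + alpha^2 * sigma^2 / 2) * Phi((o - mu - alpha * sigma^2) / sigma)

where Phi is the standard normal CDF. The following is a minimal Python sketch that evaluates this curve for given parameter values. convolved_density is a hypothetical name. The sketch ignores the 'Truncate' option and is not the plotting code used by rmabackadj.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def convolved_density(o, mu, sigma, alpha):
    """Density of O = S + N with S ~ Exponential(alpha), N ~ Normal(mu, sigma^2)."""
    z = (o - mu - alpha * sigma * sigma) / sigma
    return alpha * math.exp(-alpha * (o - mu) + 0.5 * alpha ** 2 * sigma ** 2) * normal_cdf(z)
```

To draw a curve like the red one, evaluate convolved_density on a grid of intensities with the estimated mu, sigma, and alpha. Then scale the result to match the histogram's normalization.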

Expression = affyrma(..., 'Verbose', VerboseValue, ...) controls the display of the status of the reading of files and RMA processing. Choices are true (default) or false.

Examples

The following example assumes that you have the HG_U95Av2.CDF library file stored at D:\Affymetrix\LibFiles\HGGenome, and that your current folder contains CEL files associated with this CDF library file. In this example, the affyrma function reads all the CEL files in the current folder and the CDF file in the folder specified by 'CDFPath'. It then performs RMA background adjustment, quantile normalization, and summarization procedures on the PM probe intensity values, and returns a DataMatrix object containing the metadata and processed data.

Expression = affyrma('*', 'HG_U95Av2.CDF',...
	                    'CDFPath', 'D:\Affymetrix\LibFiles\HGGenome');

References

[1] Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., and Speed, T.P. (2003). Exploration, Normalization, and Summaries of High Density Oligonucleotide Array Probe Level Data. Biostatistics 4, 249–264.

[2] Mosteller, F., and Tukey, J. (1977). Data Analysis and Regression (Reading, Massachusetts: Addison-Wesley Publishing Company), pp. 165–202.

[3] Best, C.J.M., Gillespie, J.W., Yi, Y., Chandramouli, G.V.R., Perlmutter, M.A., Gathright, Y., Erickson, H.S., Georgevich, L., Tangrea, M.A., Duray, P.H., Gonzalez, S., Velasco, A., Linehan, W.M., Matusik, R.J., Price, D.K., Figg, W.D., Emmert-Buck, M.R., and Chuaqui, R.F. (2005). Molecular alterations in primary prostate cancer after androgen ablation therapy. Clinical Cancer Research 11, 6823–6834.

[4] Bolstad, B. (2005). affy: Built-in Processing Methods. https://www.bioconductor.org/packages/2.1/bioc/vignettes/affy/inst/doc/builtinMethods.pdf

Version History

Introduced in R2008b