bioinfo.pipeline.block.SRASAMDump
Description
An SRASAMDump block enables you to download SAM files from SRA
(Sequence Read Archive) [1].
bioinfo.pipeline.block.SRASAMDump requires the SRA Toolkit for Bioinformatics Toolbox™. If this support package is not installed, then the function provides a download
link. For details, see Bioinformatics Toolbox Software Support Packages.
Creation
Syntax
b = bioinfo.pipeline.block.SRASAMDump
b = bioinfo.pipeline.block.SRASAMDump(options)
b = bioinfo.pipeline.block.SRASAMDump(Name=Value)
Description
b = bioinfo.pipeline.block.SRASAMDump creates an SRASAMDump block.
b = bioinfo.pipeline.block.SRASAMDump(options) uses additional options specified by options.
b = bioinfo.pipeline.block.SRASAMDump(Name=Value) specifies additional options using one or more name-value arguments. For example, you can specify to retrieve the FASTA-formatted file using the FastaOutput name-value argument. The name-value arguments set the property names and values of an SRASAMDumpOptions object. These property values are assigned to the Options property of the block.
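The three creation syntaxes can be sketched as follows. This is a minimal illustration that assumes the SRA Toolkit support package is installed and that SRASAMDumpOptions can be called without a namespace prefix, as in the example on this page. The variable names b1, b2, b3, and opt are placeholders chosen for this sketch, and GZip is used only as a sample option.

```matlab
% Default block with a default SRASAMDumpOptions object.
b1 = bioinfo.pipeline.block.SRASAMDump;

% Block configured from an options object.
opt = SRASAMDumpOptions;
opt.GZip = true;
b2 = bioinfo.pipeline.block.SRASAMDump(opt);

% Block configured directly with a name-value argument.
b3 = bioinfo.pipeline.block.SRASAMDump(FastaOutput=true);

% In each case, the settings are stored in the Options property.
disp(b2.Options.GZip)
disp(b3.Options.FastaOutput)
```

Passing an options object is convenient when several blocks share the same settings, while name-value arguments keep one-off configurations on a single line.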
Input Arguments
options: SRASAMDump options, specified as an SRASAMDumpOptions object, string scalar, or character vector. If you specify a string scalar or character vector, it must be in the original sam-dump syntax (prefixed by a dash).
Data Types: char | string
Name-Value Arguments
Specify optional pairs of arguments as
Name1=Value1,...,NameN=ValueN, where Name is
the argument name and Value is the corresponding value.
Name-value arguments must appear after other arguments, but the order of the
pairs does not matter.
Example: b = bioinfo.pipeline.block.SRASAMDump(BZip2=true) specifies
to compress the downloaded file using bzip2.
BZip2: Flag to compress the output files using bzip2, specified as a numeric or logical 1 (true) or 0 (false).
Data Types: double | logical
ExtraCommand: Additional commands, specified as a character vector or string scalar. The commands must be in the native syntax (prefixed by one or two dashes). Use this option to apply undocumented flags and flags without corresponding MATLAB® properties.
Example: ExtraCommand="--aligned-region chr20:2500000-2600000"
Data Types: char | string
FastaOutput: Flag to produce FASTA-formatted output files, specified as a numeric or logical 1 (true) or 0 (false).
Data Types: double | logical
FastqOutput: Flag to produce FASTQ-formatted output files, specified as a numeric or logical 1 (true) or 0 (false).
Data Types: double | logical
GZip: Flag to compress the output files using gzip, specified as a numeric or logical 1 (true) or 0 (false).
Data Types: double | logical
HideIdentical: Flag to use '=' in the output if a base is identical to the reference, specified as a numeric or logical 1 (true) or 0 (false).
Data Types: double | logical
IncludeAll: Flag to include all object properties with corresponding default values when converting properties to the original option syntax, specified as a numeric or logical 1 (true) or 0 (false). You can convert properties to the original syntax prefixed by one or two dashes (such as '--aligned-region chr20:2500000-2600000') by using the getCommand function.
When IncludeAll=false and you call getCommand(optionsObject), the software converts only the specified properties. If the value is true, getCommand converts all available properties, using default values for unspecified properties, to the original syntax.
Note
If you set IncludeAll to true, the software translates all available properties, with default values for unspecified properties. The only exception is that when the default value of a property is NaN, Inf, [], '', or "", the software does not translate the corresponding property.
Data Types: logical
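The effect of IncludeAll on getCommand can be explored with a short sketch like the one below. It assumes the SRA Toolkit support package is installed; the variable names and the MinMapQuality value of 20 are illustrative choices, and the exact command text that getCommand returns is not shown because it depends on the installed version.

```matlab
opt = SRASAMDumpOptions;
opt.MinMapQuality = 20;          % the only option changed from its default

% IncludeAll is false by default, so only specified properties are converted.
cmdSpecifiedOnly = getCommand(opt);

% With IncludeAll set to true, properties left at their defaults are
% converted as well (except those whose default is NaN, Inf, [], '', or "").
opt.IncludeAll = true;
cmdAll = getCommand(opt);

disp(cmdSpecifiedOnly)
disp(cmdAll)
```

Comparing the two displayed commands is a quick way to check which flags the block would pass to sam-dump.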
MinMapQuality: Minimum mapping quality required for an alignment to be included in the output, specified as a nonnegative scalar.
Data Types: double
OutputFileName: Output filename, specified as a character vector or string scalar.
Data Types: char | string
OutputPrimary: Flag to output primary alignments only, specified as a numeric or logical 1 (true) or 0 (false).
Data Types: double | logical
OutputUnaligned: Flag to output the unaligned reads with the aligned reads, specified as a numeric or logical 1 (true) or 0 (false).
Data Types: double | logical
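Several name-value arguments can be combined in one call. The following sketch assumes the SRA Toolkit support package is installed; the option values and the output filename are arbitrary choices made for illustration.

```matlab
% Keep only primary alignments with mapping quality of at least 20,
% and gzip-compress the output file.
b = bioinfo.pipeline.block.SRASAMDump( ...
    OutputPrimary=true, ...
    MinMapQuality=20, ...
    GZip=true, ...
    OutputFileName="SRR11846824_primary.sam.gz");

% The values are stored on the SRASAMDumpOptions object held by the block.
disp(b.Options.OutputPrimary)
disp(b.Options.MinMapQuality)
disp(b.Options.OutputFileName)
```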
Properties
ErrorHandler: Function to handle errors from the run method of the block, specified as a function handle. The handle specifies the function to call if the run method encounters an error within a pipeline. For the pipeline to continue after a block fails, ErrorHandler must return a structure that is compatible with the output ports of the block. The error handling function is called with the following two inputs:
Structure with these fields:
Field | Description |
identifier | Identifier of the error that occurred |
message | Text of the error message |
index | Linear index indicating which block process failed in the parallel run. By default, the index is 1 because there is only one run per block. For details on how block inputs can be split across different dimensions for multiple run calls, see Bioinformatics Pipeline SplitDimension. |
Input structure passed to the run method when it fails
Data Types: function_handle
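A handler with the two-input signature described above might look like the following sketch. The name fallbackHandler, the mock error structure, and the choice of an empty string array as the placeholder value are assumptions made for illustration. Whether an empty value is an acceptable stand-in for OutputFiles in a particular pipeline is not confirmed here.

```matlab
% Hypothetical handler: ignores the failure details and returns a
% structure whose field name matches the OutputFiles output port.
fallbackHandler = @(errInfo, inputs) struct("OutputFiles", string.empty);

% Call the handler directly with mock inputs to inspect what it returns.
mockError = struct("identifier", "demo:downloadFailed", ...
                   "message", "Mock download failure", ...
                   "index", 1);
mockInputs = struct("SRRID", "SRR11846824");
out = fallbackHandler(mockError, mockInputs);
disp(fieldnames(out))
```

To use such a handler, assign it to the ErrorHandler property of the block before running the pipeline.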
This property is read-only.
Inputs: Input ports of the block, specified as a structure. The field
names of the structure are the names of the block input ports, and the field values are bioinfo.pipeline.Input objects. These objects describe the input port behaviors.
The input port names are the expected field names of the input structure that you pass to the
block run method.
The SRASAMDump block Inputs structure has the
following field:
SRRID: Accession numbers. This input is required and must be satisfied.
Data Types: struct
This property is read-only.
Outputs: Output ports of the block, specified as a structure. The field
names of the structure are the names of the block output ports, and the field values are bioinfo.pipeline.Output objects. These objects describe the output port behaviors.
The field names of the output structure returned by the block run method
are the same as the output port names.
The SRASAMDump block Outputs structure has a
field named OutputFiles.
Data Types: struct
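The relationship between port names and the run method structures can be sketched as follows. This assumes the SRA Toolkit support package is installed and a network connection is available, because running the block downloads data from SRA. The call form run(b,in) with a single input structure is an assumption based on the descriptions of the run and emptyInputs functions on this page.

```matlab
b = bioinfo.pipeline.block.SRASAMDump;

% Input structure whose field names match the block input ports.
in = emptyInputs(b);
in.SRRID = "SRR11846824";

% Run the block on its own, outside a pipeline.
out = run(b, in);

% The output structure has one field per output port.
disp(fieldnames(out))
```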
Options: SRASAMDump options, specified as an SRASAMDumpOptions object. The default value is a default
SRASAMDumpOptions object.
Object Functions
compile | Perform block-specific additional checks and validations |
copy | Copy array of handle objects |
emptyInputs | Create input structure for use with run method |
eval | Evaluate block object |
run | Run block object |
Examples
Import the pipeline and block objects needed for the example so that you can create these objects without specifying the entire namespace.
import bioinfo.pipeline.Pipeline
import bioinfo.pipeline.block.*
Create a pipeline.
P = Pipeline;
Create an SRAFasterqDump block and specify the accession number SRR11846824 as the block input. SRR11846824 has two reads per spot and no unaligned reads.
SRAFQDump = SRAFasterqDump;
SRAFQDump.Inputs.SRRID.Value = "SRR11846824";
addBlock(P,SRAFQDump);
Run the pipeline to download the corresponding FASTQ files from SRA for the specified accession number.
run(P);
Get the results of the SRAFQDump block.
R = results(P,SRAFQDump)
R = struct with fields:
Reads: [1×1 bioinfo.pipeline.datatype.Incomplete]
Reads_1: [1×1 bioinfo.pipeline.datatype.File]
Reads_2: [1×1 bioinfo.pipeline.datatype.File]
Reads_3: [1×1 bioinfo.pipeline.datatype.Incomplete]
Reads_4: [1×1 bioinfo.pipeline.datatype.Incomplete]
Reads_5: [1×1 bioinfo.pipeline.datatype.Incomplete]
View the names of the downloaded files by using the unwrap function.
unwrap(R.Reads_1)
unwrap(R.Reads_2)
By default, the block uses the SplitType="SplitThree" option and downloads only biological reads. Specifically, the block splits spots into reads. For spots with two reads, the block produces *_1.fastq and *_2.fastq and displays them in the Reads_1 and Reads_2 fields, respectively. The block saves any unaligned reads in a *.fastq file and displays it in the Reads field. Because this accession has no unaligned reads, the block did not produce a *.fastq file, and the Reads field is returned as Incomplete. Reads_3, Reads_4, and Reads_5 are also Incomplete because of the usage of SplitType="SplitThree". For more details on the block output behavior, see Outputs.
You can specify other download options using the SRAFasterqDumpOptions. For instance, to download the FASTA-formatted file, specify FastaOutput=true and rerun the block.
opt = SRAFasterqDumpOptions;
opt.FastaOutput = true;
SRAFQDump.Options = opt;
You can also download SAM files from SRA using the SRASAMDump block.
SRASDump = SRASAMDump;
Specify the accession number to download.
SRASDump.Inputs.SRRID.Value = "SRR11846824";
Specify the options using an SRASAMDumpOptions object. For instance, set the output filename and compress the output file using bzip2.
samdumpopt = SRASAMDumpOptions;
samdumpopt.BZip2 = 1;
samdumpopt.OutputFileName = "SRR11846824.sam.bz2"
samdumpopt =
SRASAMDumpOptions with properties:
Default properties:
ExtraCommand: ""
FastaOutput: 0
FastqOutput: 0
GZip: 0
HideIdentical: 0
IncludeAll: 0
MinMapQuality: 0
OutputPrimary: 0
OutputUnaligned: 0
Version: "3.0.6"
Modified properties:
BZip2: 1
OutputFileName: "SRR11846824.sam.bz2"
SRASDump.Options = samdumpopt;
Add the block to the pipeline and run the pipeline.
addBlock(P,SRASDump);
run(P);
Get the block results.
R2 = results(P,SRASDump);
View the names of the output files by using the unwrap function.
unwrap(R2.OutputFiles)
After downloading the files, you can use them for downstream analyses. For instance, you can run bowtie2 to map the reads to the reference sequence, and then visualize the mapped reads in the Genomics Viewer app.
First, download the C. elegans reference sequence.
celegans_refseq = fastaread("https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/ce11/ce11.fa");
Save the Chromosome 3 reference data in a FASTA file.
celegans_chr3 = celegans_refseq(3).Sequence;
fastawrite("celegans_chr3.fa",celegans_chr3);
Create a FileChooser block to select the Chromosome 3 reference file.
fcRef = FileChooser;
fcRef.Files = fullfile(pwd,"celegans_chr3.fa");
addBlock(P,fcRef);
Build a set of index files using the Bowtie2Build block. Set the base name of the index files and the name of the reference FASTA file.
buildIndex = Bowtie2Build;
buildIndex.Inputs.IndexBaseName.Value = "celegans_chr3_index";
addBlock(P,buildIndex);
connect(P,fcRef,buildIndex,["Files","ReferenceFASTAFiles"]);
run(P);
Align reads to the reference using the Bowtie2 block. Create the block and then connect it to the buildIndex and SRAFQDump blocks.
alignReads = Bowtie2;
alignReads.OutFilename = "SRR11846824_mapped.sam";
addBlock(P,alignReads);
connect(P,buildIndex,alignReads,["IndexBaseName","IndexBaseName"]);
connect(P,SRAFQDump,alignReads,["Reads_1","Reads1Files";"Reads_2","Reads2Files"]);
run(P);
Bowtie2 produces a SAM file. To visualize the mapped reads in the Genomics Viewer app, convert the SAM file to a BAM file.
First, make a UserFunction block to create a BioMap object from the SAM file.
biomapObj = UserFunction;
biomapObj.Function = "BioMap";
biomapObj.RequiredArguments = "inputSAM";
biomapObj.OutputArguments = "biomapObject";
addBlock(P,biomapObj);
Next, connect the biomapObj block to the alignReads block, which provides the SAM file needed. Suppress two informational warnings issued during the creation of a BioMap object.
connect(P,alignReads,biomapObj,["SAMFile","inputSAM"]);
w = warning;
warning("off","bioinfo:BioMap:BioMap:UnsortedReadsInSAMFile");
warning("off","bioinfo:saminfo:InvalidTagField");
run(P);
warning(w); % Restore warnings
Use the write method of the BioMap object to convert the SAM file to a BAM file.
sam2bam = UserFunction;
sam2bam.Function = "write";
sam2bam.RequiredArguments = ["biomapObj","BAMFileName"];
sam2bam.NameValueArguments = "Format";
sam2bam.Inputs.BAMFileName.Value = "../../../SRR11846824_mapped.bam";
sam2bam.Inputs.Format.Value = "BAM";
addBlock(P,sam2bam);
connect(P,biomapObj,sam2bam,["biomapObject","biomapObj"]);
run(P);
Create a FileChooser block to select the generated BAM file.
fcBAM = FileChooser;
fcBAM.Files = fullfile(pwd,"SRR11846824_mapped.bam");
addBlock(P,fcBAM);
Create a FileChooser block to select the C. elegans cytoband file, which is provided with the toolbox.
fcCyto = FileChooser;
fcCyto.Files = fullfile(pwd,"celegans_cytoBandIdeo.txt.gz");
addBlock(P,fcCyto);
View the alignment data using the Genomics Viewer app.
gv = GenomicsViewer;
addBlock(P,gv);
connect(P,fcRef,gv,["Files","Reference"]);
connect(P,fcCyto,gv,["Files","Cytoband"]);
connect(P,fcBAM,gv,["Files","Tracks"]);
run(P);
Use the zoom slider to zoom in and see the features. Alternatively, enter the following in the search text box: Generated:3,711,861-3,711,940.

Delete the pipeline results and downloaded files.
deleteResults(P,IncludeFiles=true);
References
[1] SRA Toolkit Development Team https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit
Version History
Introduced in R2024a