bioinfo.pipeline.block.SeqSplit

Bioinformatics pipeline block to split sequences into separate files

Since R2023a

Description

A SeqSplit block splits the sequences in FASTQ files according to the provided barcodes and saves the sequences for each barcode in a separate file.

Creation

Syntax

b = bioinfo.pipeline.block.SeqSplit

b = bioinfo.pipeline.block.SeqSplit(options)

b = bioinfo.pipeline.block.SeqSplit(Name=Value)

Description


b = bioinfo.pipeline.block.SeqSplit creates a SeqSplit block.

b = bioinfo.pipeline.block.SeqSplit(options) specifies additional options as a SeqSplitOptions object.

b = bioinfo.pipeline.block.SeqSplit(Name=Value) specifies additional options as the property names and values of a SeqSplitOptions object. This object is set as the value of the Options property of the block.
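The two ways of passing options can be sketched as follows. This is a minimal sketch that assumes MaxMismatches and BarcodeFormat can be set directly on a SeqSplitOptions object after it is created. The variable names ss1, ss2, and opts are only illustrative.

```matlab
import bioinfo.pipeline.block.SeqSplit
import bioinfo.pipeline.options.SeqSplitOptions

% Form 1: pass name-value arguments directly to the block constructor.
ss1 = SeqSplit(MaxMismatches=1, BarcodeFormat=3);

% Form 2: create an options object, set its properties, then pass it in.
% ASSUMPTION: these properties are settable after construction.
opts = SeqSplitOptions;
opts.MaxMismatches = 1;
opts.BarcodeFormat = 3;
ss2 = SeqSplit(opts);

% Either way, the settings end up in the Options property of the block.
disp(ss1.Options.MaxMismatches)
disp(ss2.Options.BarcodeFormat)
```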

Input Arguments


`options` — SeqSplit options
`bioinfo.pipeline.options.SeqSplitOptions`

SeqSplit options, specified as a SeqSplitOptions object. By default, the block uses a SeqSplitOptions object with default property values.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Note

The following list of arguments is partial. For the complete list, refer to the properties of the SeqSplitOptions object.

`MaxMismatches` — Maximum number of mismatches allowed during barcode matching
`0` (default) | nonnegative integer

Maximum number of mismatches allowed during barcode matching, specified as a nonnegative integer. The default is 0, that is, no mismatches are allowed.

`BarcodeFormat` — Type of barcode to match
`5` (default) | `3`

Type of barcode to match, specified as 3 or 5. A value of 5 corresponds to the barcode located at the 5' end of each sequence, and 3 corresponds to the 3' end.

Example: BarcodeFormat=3
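To make MaxMismatches and BarcodeFormat concrete, here is a hypothetical illustration of barcode matching in plain MATLAB. It is not the SeqSplit implementation. It assumes that mismatches are counted as substitutions only (a Hamming distance), that the barcode is compared against the first or last n bases of a read, where n is the barcode length, and that a read is assigned to the first barcode within the mismatch limit. The read is made up; the barcodes are those used in the Split Sequences Based on Barcodes example.

```matlab
% Hypothetical illustration of barcode matching; NOT SeqSplit's implementation.
% ASSUMPTIONS: substitution-only mismatches (Hamming distance), and a read
% is assigned to the first barcode within maxMismatches.
barcodes = ["AAAAC"; "AGATT"; "GACTT"];
ids      = ["ID1"; "ID2"; "ID3"];

read = "AGCTTGGCATCGA";   % made-up read for illustration
maxMismatches = 1;        % plays the role of MaxMismatches=1
barcodeFormat = 5;        % 5 = compare at the 5' end, 3 = at the 3' end

r = char(read);
assigned = "";            % stays empty if no barcode matches
for k = 1:numel(barcodes)
    bc = char(barcodes(k));
    n = numel(bc);
    if barcodeFormat == 5
        endSeq = r(1:n);            % first n bases of the read
    else
        endSeq = r(end-n+1:end);    % last n bases of the read
    end
    % Count positions where the read end differs from the barcode.
    if sum(endSeq ~= bc) <= maxMismatches
        assigned = ids(k);
        break
    end
end
disp(assigned)
```

With maxMismatches set to 0, the same read matches none of the three barcodes, because its first five bases (AGCTT) differ from AGATT at one position.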

Properties


`ErrorHandler` — Function to handle errors from `run` method
function handle

Function to handle errors from the run method of the block, specified as a function handle. The handle specifies the function to call if the run method encounters an error within a pipeline. For the pipeline to continue after a block fails, ErrorHandler must return a structure that is compatible with the output ports of the block. The error handling function is called with these two inputs:

1. A structure with these fields:

   Field	Description
   identifier	Identifier of the error that occurred
   message	Text of the error message
   index	Linear index indicating which block process failed in the parallel run. By default, the index is 1 because there is only one run per block. For details on how block inputs can be split across different dimensions for multiple run calls, see Bioinformatics Pipeline SplitDimension.

2. The input structure that was passed to the run method when it failed.

Data Types: function_handle
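A handler might look like the following sketch. The function name seqSplitErrorHandler is hypothetical, and the sketch assumes that empty values are compatible with the SplitFASTQFiles and NumSplit output ports; verify this against the blocks downstream of SeqSplit before relying on it. In a script, local functions such as this one must appear at the end of the file.

```matlab
% Attach a custom error handler to a SeqSplit block.
ss = bioinfo.pipeline.block.SeqSplit;
ss.ErrorHandler = @seqSplitErrorHandler;

function outputs = seqSplitErrorHandler(errInfo, inputs) %#ok<INUSD>
% Hypothetical handler: report which run failed, then return placeholder
% outputs so that the pipeline can continue.
% ASSUMPTION: empty values are acceptable for both SeqSplit output ports.
    warning("SeqSplit run %d failed (%s): %s", ...
        errInfo.index, errInfo.identifier, errInfo.message);
    outputs = struct("SplitFASTQFiles", [], "NumSplit", []);
end
```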

`Inputs` — Input ports
structure

This property is read-only.

Input ports of the block, specified as a structure. The field names of the structure are the names of the block input ports, and the field values are bioinfo.pipeline.Input objects. These objects describe the input port behaviors. The input port names are the expected field names of the input structure that you pass to the block run method.

The SeqSplit block Inputs structure has the following fields:

FASTQFiles — Names of FASTQ-formatted files with sequence and quality information. This input is required and must be satisfied.
BarcodeFile — Name of the file containing the barcode information. This input is required and must be satisfied.

The default value for each input field is a bioinfo.pipeline.datatypes.Unset object, which means that the input value is not set yet.

Data Types: struct
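Because the input port names are the field names of the structure that run expects, you can also run a SeqSplit block on its own, outside a pipeline. The following sketch uses emptyInputs to create that structure. It assumes that plain file-name strings are accepted for both inputs and that barcodeExample.txt, the barcode file created in the Split Sequences Based on Barcodes example, already exists in the current folder.

```matlab
import bioinfo.pipeline.block.SeqSplit

ss = SeqSplit;

% emptyInputs returns a structure whose field names match the input ports.
inStruct = emptyInputs(ss);

% Satisfy both required inputs.
% ASSUMPTION: plain file-name strings are accepted for these ports.
inStruct.FASTQFiles = which("SRR005164_1_50.fastq");
inStruct.BarcodeFile = fullfile(pwd, "barcodeExample.txt");

% Run the block directly; the output field names match the output ports.
out = run(ss, inStruct);
disp(out.NumSplit)
```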

`Outputs` — Output ports
structure

This property is read-only.

Output ports of the block, specified as a structure. The field names of the structure are the names of the block output ports, and the field values are bioinfo.pipeline.Output objects. These objects describe the output port behaviors. The field names of the output structure returned by the block run method are the same as the output port names.

The SeqSplit block Outputs structure has the following fields:

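SplitFASTQFiles — Output file names. By default, each output file name consists of the input file name (without its extension), the output suffix ('_split'), an underscore, and the barcode identifier. For example, the sequences from SRR005164_1_50.fastq that match barcode ID1 are saved to SRR005164_1_50_split_ID1.fastq.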

Tip
To see the actual location of these files, first get the results of the block. Then call the unwrap method on SplitFASTQFiles, as shown in the Split Sequences Based on Barcodes example.

NumSplit — Number of sequences saved in each output file, returned as a scalar or an n-by-1 vector, where n is the number of output files. If there are multiple output files, the elements of NumSplit are in the same order as the files in SplitFASTQFiles.

Data Types: struct

`Options` — SeqSplit options
`bioinfo.pipeline.options.SeqSplitOptions` object (default)

SeqSplit options, specified as a SeqSplitOptions object. By default, the block uses a SeqSplitOptions object with default property values.

Object Functions

`compile`	Perform block-specific additional checks and validations
`copy`	Copy array of handle objects
`emptyInputs`	Create input structure for use with `run` method
`eval`	Evaluate block object
`run`	Run block object

Examples


Split Sequences Based on Barcodes

Use a SeqSplit block to split sequences into separate files based on barcodes.

import bioinfo.pipeline.block.*
import bioinfo.pipeline.Pipeline

%% 
% Create a tab-delimited barcode file. Each row holds a barcode ID and its sequence.
barcodeInfo = {'ID1','AAAAC';'ID2', 'AGATT';'ID3', 'GACTT'};
writetable(cell2table(barcodeInfo), 'barcodeExample.txt', ...
           'Delimiter','\t','WriteVariableNames',false);
%%
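% Create the blocks and add them to a pipeline.
FC1 = FileChooser(which("SRR005164_1_50.fastq"));  % FASTQ file to split
FC2 = FileChooser(which("barcodeExample.txt"));    % barcode file created above
SS = SeqSplit;
P = Pipeline;
addBlock(P,[FC1,FC2,SS]);
% Connect the Files output of each FileChooser to the matching SeqSplit input.
connect(P,FC1,SS,["Files","FASTQFiles"]);
connect(P,FC2,SS,["Files","BarcodeFile"]);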

%%
% Run the pipeline and get the results.
run(P);
R = results(P,SS)

R = 

  struct with fields:

    SplitFASTQFiles: [3×1 bioinfo.pipeline.datatypes.File]
           NumSplit: [3×1 double]

Call unwrap on SplitFASTQFiles to see the location of the generated files.

unwrap(R.SplitFASTQFiles)

ans = 

  3×1 string array

    "C:\PipelineResults\SeqSplit_1\1\SRR005164_1_50_split_ID1.fastq"
    "C:\PipelineResults\SeqSplit_1\1\SRR005164_1_50_split_ID2.fastq"
    "C:\PipelineResults\SeqSplit_1\1\SRR005164_1_50_split_ID3.fastq"
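Because NumSplit lists its counts in the same order as the files in SplitFASTQFiles, you can combine the two outputs into one summary table. This short sketch continues from the results structure R above; the table variable names File and NumSequences are arbitrary choices.

```matlab
% Pair each split file with the number of sequences saved in it.
files = unwrap(R.SplitFASTQFiles);   % string array of file locations
summary = table(files, R.NumSplit, ...
    'VariableNames', {'File', 'NumSequences'});
disp(summary)
```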

Version History

Introduced in R2023a
