gpucoder.reduce

Optimize GPU implementation for reduction operations

Syntax

S = gpucoder.reduce(A,FUN)

S = gpucoder.reduce(A,{@FUN1,@FUN2,...})

S = gpucoder.reduce(___,Name=Value)

Description

S = gpucoder.reduce(A,FUN) aggregates the values in the input array A to a single value by using the function handle FUN. The output S is a scalar.

S = gpucoder.reduce(A,{@FUN1,@FUN2,...}) aggregates the values in the input array to a single value for each function handle in the cell array. The size of the output is 1-by-N, where N is the number of function handles.

The code generator uses shuffle intrinsics to perform reduction operations on the GPU. When you provide multiple function handles, the generated code evaluates all of them inside a single kernel.

S = gpucoder.reduce(___,Name=Value) aggregates the values in the input array using the options specified by one or more name-value arguments.
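As a rough illustration of the combination pattern, and not the CUDA code that GPU Coder actually generates, the following plain-MATLAB sketch emulates a tree-shaped, shuffle-style reduction over eight hypothetical lanes. At each step, lane i combines its value with the value in lane i + offset, and the offset halves until lane 1 holds the result. The sketch also evaluates two function handles in the same pass, loosely mirroring how multiple handles share one kernel. The power-of-two length and all variable names are assumptions of the sketch.

```matlab
% Plain-MATLAB emulation of a tree-shaped (shuffle-style) reduction.
% Illustration only: this is not the code that GPU Coder generates.
v    = [3 8 1 6 4 7 2 5];         % assumed power-of-two length (8 "lanes")
funs = {@min, @plus};             % like {@FUN1, @FUN2}
nf   = numel(funs);

vals   = repmat(v(:), 1, nf);     % column k holds partial results for funs{k}
offset = numel(v) / 2;
while offset >= 1
    % Lane i combines with lane i + offset; lanes above offset are only read,
    % so this matches a simultaneous shuffle-down step.
    for i = 1:offset
        for k = 1:nf
            vals(i, k) = funs{k}(vals(i, k), vals(i + offset, k));
        end
    end
    offset = offset / 2;
end

S = vals(1, :);                   % 1-by-N result, one entry per function handle
```

With `funs = {@min, @plus}`, `S` is `[1 36]`: the minimum and the sum of `v`.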

Examples

Find Minimum and Sum of Array

This example generates CUDA® code that finds the minimum of an array and the sum of its elements greater than a specified threshold.

Write an entry-point function named multireduce that accepts the matrix input A, dimension dim, and a threshold value threshold. Use the gpucoder.reduce function to perform two types of reduction operations on the elements of A along the dimension dim.

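function [s1, s2] = multireduce(A, dim, threshold) %#codegen
    % Reduction 1: minimum of the elements
    fh1 = @min;
    % Reduction 2: sum of the elements greater than threshold
    fh2 = @(a,b) a*(a > threshold) + b*(b > threshold);
    % Apply both reductions along dimension dim in a single call
    [s1, s2] = gpucoder.reduce(A, {fh1, fh2}, "dim", dim);
end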

Specify the input arguments as a 32-by-32 array for the input matrix, a constant value of 2 for the dimension, and a scalar value for the threshold. Run the codegen command to generate the CUDA MEX function multireduce_mex.

inputArgs = {rand(32, 32, "double"), coder.Constant(2), 0.5};
cfg = coder.gpuConfig('mex');
codegen -config cfg -args inputArgs multireduce -report

Call multireduce_mex with a constant input value of 2 for the second argument and an input value of 0.5 for the third argument.

[s1, s2] = multireduce_mex(rand(32), 2, 0.5);
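To check the MEX results, you can compute the same reductions with MATLAB built-ins. This sketch assumes that multireduce_mex was generated as shown above and that reducing a 32-by-32 matrix along dimension 2 returns 32-by-1 outputs, following the shape rule under Output Arguments. The tolerance on the sum is an illustrative choice, because the GPU can add the elements in a different order than MATLAB does.

```matlab
A = rand(32);
threshold = 0.5;

% Reference results computed with MATLAB built-ins along dimension 2
s1_ref = min(A, [], 2);                  % minimum of each row (32-by-1)
s2_ref = sum(A .* (A > threshold), 2);   % row sums of elements > threshold (32-by-1)

% Compare with the generated MEX function only if it exists
% (exist returns 3 for a MEX file on the path).
if exist('multireduce_mex', 'file') == 3
    [s1, s2] = multireduce_mex(A, 2, threshold);
    assert(isequal(s1, s1_ref));
    % Summation order can differ on the GPU, so compare with a tolerance.
    assert(max(abs(s2 - s2_ref)) < 1e-10);
end
```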

Input Arguments

`A` — Input array
vector | matrix | array

Input array, specified as a vector, matrix, or array. For code generation, the input array must be of numeric or logical data type.

`FUN` — User-defined function
function handle

User-defined function, specified as a named or anonymous function handle. The function handle is a binary function and must:

Accept two inputs and return one output. The types of the inputs and output must match the type of the input array A or, if you specify preprocess, the type of the preprocessed array.
Be commutative and associative.
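To see why FUN must be commutative and associative, this plain-MATLAB sketch reduces the same vector with @minus, which is neither, in two orders: left to right, and pairwise as in a tree-shaped GPU reduction. The vector and the specific pairing are illustrative assumptions. The point is only that the two orders produce different results.

```matlab
v   = [1 2 3 4];
fun = @minus;                 % neither commutative nor associative

% Left-to-right order: ((1 - 2) - 3) - 4
seqResult = v(1);
for i = 2:numel(v)
    seqResult = fun(seqResult, v(i));
end

% Pairwise (tree) order: (1 - 3) - (2 - 4)
treeResult = fun(fun(v(1), v(3)), fun(v(2), v(4)));

% seqResult is -8 and treeResult is 0. With fun = @plus, both would be 10.
```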

If FUN is anonymous, it can refer to variables that exist in the scope where you define the function. You can use these variables in the reduction function in addition to the two input arguments to FUN.
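MATLAB anonymous functions capture the values of such variables when you create the handle, not when the handle is called. This plain-MATLAB sketch reuses the thresholding function from the example above with illustrative values.

```matlab
threshold = 0.5;
% fh stores the value threshold = 0.5 at creation time
fh = @(a,b) a*(a > threshold) + b*(b > threshold);

threshold = 0.9;   % reassigning the variable does not change fh
r = fh(0.7, 0.8);  % evaluated with 0.5, so both terms are kept: r is 1.5
```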

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: gpucoder.reduce(A, {@min, @max}, dim=2);

`dim` — Reduction dimension
positive integer scalar

Reduction dimension, specified as a positive integer scalar.

Example: gpucoder.reduce(A, {@min, @max}, dim=2);

`preprocess` — Preprocessing function
function handle

Preprocessing function, specified as a named or anonymous function handle. By default, gpucoder.reduce does not preprocess the input array.

If the preprocess function handle is anonymous, you can refer to variables that exist in the scope where you define the function. You can create preprocessing functions that refer to these variables as well as the input array.

Example: gpucoder.reduce(A, @min, preprocess=@myScale);
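To reason about what a preprocessing function does, you can build a MATLAB reference result by applying it to the input first and then reducing. This sketch assumes that preprocessing happens before any reduction step, which is consistent with the requirement under Limitations that FUN match the type of the preprocessed input array. The definition of myScale is hypothetical, because the example above does not define it. A scaling function is used because applying it elementwise or to the whole array gives the same result.

```matlab
% Hypothetical preprocessing function (myScale is not defined on this page).
myScale = @(x) 2*x;

A = rand(16, 8);

% Reference: preprocess the input, then take the minimum along dimension 2.
S_ref = min(myScale(A), [], 2);

% Assumed GPU Coder equivalent (requires GPU Coder):
%   S = gpucoder.reduce(A, @min, dim=2, preprocess=myScale);
```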

Output Arguments

`S` — Result of reduction operation
scalar | vector | matrix

Result of the reduction operation, returned as a scalar, vector, or matrix. During reduction, the function initializes S to the value of one of the elements of the input array A. The shape of S depends on the shape of A, the number of function handles, and whether you specify dim, as described in this table.

Shape of `A`	Number of Function Handles	Input Argument `dim`	Output `S`
Vector	1	Unspecified	`S` is a scalar.
Vector	`N`	Unspecified	`S` is a 1-by-N vector.
Matrix or array	1	Specified	The function applies the reduction operation `FUN` to `A` along the dimension `dim`. For example, if `size(A) = [8 16 32]` and `dim = 2`, then `size(S) = [8 1 32]`.
Matrix or array	`N`	Specified	The function applies each function handle to `A` along the dimension `dim` and returns one output per function handle. For example, if `size(A) = [8 16 32]`, `dim = 2`, and the number of function handles is `N`, then `[s1, s2, …, sN] = gpucoder.reduce(___)`, and each output has size `[8 1 32]`.
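MATLAB built-in reductions collapse the reduced dimension in the same way, so they are a convenient reference for checking the size of S. This plain-MATLAB sketch uses the array size from the table and does not call gpucoder.reduce.

```matlab
A = rand(8, 16, 32);

% Reducing along dimension 2 collapses that dimension to 1.
S_min = min(A, [], 2);
S_sum = sum(A, 2);

sz = size(S_min);   % [8 1 32], the same size as in the table
```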

Limitations

gpucoder.reduce does not support reducing complex arrays.
The user-defined function must accept two inputs and return one output. The data types of the inputs and output must match the data type of the preprocessed input array.
The user-defined function must be commutative and associative. Otherwise, the behavior of the function is undefined.
For code generation, gpucoder.reduce accepts a limited number of user-defined function handles, based on the size of the output data type. For example, you can specify up to 46 function handles that output the half data type or up to 11 function handles that output the double data type. If you specify too many function handles, code generation fails with an error.
For inputs of an integer data type, intermediate computations in the generated code can saturate. In this case, the results from the generated code might not match the simulation results from MATLAB®.
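MATLAB integer arithmetic saturates at the limits of the data type, which makes integer addition order-dependent once an intermediate result overflows. The following plain-MATLAB sketch uses illustrative int8 values to show how two groupings of the same three operands can disagree, which is the kind of mismatch this limitation describes.

```matlab
a = int8(100);
b = int8(100);
c = int8(-100);

% Grouping 1: (a + b) + c. The intermediate a + b saturates at intmax('int8') = 127.
r1 = (a + b) + c;        % 127 + (-100) = 27

% Grouping 2: a + (b + c). No intermediate result saturates.
r2 = a + (b + c);        % 100 + 0 = 100
```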

Version History

Introduced in R2019b

R2024b: Use `half` data type for input arrays and anonymous function handles

You can use input arrays that have the half data type. You can also use anonymous function handles for the reduction and preprocessing functions.

R2024b: Improved performance when specifying the reduction dimension

The code generated for gpucoder.reduce has improved performance when you specify the reduction dimension by using the dim name-value argument.
