Reduction Operations Supported for Automatic Parallelization of for-loops

The code generator automatically parallelizes for-loops by converting implicit and explicit sequential for-loop code blocks into parallelized code blocks. Parallelization of a section of code might significantly improve the execution speed of the generated code. See How parfor-Loops Improve Execution Speed.

Parallelize `for`-loops Performing Reduction Operations

You can parallelize for-loops performing reduction operations by using the configuration option Optimize reductions.

To enable automatic parallelization of these for-loops:

Open the MATLAB® Coder™ app.
On the Generate Code page, click More Settings.
On the Speed tab, select the Enable automatic parallelization and Optimize reductions check boxes.

Optimize reductions is also enabled if you set the Leverage target hardware instruction set extensions parameter to an instruction set that your processor supports.

To enable the configuration option OptimizeReductions by using the command-line interface, run these commands.

cfg = coder.config('lib');
cfg.EnableAutoParallelization = true;
cfg.OptimizeReductions = true;

For example, write a MATLAB function arraySum that adds to the reduction variable sum each element of in1 that is smaller than its index, and returns out, which is the total of sum and the mean of c.

function out = arraySum(in1,a,b)
sum = 0;
c = zeros(numel(in1),1);
for i2 = 1:numel(in1)
    if i2 > in1(i2)
        sum = sum + in1(i2);
        c(i2) = a(i2) + b(i2);
    end
end
out = sum + mean(c);
end

At the MATLAB command line, run this codegen command.

arr = 1:1000;
codegen arraySum -config cfg -args {arr,arr,arr} -report

Code generation successful: View report

Open the code generation report to see the parallelized for-loop that performs the addition operation.

sum = 0.0;
#pragma omp parallel num_threads(omp_get_max_threads()) private(sumPrime, d)
  {
    sumPrime = 0.0;
    #pragma omp for nowait
    for (i2 = 0; i2 < 1000; i2++) {
      c[i2] = 0.0;
      d = in1[i2];
      if ((double)i2 + 1.0 > d) {
        sumPrime += d;
        c[i2] = a[i2] + b[i2];
      }
    }
    omp_set_nest_lock(&autoparExample_nestLockGlobal);
    {

      sum += sumPrime;
    }
    omp_unset_nest_lock(&autoparExample_nestLockGlobal);
  }

MATLAB Functions Supported for Reduction Operations

A reduction operation reduces specific dimensions of an input to a scalar value. A reduction operation must be associative and commutative. This table lists the MATLAB functions that are supported as reduction operations and are parallelized in generated code, where X is the reduction variable and expr is a MATLAB expression. The reduction variable X can appear on both sides of an assignment statement.

MATLAB Function	Usage Notes
`plus`	For integer data types, the Saturate on integer overflow (`SaturateOnIntegerOverflow`) property must be disabled. Example: `X = X + expr`
`minus`	For integer data types, the Saturate on integer overflow (`SaturateOnIntegerOverflow`) property must be disabled. Example: `X = X - expr`
`times`	For integer data types, the Saturate on integer overflow (`SaturateOnIntegerOverflow`) property must be disabled. Example: `X = X .* expr`
`max`	Example: `X = max(X,expr)`
`min`	Example: `X = min(X,expr)`
`sum`	For integer data types, the Saturate on integer overflow (`SaturateOnIntegerOverflow`) property must be disabled. Example: `X = sum(X)`
`prod`	For integer data types, the Saturate on integer overflow (`SaturateOnIntegerOverflow`) property must be disabled. Example: `X = prod(X)`
`or`	Example: `X = X | expr`
`and`	Example: `X = X & expr`
`bitand`	Example: `X = bitand(X,expr)`
`bitor`	Example: `X = bitor(X,expr)`
`bitxor`	Example: `X = bitxor(X,expr)`

Note

The Support nonfinite numbers (SupportNonFinite) property supports code generation only for standalone libraries (lib, dll) and executables.

The following example shows a typical usage of a reduction variable X.

X = 0;            % Initialize X
for i = 1:n
    X = X + d(i);
end

This loop is equivalent to the following, where you calculate each d(i) in a different iteration.

X = X + d(1) + ... + d(n)
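
A loop of this form can be parallelized because an associative, commutative operation gives the same result when the iterations are split into groups and the partial results are combined afterward. The following C sketch imitates that idea without threads. The function names `sequential_sum` and `chunked_sum` and the chunking scheme are illustrative assumptions, not the code that the code generator emits.

```c
#include <stddef.h>

/* Reference: accumulate d[0..n-1] in order, like the sequential loop. */
double sequential_sum(const double *d, size_t n)
{
    double x = 0.0;
    for (size_t i = 0; i < n; i++) {
        x += d[i];
    }
    return x;
}

/* Hypothetical helper: split the range into `chunks` contiguous pieces,
 * reduce each piece into its own partial sum (the work one thread would do),
 * then combine the partial sums into the final result. */
double chunked_sum(const double *d, size_t n, size_t chunks)
{
    if (chunks == 0) {
        chunks = 1;  /* treat zero chunks as a single sequential pass */
    }
    double x = 0.0;
    for (size_t c = 0; c < chunks; c++) {
        size_t begin = (n * c) / chunks;
        size_t end = (n * (c + 1)) / chunks;
        double partial = 0.0;
        for (size_t i = begin; i < end; i++) {
            partial += d[i];
        }
        x += partial;  /* combine step */
    }
    return x;
}
```

When every partial sum is exactly representable, for example the integers 1 through 1000 stored as doubles, both functions return the same value for any chunk count.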

Handling Overflow in Automatic Parallelization of `for`-loops

When you enable automatic parallelization of for-loops and reduction optimization, overflow can cause the generated parallel C/C++ code to produce results that differ from those of the sequential MATLAB code. Therefore, when there is a possibility of such overflow, the code generator does not parallelize the loop.

The table shows the MATLAB functions where significant overflow can occur, along with their corresponding workarounds.

MATLAB Function	Description	Workaround

Integer overflow

function out = integerOverflow(in)
    out = int8(0);
    for i = 1:numel(in)
        out = out + in(i);
    end
end

integerOverflow(int8(1:100))

ans =

  int8

   127

Automatic parallelization of reduction-based for-loops performing arithmetic operations on integers is not supported when the SaturateOnIntegerOverflow parameter is enabled.

During parallel execution, the reduction operations are distributed among multiple threads. Because saturating arithmetic is not associative, the final result depends on how the partial results are grouped when they are accumulated at the end, so the results might be non-deterministic. Therefore, the code generator does not automatically parallelize the for-loop. For example,

(126-125) + 122 = 1 + 122 = 123

(126 + 122) - 125 = 127 (saturation) - 125 = 2

If appropriate for your application, disable the Saturate on integer overflow (SaturateOnIntegerOverflow) property to automatically parallelize for-loops.
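
The two groupings above can be reproduced with a small C sketch of saturating 8-bit addition. The helper `sat_add_int8` and the two wrapper functions are hypothetical names introduced for illustration, and the helper only approximates MATLAB saturation semantics for `int8` addition.

```c
#include <stdint.h>

/* Saturating int8 addition: clamp the true sum to [INT8_MIN, INT8_MAX]. */
int8_t sat_add_int8(int8_t a, int8_t b)
{
    int sum = (int)a + (int)b;  /* widen so the true sum cannot overflow */
    if (sum > INT8_MAX) {
        return INT8_MAX;
    }
    if (sum < INT8_MIN) {
        return INT8_MIN;
    }
    return (int8_t)sum;
}

/* Grouping A: (126 - 125) + 122, no intermediate saturation. */
int8_t grouping_a(void)
{
    return sat_add_int8(sat_add_int8(126, -125), 122);
}

/* Grouping B: (126 + 122) - 125, the first addition saturates at 127. */
int8_t grouping_b(void)
{
    return sat_add_int8(sat_add_int8(126, 122), -125);
}
```

Here `grouping_a()` returns 123 and `grouping_b()` returns 2, so the same three operands give different answers depending on the order in which partial results are combined.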

Usage Notes and Limitations

for-loops containing calls to C/C++ functions using coder.ceval are not automatically parallelized.
Bitwise reduction operations (bitand, bitor, and bitxor) are only supported for integer data types.
Custom reduction operations such as a = foo(a,b) are not supported for automatic parallelization of for-loops.
Reduction operations on floating-point numbers are only approximately associative. To get deterministic behavior of a parallel execution, the reduction operations involved must be associative. To be associative, a function f must satisfy the following for all a, b, and c.
```
f(a,f(b,c)) = f(f(a,b),c)
```
When working with floating-point numbers, different parallel executions of a loop might produce results with different round-off errors. If such round-off errors are unacceptable to your application, use the pragma coder.loop.parallelize('never') to instruct the code generator to not automatically parallelize specific for-loops. For more information on potential differences during code generation, see Differences Between Generated Code and MATLAB Code.
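
The approximate associativity of floating-point addition can be seen in a short C sketch. The function names `sum_left` and `sum_right` are hypothetical, and the stated results assume that the platform evaluates `double` arithmetic in IEEE 754 double precision without extended intermediate precision.

```c
/* Sum three doubles with the grouping (a + b) + c. */
double sum_left(double a, double b, double c)
{
    return (a + b) + c;
}

/* Sum the same three doubles with the grouping a + (b + c). */
double sum_right(double a, double b, double c)
{
    return a + (b + c);
}
```

Under that assumption, calling both functions with `a = 1e16`, `b = -1e16`, and `c = 1.0` gives 1.0 from `sum_left` and 0.0 from `sum_right`, because `b + c` rounds back to `-1e16` before `a` is added. A parallel reduction that groups iterations differently from the sequential loop can show the same kind of difference.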