
Enhance Generated Code Performance Using Halide from a MATLAB Function Block

Using Halide for algorithms that involve large multidimensional arrays can significantly increase the execution speed of the generated code. By integrating Halide with C and C++, the code generator can apply a range of optimization techniques to produce highly efficient code for high-performance array processing. With Embedded Coder®, you can decide whether to include Halide code in your existing C/C++ projects by comparing performance with and without it. For more information, see Halide and Speed Up Generated Code Execution with Halide Code.

Generate Halide Code from a MATLAB Function Block

This example uses the model HalideMLFB, whose input signals are of size 100 and data type single.

HalideMLFB uses a MATLAB Function block that has arithmetic as well as matrix multiplication operations.

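     function C = fcn(A, B)
         % Required for Halide code generation from a MATLAB Function block
         coder.inline('never');
         tmp1 = A * B;              % matrix multiplication
         tmp2 = tmp1 * 10 - 100;    % scale and offset each element
         tmp3 = tmp2 + B / 3 + 30;  % element-wise arithmetic with B
         C = tmp3 * B;              % second matrix multiplication
     end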

To generate Halide code, open the Configuration Parameters dialog box, select the Generate Halide code parameter, and build the model.

The generated Halide code for this model is:

     void generate() {
          RDom r(0, 100);
          matmul_out1(d1, d2) = sum(A(d1, r) * B(r, d2));
          A1(d1, d2) = matmul_out1(d1, d2)*10.000000f-100.000000f + B(d1, d2)/3.000000f+30.000000f;
          RDom r1(0, 100);
          matmul_out2(d1, d2) = sum(A1(d1, r1) * B(r1, d2));
          matmul_out2_fcn(d1, d2) = matmul_out2(d1, d2);
      }

Instead of creating a separate Halide pipeline for each operation in the MATLAB code, the code generator combines all of them into a single Halide pipeline. Fusing the matrix multiplication and arithmetic operations within the same pipeline allows autoschedulers to apply Halide interstage scheduling primitives such as compute_at() and store_at(), which can notably improve performance.

NOTE: For the MATLAB Function block, Halide code generation is supported only when you include coder.inline('never') in your MATLAB code.

Compare Code Execution Times

You can run a software-in-the-loop (SIL) simulation to measure the execution times of the generated Halide code and the plain C/C++ code for the HalideMLFB model. Comparing these times helps you decide whether to use Halide or plain C/C++ code.

  1. Configure the model to generate plain C++ code.

    model = "HalideMLFB";
    load_system(model);
    
    set_param(model,"HalideCodeGeneration",0);

  2. Enable code execution profiling and configure the model to save the execution-time measurements to a workspace variable.

    set_param(model,"CodeExecutionProfiling","on");
    set_param(model,"CodeProfilingInstrumentation","off");
    set_param(model,"CodeProfilingSaveOptions","AllData");

  3. Run the SIL model simulation.

    out_sil1 = sim(model,"SimulationMode","software-in-the-loop (SIL)");

  4. Use the Sections method to extract the profiling data for the code section of interest, and then compute the average execution time per call in timer ticks.

    nonhalideSection = out_sil1.executionProfile.Sections(2);
    nonhalideaverageTime = double(nonhalideSection.TotalExecutionTimeInTicks)/double(nonhalideSection.NumCalls);
    

  5. Configure the model to generate Halide code and run the simulation again.

    set_param(model,"HalideCodeGeneration",1);
    
    out_sil2 = sim(model,"SimulationMode","software-in-the-loop (SIL)");
    
    halideSection = out_sil2.executionProfile.Sections(2);
    halideaverageTime = double(halideSection.TotalExecutionTimeInTicks)/double(halideSection.NumCalls);
    

  6. Compare the difference in code execution speed.

    % Speedup = average plain C++ ticks per call / average Halide ticks per call
    speedup = nonhalideaverageTime/halideaverageTime;
    
    fprintf("Speedup factor of Halide code compared to plain C++ = %f\n", speedup)

    Speedup factor of Halide code compared to plain C++ = 6.454684

    % Plot execution speed relative to the plain C++ baseline (baseline = 1)
    cgLabels = categorical({'Plain C++','Halide'});
    relativeSpeed = [1, speedup];
    bar(cgLabels, relativeSpeed);
    ylabel("Ratio of Halide execution speed to C++");
    title("Comparing runtime performance between Halide and plain C++ generated code");

    Figure: Comparing runtime performance between Halide and plain C++ generated code.

    The simulation was run on a test system with an AMD EPYC™ 74F3 24-Core Processor @ 3.19 GHz. For the HalideMLFB model, the Halide code is approximately 6.45 times faster than the plain C++ code.

Note

Halide code can significantly improve the execution speed of operations on large, contiguous multidimensional arrays, but it might not perform well for smaller arrays. Based on the array dimensions, Embedded Coder decides whether to generate Halide code or plain C/C++ code for a model.

Embedded Coder also supports generating Halide code for tensor multiplication operations used in deep learning networks. To see the Halide code performance for tensor multiplication operations, use the HalideTensorMultiply model and compare the code execution speeds.
