Code Generation for Interpolated FIR Filter on ARM Cortex-M Target using CMSIS
This example shows how to generate and run optimized code for an interpolated finite impulse response (IFIR) filter on the STM32F746G-Discovery board using the ARM™ Cortex-M code replacement library (CRL).
An IFIR filter implements a high-order FIR filter at a lower computational cost by using multirate signal processing techniques. The model contains a Gaussian noise source block, an FIR Decimation block, a Discrete FIR Filter block, and an FIR Interpolation block.
The FIR Decimation block downsamples the input signal, which reduces the sample rate. The lower-order Discrete FIR Filter block then filters the signal at the reduced sample rate, and the FIR Interpolation block restores the signal to its original sample rate. Because most of the filtering runs at the reduced rate, the convolution needs far fewer multiplications than an equivalent single-rate FIR filter.
Required Hardware
Cortex-M hardware - STM32F746G-Discovery board
Simulate Interpolated FIR Filter
The filter coefficients of the FIR Decimation, Discrete FIR Filter, and FIR Interpolation blocks are derived using the ifir (DSP System Toolbox) function. The ifir function designs a periodic filter h(z), which provides the coefficients for the Discrete FIR Filter block. The ifir function also designs an image-suppressor filter g(z), which provides the coefficients for the FIR Decimation and FIR Interpolation blocks in the model.
The cascade of these filters represents the optimal minimax FIR approximation of the desired response.
Design h(z) and g(z) for a Low Pass Filter Response
Set the passband peak ripple (deviation) to 0.005 dB, the stopband peak ripple (deviation) to 80 dB, the interpolation factor to 7, the passband edge frequency to 0.1π rad/sample, and the stopband edge frequency to 0.101π rad/sample.
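Apass = 0.005;    % passband peak ripple, dB
Astop = 80;       % stopband peak ripple, dB
Fstop = .101;     % stopband edge, normalized to pi rad/sample
M = 7;            % interpolation factor
F = [.1 Fstop];   % [passband edge, stopband edge]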
Execute the following commands to convert the passband and stopband ripple from dB to linear scale and to design the h(z) and g(z) filters, thereby deriving the filter coefficients.
A = [convertmagunits(Apass,'db','linear','pass') convertmagunits(Astop,'db','linear','stop')];
[h,g] = ifir(M,'low',F,A);
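To get a feel for why a single-stage design is expensive for this specification, the sketch below evaluates Kaiser's length estimate for an equiripple lowpass filter with the same band edges and ripple. If ifir is on the path, it also counts the nonzero coefficients of h and g. The dB-to-linear formulas are the standard ripple conversions written out by hand and are assumed to match convertmagunits. The variable names (deltaPass, nSingleStage, and so on) are illustrative and not part of the example model.

```matlab
% Specification from this example
Apass = 0.005;          % passband peak ripple, dB
Astop = 80;             % stopband peak ripple, dB
F     = [0.1 0.101];    % band edges, normalized to pi rad/sample
M     = 7;              % interpolation factor

% Standard dB-to-linear ripple conversions (assumed to match convertmagunits)
deltaPass = (10^(Apass/20) - 1) / (10^(Apass/20) + 1);
deltaStop = 10^(-Astop/20);

% Kaiser's estimate for a single-stage equiripple filter.
% The transition width is expressed as a fraction of the sample rate.
deltaF = (F(2) - F(1)) / 2;
nSingleStage = ceil((-20*log10(sqrt(deltaPass*deltaStop)) - 13) / (14.6*deltaF)) + 1;
fprintf('Estimated single-stage length: %d taps\n', nSingleStage);

% Optional comparison with the IFIR design (requires DSP System Toolbox)
if exist('ifir','file')
    [h,g] = ifir(M,'low',F,[deltaPass deltaStop]);
    fprintf('Nonzero coefficients: h = %d, g = %d\n', nnz(h), nnz(g));
end
```

For this specification the estimate comes out at several thousand taps, which is the cost that splitting the design into h(z) and g(z) is meant to avoid.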
Ensure that the commands to compute h(z) and g(z) are set in the PreLoadFcn callback of the model. To open PreLoadFcn, follow these steps:
In the Simulink Toolstrip, on the Modeling tab, in the Design gallery, click Property Inspector.
With no selection at the top level of the model, on the Properties tab, in the Callbacks section, select PreLoadFcn.
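The callback can also be inspected from the command line. The following is a minimal sketch, assuming the model is on the MATLAB path; the check for the string 'ifir' is only an illustrative sanity test, and the variable names are hypothetical.

```matlab
mdl = 'stm32f746g_ifir_filter';
load_system(mdl);                        % load the model without opening the editor
preLoad = get_param(mdl,'PreLoadFcn');   % callback text as a character vector
disp(preLoad);

% Illustrative check that the design call is present in the callback
if ~contains(preLoad,'ifir')
    warning('PreLoadFcn does not appear to call ifir.');
end
```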
To open the model, execute the following command:
open_system('stm32f746g_ifir_filter');
You can observe the magnitude of the IFIR filter output in the Spectrum Analyzer. To run the host simulation, click Run on the Simulation tab. The default simulation time is 1 second.
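The host simulation can also be started programmatically. This is a sketch under the assumption that the model is on the path; simOut is an illustrative variable name, and the stop time simply restates the 1-second default mentioned above.

```matlab
mdl = 'stm32f746g_ifir_filter';
load_system(mdl);
set_param(mdl,'SimulationMode','normal');   % run on the host, not on the target
simOut = sim(mdl,'StopTime','1');           % simulate for 1 second
```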
Configure the Model
You can configure the model either interactively, using the Configuration Parameters in Simulink®, or programmatically, using the MATLAB® programming interface.
Interactive Approach
Configure the model for code generation targeting ARM Cortex-M hardware.
Press Ctrl+E to open the Configuration Parameters dialog box. Alternatively, on the Modeling tab of the model toolstrip, select Model Settings.
Go to Hardware Implementation > Hardware board and select STM32F746G-Discovery.
Navigate to Code Generation and configure the following settings:
Set Code Generation > System target file to ert.tlc.
Set Code Generation > Build Configuration to Faster Runs.
Set Code Generation > Interface > Code replacement libraries to include ARM Cortex-M.
Enable Code Generation > Report > Create code generation report.
Enable Code Generation > Report > Open report automatically.
Enable Code Generation > Report > Summarize which blocks triggered code replacements.
Programmatic Approach
Execute the following commands to configure the Simulink model stm32f746g_ifir_filter.slx for deployment on the STM32F746G-Discovery board. Select ert.tlc as the system target file to optimize the code for embedded real-time systems, and choose Faster Runs for the build configuration to prioritize execution speed.
set_param('stm32f746g_ifir_filter','HardwareBoard','STM32F746G-Discovery');
set_param('stm32f746g_ifir_filter','SystemTargetFile','ert.tlc');
set_param('stm32f746g_ifir_filter','BuildConfiguration','Faster Runs');
Set the code replacement library to ARM Cortex-M to generate optimized code for ARM Cortex-M hardware.
set_param('stm32f746g_ifir_filter','CodeReplacementLibrary','ARM Cortex-M');
Finally, enable generation of the code generation report and the code replacement report. The code replacement report summarizes which blocks triggered code replacements, so you can confirm that the CRL was applied.
set_param('stm32f746g_ifir_filter','GenerateReport','on');
set_param('stm32f746g_ifir_filter','GenerateCodeReplacementReport','on');
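To confirm that the settings took effect, the parameters can be read back with get_param. The sketch below also sets LaunchReport, which is assumed here to be the programmatic counterpart of the Open report automatically option from the interactive approach. The params list and loop are illustrative.

```matlab
mdl = 'stm32f746g_ifir_filter';
load_system(mdl);
set_param(mdl,'LaunchReport','on');   % assumed counterpart of "Open report automatically"

% Read back the configuration parameters used in this example
params = {'HardwareBoard','SystemTargetFile','BuildConfiguration', ...
          'CodeReplacementLibrary','GenerateReport', ...
          'GenerateCodeReplacementReport','LaunchReport'};
for k = 1:numel(params)
    fprintf('%-32s %s\n', params{k}, get_param(mdl,params{k}));
end
```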
Generate Code
To start the code generation and build process for the model, press Ctrl+B or click Build on the Generate Code tab of the Embedded Coder app.
After the code is generated, click View Code to view it.
You can also verify the code replacements using the Open Report > Code Replacement Report option.
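The build can also be started from the command line. This is a minimal sketch, assuming the model is already configured as described above; depending on the hardware settings, the build may also try to download the executable to the connected board.

```matlab
mdl = 'stm32f746g_ifir_filter';
load_system(mdl);
slbuild(mdl);   % generate code and run the build for the configured target
```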
Verify on Target using SIL/PIL Manager
To verify the numerical accuracy of the generated code against the simulated output, you can either use the SIL/PIL Manager app or run the model programmatically in processor-in-the-loop (PIL) mode.
To use the SIL/PIL Manager app, follow these steps:
Open the SIL/PIL Manager app.
Set Mode to Automated Verification.
Set SIL/PIL Mode to Processor-in-the-loop (PIL).
Click Run Verification.
Execute the commands below to run the model programmatically in PIL mode.
set_param('stm32f746g_ifir_filter','SimulationMode','processor-in-the-loop (pil)');
outputWithCRL = sim('stm32f746g_ifir_filter.slx');
You can verify the numerical accuracy using the Simulation Data Inspector.
Set the absolute or relative tolerance in the More section ([+]) of the Simulation Data Inspector window.
In the plot, the simulation output overlaps the PIL output (subplot 1), and the absolute difference between the two is less than 1e-6 (subplot 2).
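The same comparison can be scripted with the Simulation Data Inspector API. The sketch below assumes that the two most recent runs in the Simulation Data Inspector are the normal-mode run and the PIL run, and that the model logs the filter output in both. The Summary property is assumed to be available, which holds in recent releases.

```matlab
% Compare the two most recent logged runs with an absolute tolerance of 1e-6
runIDs = Simulink.sdi.getAllRunIDs;
if numel(runIDs) >= 2
    diffResult = Simulink.sdi.compareRuns(runIDs(end-1), runIDs(end), 'AbsTol', 1e-6);
    disp(diffResult.Summary);   % counts of signals within and out of tolerance
else
    disp('Fewer than two runs are available to compare.');
end
```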
Compare Performance
Compare the performance of a particular block with plain C code (without CRL) and CMSIS code (with CRL).
Enable code profiling
To enable code profiling, follow these steps:
In Model Settings, select Code Generation > Verification > Measure task execution time.
Set Code Generation > Verification > Measure function execution times to Detailed. (Select Coarse instead if you are interested only in overall model performance.)
Run the verification again from the SIL/PIL Manager.
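Profiling can also be enabled programmatically. The sketch below assumes that CodeExecutionProfiling and CodeProfilingInstrumentation are the parameters behind the two dialog options above; the accepted values for CodeProfilingInstrumentation can differ between releases.

```matlab
mdl = 'stm32f746g_ifir_filter';
load_system(mdl);
set_param(mdl,'CodeExecutionProfiling','on');               % Measure task execution time
set_param(mdl,'CodeProfilingInstrumentation','detailed');   % or 'coarse' for overall timing
set_param(mdl,'SimulationMode','processor-in-the-loop (pil)');
outputWithCRL = sim(mdl);                                   % rerun PIL with profiling enabled
```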
Obtain profiling information
To obtain profiling information, you can use the Code Execution Profiling Report or the Code Profile Analyzer.
Code Execution Profiling Report
To obtain profiling information using the Code Execution Profiling Report, follow these steps:
In the SIL/PIL Manager, in the Results section, select Generate Report.
Under Profiled Sections of Code, find the execution time measured for each model function.
In this section, verify the block profiling. Click the MATLAB icon next to stm32f746g_ifir_filter_step.
ticksWithCRL = outputWithCRL.get('executionProfile').Sections(2).TotalExecutionTimeInTicks
ticksWithCRL = uint64
57287533
The TotalExecutionTimeInTicks value shows that, with the ARM Cortex-M CRL enabled, the step function consumed 57287533 ticks. Repeat the measurement without selecting any CRL in Model Settings (Code Generation > Interface > Code replacement libraries).
set_param('stm32f746g_ifir_filter','CodeReplacementLibrary','None');
outputWithoutCRL = sim('stm32f746g_ifir_filter.slx');
ticksWithoutCRL = outputWithoutCRL.get('executionProfile').Sections(2).TotalExecutionTimeInTicks
ticksWithoutCRL = uint64
94821678
speedUp = ticksWithoutCRL/ticksWithCRL
speedUp = uint64
2
bar(["ARM Cortex-M CRL","plain C"],[ticksWithCRL; ticksWithoutCRL],0.4)
ylabel('Execution Time (ticks)');
title('Performance Comparison of ARM Cortex-M CRL vs. plain C');
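Without the CRL, the step function consumes 94821678 ticks. Using the CRL therefore reduces the execution time by a factor of about 1.66 (94821678/57287533). The speedUp value above displays as 2 because dividing uint64 values rounds the result to the nearest integer.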
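Because both tick counts are uint64 values, MATLAB rounds the quotient to the nearest integer. The following sketch, which reuses the tick counts reported above, converts them to double first to obtain the fractional ratio; the variable names speedUpInteger and speedUpExact are illustrative.

```matlab
ticksWithCRL    = uint64(57287533);   % measured with ARM Cortex-M CRL
ticksWithoutCRL = uint64(94821678);   % measured with plain C code

speedUpInteger = ticksWithoutCRL/ticksWithCRL;                   % uint64 division rounds
speedUpExact   = double(ticksWithoutCRL)/double(ticksWithCRL);   % fractional ratio

fprintf('Speedup: %.2fx (integer division reports %d)\n', speedUpExact, speedUpInteger);
```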
Code Profile Analyzer
To obtain profiling information using Code Profile Analyzer, follow these steps:
On the Analysis tab, select Function Execution.
In Function Execution Times, check the maximum and average execution times for the FIR Decimation, Discrete FIR Filter, and FIR Interpolation blocks. The average execution time is taken over the whole simulation.
Optionally, you can also compare relative times, such as self time relative to the caller or to the task.
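A per-section listing can also be printed at the command line. The sketch below assumes that outputWithCRL from the PIL run is still in the workspace and that each profiled section exposes a Name property alongside TotalExecutionTimeInTicks. Looking sections up by name avoids relying on a fixed index such as Sections(2).

```matlab
if exist('outputWithCRL','var')
    execProfile = outputWithCRL.get('executionProfile');
    sections = execProfile.Sections;
    % Print the total execution time of every profiled section
    for k = 1:numel(sections)
        fprintf('%-40s %d ticks\n', sections(k).Name, ...
                sections(k).TotalExecutionTimeInTicks);
    end
else
    disp('Run the model in PIL mode with profiling enabled first.');
end
```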