qnn.HTP

Interface to predict responses of deep learning model for QNN HTP backend

Since R2025b

Description

The qnn.HTP System object™ is an interface to predict responses of a deep learning model represented as a QNN model or QNN context binary for the HTP (NPU) backend of Qualcomm® AI Direct Engine.

To create the interface to predict responses of QNN HTP:

  1. Create the qnn.HTP object and set its properties.

  2. Call the object with arguments, as if it were a function.
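As a minimal sketch, the two steps above look like this; the model file names are placeholders for files you create with the Qualcomm AI Engine Direct SDK:

```matlab
% Step 1: create the qnn.HTP object and set its properties.
% The file names below are placeholders, not shipped files.
qnnhtp = qnn.HTP("QNN-Model", ...
    QNNHostModel="qnnhostmodel.dll", ...
    QNNTargetModel="qnntargetmodel.so");

% Step 2: call the object with arguments, as if it were a function.
x = rand(224,224,3,"single");   % input sized to match the model input layer
qnnresponse = qnnhtp(x);
```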

To learn more about how System objects work, see What Are System Objects?

You can deploy the code generated using the qnn.HTP System object to one of these boards that are available under the Hardware board parameter in Configuration Parameters:

  • Qualcomm Android Board

  • Qualcomm Linux Board

  • Qualcomm Hexagon Android Board, with Processor Version cDSP

  • Qualcomm Hexagon Linux Board, with Processor Version cDSP

Creation

Description

Windows Host

qnnhtp = qnn.HTP("QNN-Model",QNNHostModel=qnnhostmodel.dll,QNNTargetModel=qnntargetmodel.so) creates an interface to predict responses of a QNN model (.dll for the host and a compiled shared object (.so) for the target) for the HTP (NPU) backend.

qnnhtp = qnn.HTP("QNN-Model",QNNHostModel=qnnhostmodel.dll,QNNTargetModel=qnntargetmodel.so,DeQuantizeOutput=true) creates an interface similar to the previous syntax and performs dequantization of the output.

qnnhtp = qnn.HTP("BINARY",QNNHostModel=qnnhostmodel.dll,QNNContextBinary=qnncontextbinary.bin) creates an interface to predict responses of a QNN model (.dll for the host and a context binary file (.bin) for the target) for the HTP (NPU) backend.

qnnhtp = qnn.HTP("BINARY",QNNHostModel=qnnhostmodel.dll,QNNContextBinary=qnncontextbinary.bin,DeQuantizeOutput=true) creates an interface similar to the previous syntax and performs dequantization of the output.

Linux Host

qnnhtp = qnn.HTP("QNN-Model",QNNHostModel=qnnhostmodel.so,QNNTargetModel=qnntargetmodel.so) creates an interface to predict responses of a QNN model (compiled shared objects (.so) for the host and target) for the HTP (NPU) backend.

qnnhtp = qnn.HTP("QNN-Model",QNNHostModel=qnnhostmodel.so,QNNTargetModel=qnntargetmodel.so,DeQuantizeOutput=true) creates an interface similar to the previous syntax and performs dequantization of the output.

qnnhtp = qnn.HTP("BINARY",QNNContextBinary=qnncontextbinary.bin) creates an interface to predict responses of a QNN model (a context binary file (.bin) for both the host and target) for the HTP (NPU) backend.

qnnhtp = qnn.HTP("BINARY",QNNContextBinary=qnncontextbinary.bin,DeQuantizeOutput=true) creates an interface similar to the previous syntax and performs dequantization of the output.
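For example, a minimal sketch of the last syntax, assuming a context binary named qnncontextbinary.bin is present in the current folder in MATLAB:

```matlab
% On a Linux host, the same context binary serves both the host and target.
qnnhtp = qnn.HTP("BINARY", ...
    QNNContextBinary="qnncontextbinary.bin", ...
    DeQuantizeOutput=true);
```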

Properties

Unless otherwise indicated, properties are nontunable, which means you cannot change their values after calling the object. Objects lock when you call them, and the release function unlocks them.

If a property is tunable, you can change its value at any time.

For more information on changing property values, see System Design in MATLAB Using System Objects.

DLNetworkLayer

Format of the deep learning network optimized to run on the HTP (NPU) backend on the target, specified as "QNN-Model" or "BINARY".

Data Types: string

QNNHostModel

Name of the QNN model on the x86 host used to perform inference, specified as a compiled shared object (.so) for Linux or a .dll for Windows. For details on creating a QNN model to run on device processors like the HTP, refer to the Qualcomm AI Engine Direct SDK documentation.

If the QNN model is not present in the current folder in MATLAB, specify the absolute path along with the filename.

Data Types: string

QNNTargetModel

Name of the QNN model on the target used to perform inference, specified as a string representing a compiled shared object (.so) file. For details on creating a QNN model to run on device processors like the HTP, refer to the Qualcomm AI Engine Direct SDK documentation.

If the QNN model is not present in the current folder in MATLAB, specify the absolute path along with the filename.

Data Types: string

QNNContextBinary

Name of the QNN context binary file on the target used to perform inference, specified as a string representing a .bin file. For details on creating a binary file to run on device processors like the HTP, refer to the Qualcomm AI Engine Direct SDK documentation.

If the QNN context binary file is not present in the current folder in MATLAB, specify the absolute path along with the filename.

Data Types: string

Name of the backend configuration file, specified as a character vector. Use this property to select a Qualcomm QNN-compliant JSON file that defines HTP backend-specific execution settings, such as memory allocation, precision mode, thread count, and performance profile.

DeQuantizeOutput

Option to use output dequantization to predict the response, specified as true or false. Set the value to true to dequantize the output after inference. This setting results in the output data type always being single, irrespective of the data type of the deep learning network's output layer.

Data Types: logical
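For illustration, dequantization maps each quantized value back to floating point using the scale and offset stored with the model. A sketch of the computation, using hypothetical scale and offset values (real values come from the QNN model itself):

```matlab
% Hypothetical quantization parameters; real values come from the QNN model.
scale  = 0.0125;
offset = -128;

q = uint8([0 64 255]);              % example quantized inference output
y = scale * (single(q) + offset);   % dequantized output is always single
```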

Maximum allowed time to simulate one step call, specified in seconds. This property applies only to MATLAB® execution and to Simulink® simulations in Normal and Accelerator modes.

Data Types: double

Usage

Description

qnnresponse = qnnhtp(x) predicts responses for the QNN HTP backend using the qnnhtp System object, based on the input data, x.

Instead of calling the System object directly, you can also use the predict function to obtain the response.

Input Arguments

Data input, specified as an N-dimensional array. The array must be the same size as the input layer of the QNN host model.

The System object supports multiple-input, multiple-output tensors with a maximum of four dimensions, but the batch size must always be 1. For example, if the input layer of the original deep learning network is 128-by-128-by-3, the input signal dimension must be either 128-by-128-by-3 or 1-by-128-by-128-by-3.

If the leading dimensions are 1 (singleton dimensions), you can remove these dimensions without affecting compatibility. For example, if the input layer of an AI model expects an input of size 1-by-1-by-128-by-3, you can specify an input of size 1-by-1-by-128-by-3 or 128-by-3. You can remove these dimensions because dimensions of size 1 can be broadcast to match the expected shape.

The input data type must match the data type of the QNN network's input layer. Additionally, the input can be floating point even for a quantized QNN network.
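For example, under the singleton-dimension rule above, either of these inputs is compatible with a model whose input layer expects a 1-by-1-by-128-by-3 array (the sizes here are illustrative):

```matlab
x1 = rand(1,1,128,3,"single");   % full four-dimensional input
x2 = rand(128,3,"single");       % leading singleton dimensions removed
```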

Output Arguments

The response after computing predictions using the selected QNN model, represented as an N-dimensional array. The output data type matches the data type of the QNN network's output layers. If you set DeQuantizeOutput to true, the output is always single.

The System object supports multiple-input, multiple-output tensors with a maximum of four dimensions, but the batch size must always be 1.

Object Functions

To use an object function, specify the System object as the first input argument. For example, to release system resources of a System object named obj, use this syntax:

release(obj)

predict - Predict response based on given data using System objects created for QNN backends (HTP, CPU, or LPAI) or eNPU
release - Release resources and allow changes to System object property values and input characteristics
clone - Create duplicate System object
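For example, a sketch of these functions used together, assuming obj is an existing qnn.HTP object and x is valid input data:

```matlab
qnnresponse = predict(obj,x);   % predict the response for the input data
obj2 = clone(obj);              % duplicate the object and its property values
release(obj);                   % release resources to allow property changes
```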

Examples

Prepare the QNN model for the host and target for the HTP backend. To create the interface to the HTP backend, it is recommended that you copy the files to the current folder in MATLAB. Alternatively, note the absolute path of each file.

Prepare the input data for inference. This example uses uniformly distributed random numbers of single data type.

x = rand(299,299,3,'single');

Create the QNN HTP interface object by using "QNN-Model" as the DLNetworkLayer property value. This example first checks the operating system (Linux or Windows) to use the appropriate model file. The host and target models specified by these variables must be present in the current folder in MATLAB.

if isunix
    QNNHostModel = "libInception.so";
else
    QNNHostModel = "Inception.dll";
end

obj = qnn.HTP("QNN-Model",...
    QNNHostModel=QNNHostModel,...
    QNNTargetModel="libandroidInception.so")

Predict the response by using the predict function.

obj.predict(x);

Prepare the QNN model for the host and target for the HTP backend. To create the interface to the HTP backend, it is recommended that you copy the files to the current folder in MATLAB. Alternatively, note the absolute path of each file.

Prepare the input data for inference. This example uses uniformly distributed random numbers of single data type.

x = rand(299,299,3,'single');

Create the QNN HTP interface object by using "BINARY" as the DLNetworkLayer property value. This example first checks the operating system (Linux or Windows) to use the appropriate model file. The files specified by these arguments must be present in the current folder in MATLAB.

if isunix
    obj = qnn.HTP("BINARY",...
        QNNContextBinary="inception.serialized.bin");
else
    obj = qnn.HTP("BINARY",...
        QNNHostModel="Inception.dll",...
        QNNContextBinary="inception.serialized.bin");
end

Predict the response by using the predict function.

obj.predict(x);

Version History

Introduced in R2025b