deploy
Class: dlhdl.Workflow
Namespace: dlhdl
Deploy the specified neural network to the target FPGA board
Syntax
Description
deploy( programs the specified
        target board with the bitstream and deploys the deep learning network on it. If you call the
          workflowObject)deploy method on a non-compiled dlhdl.Workflow object,
        the deploy method first calls the compile method. If you
        have compiled the dlhdl.Workflow object, the deploy
        method, then:
- Programs the FPGA bitstream 
- Reboots the onboard Linux® to load the bitstream and Linux device tree when using the Ethernet interface 
- Transfer the weights and instructions data to the on-board memory 
- Configures the deep learning processor IP core through the AXI4 interface 
Input Arguments
Deep learning network deployment options, specified as a
                dlhdl.Workflow object.
Examples
This example shows how to create, compile, and deploy a dlhdl.Workflow object that has a handwritten character detection series network object by using the Deep Learning HDL Toolbox™ Support Package for Intel® FPGA and SoC. Use MATLAB® to retrieve the prediction results from the target device. 
Prerequisites
- Intel Arria® 10 SoC development kit 
Load the Pretrained SeriesNetwork
To load the pretrained network, that has been trained on the Modified National Institute Standards of Technology (MNIST) database[1], enter:
net = getDigitsNetwork;
To view the layers of the pretrained series network, enter:
analyzeNetwork(net)
Create Target Object
Create a target object that has a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG, install Intel Quartus® Prime Standard Edition 22.1. Set up the path to your installed Intel Quartus Prime executable if it is not already set up. For example, to set the toolpath, enter:
% hdlsetuptoolpath('ToolName', 'Altera Quartus II','ToolPath', 'C:\altera\22.1\quartus\bin64');hTarget = dlhdl.Target('Intel')hTarget = 
  Target with properties:
       Vendor: 'Intel'
    Interface: JTAG
Create Workflow Object
Create an object of the dlhdl.Workflow class. When you create the object, specify the network and the bitstream name. Specify the saved pretrained MNIST neural network, snet, as the network. Make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Intel Arria 10 SOC board and the bitstream uses a single data type. 
hW = dlhdl.Workflow('network', net, 'Bitstream', 'arria10soc_single','Target',hTarget);
Compile the MNIST Series Network
To compile the MNIST series network, run the compile function of the dlhdl.Workflow object.
dn = hW.compile;
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream arria10soc_single.
### An output layer called 'Output1_softmax' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware.
### The network includes the following layers:
     1   'imageinput'        Image Input         28×28×1 images with 'zerocenter' normalization                (SW Layer)
     2   'conv_1'            2-D Convolution     8 3×3×1 convolutions with stride [1  1] and padding 'same'    (HW Layer)
     3   'relu_1'            ReLU                ReLU                                                          (HW Layer)
     4   'maxpool_1'         2-D Max Pooling     2×2 max pooling with stride [2  2] and padding [0  0  0  0]   (HW Layer)
     5   'conv_2'            2-D Convolution     16 3×3×8 convolutions with stride [1  1] and padding 'same'   (HW Layer)
     6   'relu_2'            ReLU                ReLU                                                          (HW Layer)
     7   'maxpool_2'         2-D Max Pooling     2×2 max pooling with stride [2  2] and padding [0  0  0  0]   (HW Layer)
     8   'conv_3'            2-D Convolution     32 3×3×16 convolutions with stride [1  1] and padding 'same'  (HW Layer)
     9   'relu_3'            ReLU                ReLU                                                          (HW Layer)
    10   'fc'                Fully Connected     10 fully connected layer                                      (HW Layer)
    11   'softmax'           Softmax             softmax                                                       (SW Layer)
    12   'Output1_softmax'   Regression Output   mean-squared-error                                            (SW Layer)
                                                                                                             
### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'Output1_softmax' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Compiling layer group: conv_1>>maxpool_2 ...
### Compiling layer group: conv_1>>maxpool_2 ... complete.
### Compiling layer group: conv_3>>relu_3 ...
### Compiling layer group: conv_3>>relu_3 ... complete.
### Compiling layer group: fc ...
### Compiling layer group: fc ... complete.
### Allocating external memory buffers:
          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________
    "InputDataOffset"           "0x00000000"     "368.0 kB"       
    "OutputResultOffset"        "0x0005c000"     "4.0 kB"         
    "SchedulerDataOffset"       "0x0005d000"     "220.0 kB"       
    "SystemBufferOffset"        "0x00094000"     "76.0 kB"        
    "InstructionDataOffset"     "0x000a7000"     "28.0 kB"        
    "ConvWeightDataOffset"      "0x000ae000"     "28.0 kB"        
    "FCWeightDataOffset"        "0x000b5000"     "100.0 kB"       
    "EndOffset"                 "0x000ce000"     "Total: 824.0 kB"
### Network compilation complete.
Program Bitstream onto FPGA and Download Network Weights
To deploy the network on the Intel Arria 10 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function starts programming the FPGA device, displays progress messages, and the time it takes to deploy the network.
hW.deploy
### Programming FPGA Bitstream using JTAG... ### Programming the FPGA bitstream has been completed successfully. ### Loading weights to Conv Processor. ### Conv Weights loaded. Current time is 18-Jul-2024 10:54:36 ### Loading weights to FC Processor. ### FC Weights loaded. Current time is 18-Jul-2024 10:54:37
Run Prediction for Example Image
To load the example image, execute the predict function of the dlhdl.Workflow object, and then display the FPGA result, enter: 
inputImg = imread('five_28x28.pgm'); inputImg = dlarray(single(inputImg), 'SSCB');
Run prediction with the profile 'on' to see the latency and throughput results.
[prediction, speed] = hW.predict(inputImg,'Profile','on');
### Finished writing input activations.
### Running single input activation.
              Deep Learning Processor Profiler Performance Results
                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                      31905                  0.00021                       1              32854           4565.7
    imageinput_norm           2913                  0.00002 
    conv_1                    6819                  0.00005 
    maxpool_1                 4493                  0.00003 
    conv_2                    5200                  0.00003 
    maxpool_2                 3549                  0.00002 
    conv_3                    6045                  0.00004 
    fc                        2854                  0.00002 
 * The clock frequency of the DL processor is: 150MHz
[val, idx] = max(prediction);
fprintf('The prediction result is %d\n', idx-1);The prediction result is 5
Bibliography
- LeCun, Y., C. Cortes, and C. J. C. Burges. "The MNIST Database of Handwritten Digits." https://yann.lecun.com/exdb/mnist/. 
This example uses:
- Deep Learning HDL ToolboxDeep Learning HDL Toolbox
- Deep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC DevicesDeep Learning HDL Toolbox Support Package for Xilinx FPGA and SoC Devices
- Deep Learning Toolbox Model Compression LibraryDeep Learning Toolbox Model Compression Library
- Deep Learning ToolboxDeep Learning Toolbox
- Computer Vision ToolboxComputer Vision Toolbox
- Deep Learning Toolbox Model for ResNet-18 NetworkDeep Learning Toolbox Model for ResNet-18 Network
This example shows how to use Deep Learning HDL Toolbox™ to deploy a quantized deep convolutional neural network (CNN) to an FPGA. In the example you use the pretrained ResNet-18 CNN to perform transfer learning and quantization. You then deploy the quantized network and use MATLAB® to retrieve the prediction results.
ResNet-18 has been trained on over a million images and can classify images into 1000 object categories, such as keyboard, coffee mug, pencil, and many animals. The network has learned rich feature representations for a wide range of images. The network takes an image as input and outputs a label for the object in the image together with the probabilities for each of the object categories.
To perform classification on a new set of images, you fine-tune a pretrained ResNet-18 CNN by transfer learning. In transfer learning, you can take a pretrained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is usually much faster and easier than training a network with randomly initialized weights. You can quickly transfer learned features to a new task using a smaller number of training images.
Load Pretrained Network
Load the pretrained ResNet-18 network.
net = imagePretrainedNetwork("resnet18");View the layers of the pretrained network.
deepNetworkDesigner(net);
The first layer, the image input layer, requires input images of size 227-by-227-by-3, where three is the number of color channels.
inputSize = net.Layers(1).InputSize;
Load Data
This example uses the MathWorks® MerchData data set. This is a small data set containing 75 images of MathWorks merchandise, belonging to five different classes (cap, cube, playing cards, screwdriver, and torch). 
curDir = pwd; unzip('MerchData.zip'); imds = imageDatastore('MerchData', ... 'IncludeSubfolders',true, ... 'LabelSource','foldernames');
Specify Training and Validation Sets
Divide the data into training and validation data sets, so that 30% percent of the images go to the training data set and 70% of the images to the validation data set. splitEachLabel splits the datastore imds into two new datastores, imdsTrain and imdsValidation.
[imdsTrain,imdsValidation] = splitEachLabel(imds,0.7,'randomized');Replace Final layers
To retrain ResNet-18 to classify new images, replace the last fully connected layer of the network. In ResNet-18 , this layer has the name 'fc1000'. The fully connected layer of the pretrained network net is configured for 1000 classes. This layer, fc1000 in ResNet-18, contains information on how to combine the features that the network extracts into class probabilities. The layer must be fine-tuned  for the new classification problem. Extract all the layers, except the last layer, from the pretrained network.
numClasses = numel(categories(imdsTrain.Labels))
numClasses = 5
newLearnableLayer = fullyConnectedLayer(numClasses, ... 'Name','new_fc', ... 'WeightLearnRateFactor',10, ... 'BiasLearnRateFactor',10); net = replaceLayer(net,'fc1000',newLearnableLayer);
Prepare Data for Training
The network requires input images of size 224-by-224-by-3, but the images in the image datastores have different sizes. Use an augmented image datastore to automatically resize the training images. Specify additional augmentation operations to perform on the training images, such as randomly flipping the training images along the vertical axis and randomly translating them up to 30 pixels horizontally and vertically. Data augmentation helps prevent the network from overfitting and memorizing the exact details of the training images.
pixelRange = [-30 30]; imageAugmenter = imageDataAugmenter( ... 'RandXReflection',true, ... 'RandXTranslation',pixelRange, ... 'RandYTranslation',pixelRange);
To automatically resize the validation images without performing further data augmentation, use an augmented image datastore without specifying any additional preprocessing operations.
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ... 'DataAugmentation',imageAugmenter); augimdsValidation = augmentedImageDatastore(inputSize(1:2),imdsValidation);
Specify Training Options
Specify the training options. For transfer learning, keep the features  from the early layers of the pretrained network (the transferred layer  weights). To slow down learning in the transferred layers, set the  initial learning rate to a small value. Specify the mini-batch size and  validation data. The software validates the network every ValidationFrequency iterations during training.
options = trainingOptions('sgdm', ... 'MiniBatchSize',10, ... 'MaxEpochs',6, ... 'InitialLearnRate',1e-4, ... 'Shuffle','every-epoch', ... 'ValidationData',augimdsValidation, ... 'ValidationFrequency',3, ... 'Verbose',false, ... 'Plots','training-progress');
Train Network
Train the network that consists of the transferred and new layers. By default, trainnet uses a GPU if one is available. Using this function on a GPU requires Parallel Computing Toolbox™  and a supported GPU device. For more information, see GPU Computing Requirements (Parallel Computing Toolbox). If a GPU is not available, the network uses a CPU (requires MATLAB Coder™ Interface for Deep Learning). You can also specify the execution environment by using the ExecutionEnvironment name-value argument of trainingOptions.
netTransfer = trainnet(augimdsTrain, net, 'crossentropy', options)
netTransfer = 
  dlnetwork with properties:
         Layers: [70×1 nnet.cnn.layer.Layer]
    Connections: [77×2 table]
     Learnables: [82×3 table]
          State: [40×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1
  View summary with summary.
Quantize Network
Quantize the network using the dlquantizer object. Set the target execution environment to FPGA.
dlquantObj = dlquantizer(netTransfer,'ExecutionEnvironment','FPGA');
Calibrate Quantized Network
Use the calibrate function to exercise the network with sample inputs and collect the range information. The calibrate function collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns the information as a table, in which each row contains range information for a learnable parameter of the quantized network.
calibrate(dlquantObj,augimdsTrain)
ans=94×5 table
       Optimized Layer Name       Network Layer Name    Learnables / Activations    MinValue    MaxValue
    __________________________    __________________    ________________________    ________    ________
    {'conv1_Weights'         }    {'conv1'         }           "Weights"            -0.64526    0.89818 
    {'conv1_Bias'            }    {'conv1'         }           "Bias"               -0.64035    0.68777 
    {'res2a_branch2a_Weights'}    {'res2a_branch2a'}           "Weights"            -0.39021    0.33934 
    {'res2a_branch2a_Bias'   }    {'res2a_branch2a'}           "Bias"               -0.79958     1.2763 
    {'res2a_branch2b_Weights'}    {'res2a_branch2b'}           "Weights"            -0.75631    0.57786 
    {'res2a_branch2b_Bias'   }    {'res2a_branch2b'}           "Bias"                -1.3255     1.7421 
    {'res2b_branch2a_Weights'}    {'res2b_branch2a'}           "Weights"            -0.31048    0.33465 
    {'res2b_branch2a_Bias'   }    {'res2b_branch2a'}           "Bias"                -1.1135     1.4752 
    {'res2b_branch2b_Weights'}    {'res2b_branch2b'}           "Weights"             -1.1498    0.93351 
    {'res2b_branch2b_Bias'   }    {'res2b_branch2b'}           "Bias"               -0.84473     1.2549 
    {'res3a_branch2a_Weights'}    {'res3a_branch2a'}           "Weights"            -0.19053    0.24578 
    {'res3a_branch2a_Bias'   }    {'res3a_branch2a'}           "Bias"               -0.53824    0.68652 
    {'res3a_branch2b_Weights'}    {'res3a_branch2b'}           "Weights"            -0.54176    0.73187 
    {'res3a_branch2b_Bias'   }    {'res3a_branch2b'}           "Bias"               -0.68422     1.1596 
    {'res3a_branch1_Weights' }    {'res3a_branch1' }           "Weights"            -0.61816    0.95464 
    {'res3a_branch1_Bias'    }    {'res3a_branch1' }           "Bias"               -0.95381    0.77789 
      ⋮
Define FPGA Board Interface
Define the target FPGA board programming interface by using the dlhdl.Target object. Create a programming interface with custom name for your target device and an Ethernet interface to connect the target device to the  host computer.
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');
Prepare Network for Deployment
Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network and bitstream name. Ensure that the  bitstream name matches the data type and the FPGA board that you are  targeting. In this example, the target FPGA board is the Xilinx® Zynq®  UltraScale+™ MPSoC ZCU102 board and the bitstream uses the int8 data  type.
hW = dlhdl.Workflow(Network=dlquantObj,Bitstream='zcu102_int8',Target=hTarget);Compile Network
 Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment.
dn = compile(hW,'InputFrameNumberLimit',15)### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_int8.
### An output layer called 'Output1_prob' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### The network includes the following layers:
### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'prob' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'Output1_prob' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Compiling layer group: conv1>>pool1 ...
### Compiling layer group: conv1>>pool1 ... complete.
### Compiling layer group: res2a_branch2a>>res2a_branch2b ...
### Compiling layer group: res2a_branch2a>>res2a_branch2b ... complete.
### Compiling layer group: res2b_branch2a>>res2b_branch2b ...
### Compiling layer group: res2b_branch2a>>res2b_branch2b ... complete.
### Compiling layer group: res3a_branch1 ...
### Compiling layer group: res3a_branch1 ... complete.
### Compiling layer group: res3a_branch2a>>res3a_branch2b ...
### Compiling layer group: res3a_branch2a>>res3a_branch2b ... complete.
### Compiling layer group: res3b_branch2a>>res3b_branch2b ...
### Compiling layer group: res3b_branch2a>>res3b_branch2b ... complete.
### Compiling layer group: res4a_branch1 ...
### Compiling layer group: res4a_branch1 ... complete.
### Compiling layer group: res4a_branch2a>>res4a_branch2b ...
### Compiling layer group: res4a_branch2a>>res4a_branch2b ... complete.
### Compiling layer group: res4b_branch2a>>res4b_branch2b ...
### Compiling layer group: res4b_branch2a>>res4b_branch2b ... complete.
### Compiling layer group: res5a_branch1 ...
### Compiling layer group: res5a_branch1 ... complete.
### Compiling layer group: res5a_branch2a>>res5a_branch2b ...
### Compiling layer group: res5a_branch2a>>res5a_branch2b ... complete.
### Compiling layer group: res5b_branch2a>>res5b_branch2b ...
### Compiling layer group: res5b_branch2a>>res5b_branch2b ... complete.
### Compiling layer group: pool5 ...
### Compiling layer group: pool5 ... complete.
### Compiling layer group: new_fc ...
### Compiling layer group: new_fc ... complete.
### Allocating external memory buffers:
          offset_name          offset_address    allocated_space 
    _______________________    ______________    ________________
    "InputDataOffset"           "0x00000000"     "5.7 MB"        
    "OutputResultOffset"        "0x005be000"     "4.0 kB"        
    "SchedulerDataOffset"       "0x005bf000"     "712.0 kB"      
    "SystemBufferOffset"        "0x00671000"     "1.6 MB"        
    "InstructionDataOffset"     "0x007fe000"     "1.2 MB"        
    "ConvWeightDataOffset"      "0x00936000"     "13.5 MB"       
    "FCWeightDataOffset"        "0x016ab000"     "12.0 kB"       
    "EndOffset"                 "0x016ae000"     "Total: 22.7 MB"
### Network compilation complete.
dn = struct with fields:
             weights: [1×1 struct]
        instructions: [1×1 struct]
           registers: [1×1 struct]
    syncInstructions: [1×1 struct]
        constantData: {}
             ddrInfo: [1×1 struct]
       resourceTable: [6×2 table]
Program Bitstream onto FPGA and Download Network Weights
To deploy the network on the Xilinx ZCU102 hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function starts programming the FPGA device, displays progress messages, and the time it takes to deploy the network.
deploy(hW)
### Programming FPGA Bitstream using Ethernet... ### Attempting to connect to the hardware board at 172.21.88.150... ### Connection successful ### Programming FPGA device on Xilinx SoC hardware board at 172.21.88.150... ### Attempting to connect to the hardware board at 172.21.88.150... ### Connection successful ### Copying FPGA programming files to SD card... ### Setting FPGA bitstream and devicetree for boot... # Copying Bitstream zcu102_int8.bit to /mnt/hdlcoder_rd # Set Bitstream to hdlcoder_rd/zcu102_int8.bit # Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd # Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb # Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM' ### Programming done. The system will now reboot for persistent changes to take effect. ### Rebooting Xilinx SoC at 172.21.88.150... ### Reboot may take several seconds... ### Attempting to connect to the hardware board at 172.21.88.150... ### Connection successful ### Programming the FPGA bitstream has been completed successfully. ### Loading weights to Conv Processor. ### Conv Weights loaded. Current time is 30-Aug-2024 11:31:45 ### Loading weights to FC Processor. ### FC Weights loaded. Current time is 30-Aug-2024 11:31:45
Test Network
Load the example image.
imgFile = fullfile(pwd,'MathWorks_cube_0.jpg');
inputImg = imresize(imread(imgFile),[224 224]);
imshow(inputImg)
Classify the image on the FPGA by using the predict method of the dlhdl.Workflow object and display the results.
inputImg = dlarray(single(inputImg), 'SSCB'); [prediction,speed] = predict(hW,inputImg,'Profile','on');
### Finished writing input activations.
### Running single input activation.
              Deep Learning Processor Profiler Performance Results
                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                    7405555                  0.02962                       1            7408123             33.7
    conv1                  1115422                  0.00446 
    pool1                   200956                  0.00080 
    res2a_branch2a          270504                  0.00108 
    res2a_branch2b          270422                  0.00108 
    res2a                   109005                  0.00044 
    res2b_branch2a          270206                  0.00108 
    res2b_branch2b          270217                  0.00108 
    res2b                   109675                  0.00044 
    res3a_branch1           155444                  0.00062 
    res3a_branch2a          156720                  0.00063 
    res3a_branch2b          245246                  0.00098 
    res3a                    54876                  0.00022 
    res3b_branch2a          245640                  0.00098 
    res3b_branch2b          245443                  0.00098 
    res3b                    55736                  0.00022 
    res4a_branch1           135533                  0.00054 
    res4a_branch2a          136768                  0.00055 
    res4a_branch2b          238039                  0.00095 
    res4a                    27602                  0.00011 
    res4b_branch2a          237950                  0.00095 
    res4b_branch2b          238645                  0.00095 
    res4b                    27792                  0.00011 
    res5a_branch1           324200                  0.00130 
    res5a_branch2a          326074                  0.00130 
    res5a_branch2b          623097                  0.00249 
    res5a                    13961                  0.00006 
    res5b_branch2a          623607                  0.00249 
    res5b_branch2b          624000                  0.00250 
    res5b                    13621                  0.00005 
    pool5                    36826                  0.00015 
    new_fc                    2141                  0.00001 
 * The clock frequency of the DL processor is: 250MHz
scores2label(prediction,categories(imdsTrain.Labels))
ans = categorical
     MathWorks Cube 
Performance Comparison
Compare the performance of the quantized network to the performance of the single data type network.
optionsFPGA = dlquantizationOptions('Bitstream','zcu102_int8','Target',hTarget, 'MetricFcn', {@(x)computeClassificationAccuracy(x,imdsValidation)}); predictionFPGA = validate(dlquantObj,imdsValidation,optionsFPGA)
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_int8.
### An output layer called 'Output1_prob' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### The network includes the following layers:
### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'prob' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'Output1_prob' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Compiling layer group: conv1>>pool1 ...
### Compiling layer group: conv1>>pool1 ... complete.
### Compiling layer group: res2a_branch2a>>res2a_branch2b ...
### Compiling layer group: res2a_branch2a>>res2a_branch2b ... complete.
### Compiling layer group: res2b_branch2a>>res2b_branch2b ...
### Compiling layer group: res2b_branch2a>>res2b_branch2b ... complete.
### Compiling layer group: res3a_branch1 ...
### Compiling layer group: res3a_branch1 ... complete.
### Compiling layer group: res3a_branch2a>>res3a_branch2b ...
### Compiling layer group: res3a_branch2a>>res3a_branch2b ... complete.
### Compiling layer group: res3b_branch2a>>res3b_branch2b ...
### Compiling layer group: res3b_branch2a>>res3b_branch2b ... complete.
### Compiling layer group: res4a_branch1 ...
### Compiling layer group: res4a_branch1 ... complete.
### Compiling layer group: res4a_branch2a>>res4a_branch2b ...
### Compiling layer group: res4a_branch2a>>res4a_branch2b ... complete.
### Compiling layer group: res4b_branch2a>>res4b_branch2b ...
### Compiling layer group: res4b_branch2a>>res4b_branch2b ... complete.
### Compiling layer group: res5a_branch1 ...
### Compiling layer group: res5a_branch1 ... complete.
### Compiling layer group: res5a_branch2a>>res5a_branch2b ...
### Compiling layer group: res5a_branch2a>>res5a_branch2b ... complete.
### Compiling layer group: res5b_branch2a>>res5b_branch2b ...
### Compiling layer group: res5b_branch2a>>res5b_branch2b ... complete.
### Compiling layer group: pool5 ...
### Compiling layer group: pool5 ... complete.
### Compiling layer group: new_fc ...
### Compiling layer group: new_fc ... complete.
### Allocating external memory buffers:
          offset_name          offset_address    allocated_space 
    _______________________    ______________    ________________
    "InputDataOffset"           "0x00000000"     "11.5 MB"       
    "OutputResultOffset"        "0x00b7c000"     "4.0 kB"        
    "SchedulerDataOffset"       "0x00b7d000"     "720.0 kB"      
    "SystemBufferOffset"        "0x00c31000"     "1.6 MB"        
    "InstructionDataOffset"     "0x00dbe000"     "1.2 MB"        
    "ConvWeightDataOffset"      "0x00ef6000"     "13.5 MB"       
    "FCWeightDataOffset"        "0x01c6b000"     "12.0 kB"       
    "EndOffset"                 "0x01c6e000"     "Total: 28.4 MB"
### Network compilation complete.
### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 30-Aug-2024 11:33:06
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 30-Aug-2024 11:33:06
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### Finished writing input activations.
### Running single input activation.
### An output layer called 'Output1_prob' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### Notice: The layer 'data' of type 'ImageInputLayer' is split into an image input layer 'data', an addition layer 'data_norm_add', and a multiplication layer 'data_norm' for hardware normalization.
### The network includes the following layers:
### Notice: The layer 'prob' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'Output1_prob' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
              Deep Learning Processor Estimator Performance Results
                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   23781318                  0.10810                       1           23781318              9.3
    data_norm_add           268453                  0.00122 
    data_norm               163081                  0.00074 
    conv1                  2164700                  0.00984 
    pool1                   515128                  0.00234 
    res2a_branch2a          966477                  0.00439 
    res2a_branch2b          966477                  0.00439 
    res2a                   268453                  0.00122 
    res2b_branch2a          966477                  0.00439 
    res2b_branch2b          966477                  0.00439 
    res2b                   268453                  0.00122 
    res3a_branch1           541373                  0.00246 
    res3a_branch2a          541261                  0.00246 
    res3a_branch2b          920141                  0.00418 
    res3a                   134257                  0.00061 
    res3b_branch2a          920141                  0.00418 
    res3b_branch2b          920141                  0.00418 
    res3b                   134257                  0.00061 
    res4a_branch1           505453                  0.00230 
    res4a_branch2a          511309                  0.00232 
    res4a_branch2b          909517                  0.00413 
    res4a                    67152                  0.00031 
    res4b_branch2a          909517                  0.00413 
    res4b_branch2b          909517                  0.00413 
    res4b                    67152                  0.00031 
    res5a_branch1          1045581                  0.00475 
    res5a_branch2a         1052749                  0.00479 
    res5a_branch2b         2017485                  0.00917 
    res5a                    33582                  0.00015 
    res5b_branch2a         2017485                  0.00917 
    res5b_branch2b         2017485                  0.00917 
    res5b                    33582                  0.00015 
    pool5                    55746                  0.00025 
    new_fc                    2259                  0.00001 
 * The clock frequency of the DL processor is: 220MHz
### Finished writing input activations.
### Running single input activation.
predictionFPGA = struct with fields:
       NumSamples: 20
    MetricResults: [1×1 struct]
       Statistics: [2×7 table]
View the frames per second performance for the quantized network and single-data-type network. The quantized network has a performance of 33.8 frames per second compared to 9.3 frames per second for the single-data-type network. You can use quantization to improve your frames per second performance, however you could lose accuracy when you quantize your networks.
predictionFPGA.Statistics.FramesPerSecond
ans = 2×1
    9.2510
   33.7498
However, in this case you can observe the accuracy to be the same for both networks.
predictionFPGA.MetricResults.Result
ans=2×2 table
    NetworkImplementation    MetricOutput
    _____________________    ____________
     {'Floating-Point'}          0.9     
     {'Quantized'     }          0.9     
Helper Functions
function accuracy = computeClassificationAccuracy(fpgaOutput, validationData) % Compute accuracy of FPGA result compared to the Deep Learning Toolbox. % Copyright 2024 The MathWorks, Inc. fpgaClassifications = scores2label(fpgaOutput, categories(validationData.Labels)); fpgaClassifications = reshape(fpgaClassifications, size(validationData.Labels)); accuracy = sum(fpgaClassifications == validationData.Labels)./numel(fpgaClassifications); end
Version History
Introduced in R2020b
See Also
compile | getBuildInfo | predict | dlquantizer | dlquantizationOptions | calibrate | validate | predictAndUpdateState | resetState
MATLAB Command
You clicked a link that corresponds to this MATLAB command:
Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.
웹사이트 선택
번역된 콘텐츠를 보고 지역별 이벤트와 혜택을 살펴보려면 웹사이트를 선택하십시오. 현재 계신 지역에 따라 다음 웹사이트를 권장합니다:
또한 다음 목록에서 웹사이트를 선택하실 수도 있습니다.
사이트 성능 최적화 방법
최고의 사이트 성능을 위해 중국 사이트(중국어 또는 영어)를 선택하십시오. 현재 계신 지역에서는 다른 국가의 MathWorks 사이트 방문이 최적화되지 않았습니다.
미주
- América Latina (Español)
- Canada (English)
- United States (English)
유럽
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)