Object-Level Fusion of Lidar and Camera Data for Vehicle Tracking

This example shows you how to generate an object-level fused track list from measurements of a lidar and multiple camera sensors using a joint integrated probabilistic data association (JIPDA) tracker.

Different sensors capture different characteristics of objects in their field of view and have the potential to complement each other. Fusing information from multiple sensors by using a centralized tracker generally provides better tracking results.

In this example, you:

  • Generate 3D bounding box detections from a lidar sensor.

  • Generate 2D bounding box detections from multiple camera sensors.

  • Fuse detections from lidar and camera sensors and track targets using a trackerJPDA object.

Load and Visualize Sensor Data

Download a zip file containing a subset of sensor data from the PandaSet dataset and prerecorded object detections. The zip file contains multiple MAT-files, and each file has lidar and camera data for one timestamp. In this example, you use six cameras and a lidar mounted on the ego vehicle. Together, the seven sensors provide a 360-degree view around the ego vehicle. The prerecorded detections, generated using a pretrained yolov4ObjectDetector (Computer Vision Toolbox) object and a pointPillarsObjectDetector (Lidar Toolbox) object, are loaded from a MAT-file. For more information on generating these detections, see the Object Detection Using YOLO v4 Deep Learning (Computer Vision Toolbox) and Lidar 3-D Object Detection Using PointPillars Deep Learning (Lidar Toolbox) examples.

Load the first frame of the data as dataLog into the workspace.

dataFolder = tempdir;
dataFileName = "";
url = "" + dataFileName;
filePath = fullfile(dataFolder,dataFileName);
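if ~isfile(filePath)
    % Download the zip file only if it is not already on disk
    websave(filePath,url);
end
unzip(filePath,dataFolder);
dataPath = fullfile(dataFolder,"PandasetLidarCameraData");
fileName = fullfile(dataPath,strcat(num2str(1,"%03d"),".mat"));
load(fileName,"dataLog");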

% Set offset from starting time
tOffset = dataLog.LidarData.Timestamp;

Read the point cloud data and object detections from dataLog.

[ptCld,lidarBboxes] = helperExtractLidarData(dataLog);
ax = pcshow(ptCld);

Display the detected 3D bounding boxes on the point cloud data.

showShape("cuboid",lidarBboxes,Parent=ax,Color="green",Opacity=0.15,LineWidth=1);

[Figure: lidar point cloud with the detected 3D bounding boxes.]

Read the image and object detections from dataLog.

camIdx = 1; 
[img,cameraBBoxes] = helperExtractCameraData(dataLog,dataFolder,camIdx);

Annotate images with the detected 2D bounding boxes.

img = insertObjectAnnotation(img,"Rectangle",cameraBBoxes,"Vehicle");
imshow(img)

[Figure: camera image annotated with the detected 2D vehicle bounding boxes.]

Set Up the Tracker

The first step toward defining a tracker is to define a filter initialization function. Set the FilterInitializationFcn property of the tracker to helperInitLidarCameraFusionFilter. This function returns a trackingEKF object customized to track 3D bounding box detections from the lidar and 2D bounding box detections from the cameras.
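To make the role of a filter initialization function concrete, the following minimal sketch uses initcvekf, a built-in constant-velocity initializer from Sensor Fusion and Tracking Toolbox, as a stand-in. It is not the helper used in this example, and the measurement values are made up. The sketch only illustrates the contract: the function receives an objectDetection and returns an initialized filter.

```matlab
% Stand-in for a custom initializer: the built-in constant-velocity EKF
% initializer. The helper in this example is more elaborate; this only
% shows the input/output contract of a FilterInitializationFcn.
detection = objectDetection(0,[10;5;0]);   % hypothetical [x;y;z] position at t = 0

filter = initcvekf(detection);             % returns a trackingEKF object

disp(class(filter))
disp(filter.State')                        % state ordering is [x vx y vy z vz]
```

The tracker calls the initialization function once for each detection that starts a new track, so any custom function with the same contract can be passed as a function handle.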

The filter defines the state of the objects, the model that governs the transition of the state, and the model that generates the measurement based on the state. The two models are collectively known as the state-space model of the target. To model the state of vehicles for tracking using lidar and camera, this example uses a cuboid model with the following convention:

x = [x_kin θ l w h]

x_kin refers to the portion of the state that controls the kinematics of the motion center, and θ is the yaw angle. The length (l), width (w), and height (h) of the cuboid are modeled as constants, but their estimates can change when the filter makes corrections based on detections. In this example, you use a constant turn-rate cuboid model, and the state definition is expanded as:


The filter uses the monoCamera (Automated Driving Toolbox) object to transform the track state to camera bounding box and vice-versa. The monoCamera object requires intrinsic and extrinsic camera parameters to transform between image coordinates and vehicle coordinates.
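As a rough illustration of the vehicle-to-image direction of this transform, the sketch below projects one 3D point, already expressed in the camera frame, onto the image plane with a pinhole model. The intrinsic values and the point are made up, and this is not the monoCamera implementation, which also accounts for the camera mounting (extrinsic) parameters.

```matlab
% Hypothetical intrinsics for a 1920-by-1080 image (illustrative values)
focalLength    = [1000 1000];   % [fx fy] in pixels
principalPoint = [960 540];     % [cx cy] in pixels
K = [focalLength(1) 0              principalPoint(1); ...
     0              focalLength(2) principalPoint(2); ...
     0              0              1];

% A point in the camera frame: X right, Y down, Z forward (meters)
pCam = [2; 1; 20];

% Pinhole projection: multiply by K, then divide by depth
uvw   = K*pCam;
pixel = uvw(1:2)/uvw(3);

disp(pixel')   % 1060 590
```

Projecting the eight corners of a cuboid this way and taking the enclosing rectangle is one common way to compare a 3D track with a 2D camera bounding box.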

For more information about the state transition and measurement models, refer to the helperInitLidarCameraFusionFilter function.
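For intuition about what a constant turn-rate cuboid transition does, here is a minimal sketch of one prediction step. The state ordering [x; y; s; theta; omega; z; vz; l; w; h], the units, and all numeric values are assumptions made for illustration and might not match the helper function.

```matlab
% Assumed state layout (hypothetical):
% [x; y; speed; yaw (deg); yaw rate (deg/s); z; vz; length; width; height]
state = [10; 5; 8; 0; 10; 0.5; 0; 4.5; 1.8; 1.6];
dt = 0.1;   % prediction interval in seconds

x = state(1); y = state(2); s = state(3);
theta = state(4); omega = state(5);

if abs(omega) > 1e-6
    % Turning: integrate the motion exactly along a circular arc
    r = s/deg2rad(omega);                  % turn radius in meters
    thetaNew = theta + omega*dt;
    xNew = x + r*(sind(thetaNew) - sind(theta));
    yNew = y - r*(cosd(thetaNew) - cosd(theta));
else
    % Negligible turn rate: fall back to straight-line motion
    thetaNew = theta;
    xNew = x + s*dt*cosd(theta);
    yNew = y + s*dt*sind(theta);
end

predicted = state;
predicted([1 2 4]) = [xNew; yNew; thetaNew];
predicted(6) = state(6) + state(7)*dt;     % constant vertical velocity
% Speed, yaw rate, vertical velocity, and dimensions are carried over unchanged

disp(predicted')
```

The dimensions pass through the prediction untouched, which matches the description above: they change only when the filter corrects the state with a detection.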

You use the helper functions helperAssembleLidarDetections and helperAssembleCameraDetections to convert the bounding box detections to the objectDetection format required by the tracker.
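The conversion can be pictured with the following minimal sketch, which wraps two made-up cuboids in objectDetection objects. The box layout [xctr yctr zctr xlen ylen zlen xrot yrot zrot], the identity measurement noise, and the sensor index are assumptions for illustration. The helper functions might use a different measurement definition and also attach sensor pose information.

```matlab
% Two hypothetical lidar cuboids, one per row:
% [xctr yctr zctr xlen ylen zlen xrot yrot zrot]
boxes = [10  2 0.8 4.5 1.8 1.6 0 0 30; ...
         25 -3 0.9 4.2 1.7 1.5 0 0  5];
time = 0.1;          % detection time in seconds
sensorIndex = 1;     % unique index identifying the reporting sensor

detections = cell(size(boxes,1),1);
for i = 1:size(boxes,1)
    detections{i} = objectDetection(time,boxes(i,:)', ...
        SensorIndex=sensorIndex, ...
        MeasurementNoise=eye(9));   % placeholder noise covariance
end

disp(detections{1})
```

Giving each sensor its own SensorIndex lets the tracker tell apart detections that come from the lidar and from each camera.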

% Set up the tracker
tracker = trackerJPDA( ...
    TrackLogic="Integrated", ...
    FilterInitializationFcn=@helperInitLidarCameraFusionFilter, ...
    AssignmentThreshold=[20 200]);

% Set up the visualization
display = helperLidarCameraFusionDisplay;

% Load GPS data
fileName = fullfile(dataPath,"gpsData.mat");
load(fileName,"gpsData");

% Create ego trajectory from GPS data using waypointTrajectory
egoTrajectory = helperGenerateEgoTrajectory(gpsData); 

Fuse and Track Detections with Tracker

In this section, you iterate over all the frames and assemble the detections from the lidar and the cameras. You process the detections for each frame using the JIPDA tracker. Finally, you update the visualization with the detections and tracks.
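Tracking in a global coordinate frame means that each detection, reported relative to the moving ego vehicle, is first mapped through the ego pose for that frame. The planar sketch below uses made-up values to show the idea. The helper functions might handle this differently, for example by using the full 3D orientation.

```matlab
% Hypothetical ego pose in the global frame
egoPosition = [100; 50];   % meters
egoYaw = 90;               % degrees, counterclockwise from the global X-axis

% Rotation from the ego frame to the global frame
R = [cosd(egoYaw) -sind(egoYaw); ...
     sind(egoYaw)  cosd(egoYaw)];

% A detection 10 m directly ahead of the ego vehicle
pEgo = [10; 0];

% Rotate into the global orientation, then translate by the ego position
pGlobal = egoPosition + R*pEgo;

disp(pGlobal')   % 100 60
```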

% Set the number of frames to process from the dataset
numFrames = 80;

for frame = 1:numFrames
    % Load the data
    fileName = fullfile(dataPath,strcat(num2str(frame,"%03d"),".mat"));
    load(fileName,"dataLog");

    % Find current time
    time = dataLog.LidarData.Timestamp - tOffset;

    % Update ego pose using GPS data to track in global coordinate frame
    [pos, orient, vel] = egoTrajectory.lookupPose(time);
    egoPose.Position = pos;
    egoPose.Orientation = eulerd(orient,"ZYX","frame");
    egoPose.Velocity = vel;

    % Assemble lidar detections into objectDetection format
    [~, lidarBoxes, lidarPose] = helperExtractLidarData(dataLog);
    lidarDetections = helperAssembleLidarDetections(lidarBoxes,lidarPose,time,1,egoPose);

    % Assemble camera detections into objectDetection format
    cameraDetections = cell(0,1);
    for k = 1:1:numel(dataLog.CameraData)
        [img, camBBox,cameraPose] = helperExtractCameraData(dataLog, dataFolder,k);
        cameraBoxes{k} = camBBox; %#ok<SAGROW>
        % Sensor index 1 is the lidar, so camera k uses index k+1
        thisCameraDetections = helperAssembleCameraDetections(cameraBoxes{k},cameraPose,time,k+1,egoPose);
        cameraDetections = [cameraDetections;thisCameraDetections]; %#ok<AGROW>
    end

    % Concatenate detections. The first frame uses only lidar detections.
    if frame == 1
        detections = lidarDetections;
    else
        detections = [lidarDetections;cameraDetections];
    end

    % Run the tracker
    tracks = tracker(detections, time);

    % Visualize the results
    display(dataFolder,dataLog, egoPose, lidarDetections, cameraDetections, tracks);
end

[Figure: bird's-eye plot, X (m) versus Y (m), showing camera detections, lidar detections, and fused tracks, alongside six camera image panels.]


The results above show that fusing detections from different sensors provides better estimates of the positions and dimensions of the targets in the scenario. The picture below shows that the fused track bounding boxes, in green, are tighter than the lidar and camera detected bounding boxes, shown in yellow and blue, respectively. The fused bounding boxes are also closer to the true objects.
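One way to see why fusion can tighten an estimate is the textbook case of combining two independent measurements of the same quantity with inverse-variance weights. The numbers below are made up, and this static calculation is only an analogy. The tracker in this example instead updates an extended Kalman filter with probabilistically weighted detections.

```matlab
% Two independent, hypothetical estimates of a vehicle length in meters
zLidar  = 4.6;  varLidar  = 0.20^2;
zCamera = 4.2;  varCamera = 0.40^2;

% Inverse-variance weighting: the more certain sensor gets more weight
wLidar   = (1/varLidar)/(1/varLidar + 1/varCamera);
fused    = wLidar*zLidar + (1 - wLidar)*zCamera;
varFused = 1/(1/varLidar + 1/varCamera);

fprintf("Fused length: %.2f m, standard deviation: %.3f m\n",fused,sqrt(varFused));
```

The fused variance is smaller than either input variance, which is the basic reason a fused estimate can be tighter than the estimate from any single sensor.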



In this example, you learned how to fuse lidar and camera detections and track them with a joint integrated probabilistic data association (JIPDA) tracker.