# balanceBoxLabels

Balance bounding box labels for object detection

## Syntax

``locationSet = balanceBoxLabels(boxLabels,blockedImages,blockSize,numObservations)``
``locationSet = balanceBoxLabels(boxLabels,blockedImages,blockSize,numObservations,Name,Value)``

## Description

`locationSet = balanceBoxLabels(boxLabels,blockedImages,blockSize,numObservations)` balances bounding box labels, `boxLabels`, by oversampling blocks of images containing less frequent classes, contained in the collection of blocked image objects `blockedImages`. `numObservations` is the required number of block locations, and `blockSize` specifies the block size.

`locationSet = balanceBoxLabels(boxLabels,blockedImages,blockSize,numObservations,Name,Value)` specifies additional aspects of the selected blocks using name-value arguments.

## Examples

Load box label data that contains boxes and labels for one image. The height and width of each box are [20,20].

```d = load('balanceBoxLabelsData.mat'); boxLabels = d.BoxLabels;```

Create a blocked image of size `[500,500]`.

`blockedImages = blockedImage(zeros([500,500]));`

Choose the image size of each observation.

`blockSize = [50,50];`

Visualize using a histogram to identify any class imbalance in the box labels.

```blds = boxLabelDatastore(boxLabels); datasetCount = countEachLabel(blds); figure; h1 = histogram('Categories',datasetCount.Label,'BinCounts',datasetCount.Count)```
```h1 = Histogram with properties: Data: [0x0 categorical] Values: [1 1 1 1 1 1 1 1 1 1 1 11] NumDisplayBins: 12 Categories: {1x12 cell} DisplayOrder: 'manual' Normalization: 'count' DisplayStyle: 'bar' FaceColor: 'auto' EdgeColor: [0 0 0] ```

Measure the distribution of box labels. If the coefficient of variation is more than 1, then there is class imbalance.

`cvBefore = std(datasetCount.Count)/mean(datasetCount.Count)`
```cvBefore = 1.5746 ```
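As a quick check, the coefficient of variation can be reproduced from the bin counts listed in the histogram output above. This is a minimal sketch that assumes the `Values` vector shown there holds the complete per-class counts; `counts` and `cv` are illustrative names, not part of the example data.

```matlab
% Per-class counts copied from the histogram output:
% eleven classes with 1 box each and one class with 11 boxes.
counts = [1 1 1 1 1 1 1 1 1 1 1 11];

% std normalizes by N-1 (sample standard deviation) by default.
cv = std(counts)/mean(counts);
fprintf('cv = %.4f\n', cv);   % prints cv = 1.5746
```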

Choose a heuristic value for number of observations by finding the mean of the counts of each class, multiplied by the number of classes.

```numClasses = height(datasetCount); numObservations = mean(datasetCount.Count) * numClasses;```
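Note that the mean count multiplied by the number of classes is the total number of boxes in the dataset. The sketch below illustrates this with the counts from the histogram output; `counts` is an assumed stand-in for `datasetCount.Count`.

```matlab
% Assumed stand-in for datasetCount.Count, taken from the histogram output.
counts = [1 1 1 1 1 1 1 1 1 1 1 11];

numClasses = numel(counts);                    % 12 classes
numObservations = mean(counts) * numClasses;   % same heuristic as in the example

% Up to floating-point rounding, this equals the total number of boxes.
totalBoxes = sum(counts);                      % 22
```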

Control the amount a box can be cut using `OverlapThreshold`. Using a lower threshold value cuts objects more at the border of a block. Increase this value to reduce the amount an object can be clipped at the border, at the expense of less balanced box labels.

`ThresholdValue = 0.5;`

Balance `boxLabels` using the `balanceBoxLabels` function.

```locationSet = balanceBoxLabels(boxLabels,blockedImages,blockSize,numObservations,'OverlapThreshold',ThresholdValue);```
```Balancing box labels for 1 images with [==================================================] 100% [==================================================] 100% Balancing box labels complete. ```

Count the labels that are contained within the image blocks.

```bldsBalanced = boxLabelDatastore(boxLabels,locationSet); balancedDatasetCount = countEachLabel(bldsBalanced);```

Overlay another histogram against the original label count to see if the box labels are balanced. If the labels do not appear balanced in the histograms, increase the value of `numObservations`.

```hold on; balancedLabels = balancedDatasetCount.Label; balancedCount = balancedDatasetCount.Count; h2 = histogram('Categories',balancedLabels,'BinCounts',balancedCount); title(h2.Parent,"Balanced class labels (OverlapThreshold: " + ThresholdValue + ")" ); legend(h2.Parent,{'Before','After'});```

Measure the distribution of the new balanced box labels.

`cvAfter = std(balancedCount)/mean(balancedCount)`
```cvAfter = 0.4588 ```

## Input Arguments

Labeled bounding box data, specified as a table with two columns.

• The first column contains bounding boxes and must be a cell vector. Each element in the cell vector contains M-by-4 matrices in the format [x, y, width, height] for M boxes.

• The second column must be a cell vector that contains the label names corresponding to each bounding box. Each element in the cell vector must be an M-by-1 categorical or string vector.

To create a box label table from ground truth data:

1. Use the Image Labeler or Video Labeler app to label your ground truth. Export the labeled ground truth data to your workspace.

2. Create a bounding box label datastore using the `objectDetectorTrainingData` function.

3. Obtain `boxLabels` from the `LabelData` property of the box label datastore returned by `objectDetectorTrainingData` (`blds.LabelData`).
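For small experiments, a table in the two-column layout described above can also be assembled by hand. The following is a minimal sketch for a single image; the box coordinates, the class names, and the variable names `Boxes` and `Labels` are illustrative assumptions rather than values required by the function.

```matlab
% One row per image. Each cell holds an M-by-4 matrix of [x y width height] boxes.
boxes = {[10 10 20 20; 100 120 20 20]};

% Matching M-by-1 categorical labels, one per box (hypothetical class names).
labels = {categorical(["car"; "truck"])};

% Two-column table: boxes first, labels second.
boxLabels = table(boxes, labels, 'VariableNames', {'Boxes','Labels'});
```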

Labeled blocked images, specified as an array of `blockedImage` objects containing pixel label images.

Block size of read data, specified as a two-element row vector of positive integers, [numrows,numcols]. The first element specifies the number of rows in the block. The second element specifies the number of columns.

Number of block locations to return, specified as a positive integer.

### Name-Value Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `'OverlapThreshold',1`

Resolution level of each image in the array of `blockedImage` objects, specified as a positive integer scalar or a B-by-1 vector of positive integers, where B is the length of the array of `blockedImage` objects.

Overlap threshold, specified as a positive scalar in the range [0,1]. When the overlap between a bounding box and a cropping window is greater than the threshold, boxes in the `boxLabels` input are clipped to the image block window border. When the overlap is less than the threshold, the boxes are discarded. When you lower the threshold, part of an object can get discarded. To reduce the amount an object can be clipped at the border, increase the threshold. Increasing the threshold can also cause less-balanced box labels.

The amount of overlap between the bounding box and a cropping window is defined as:

`$area\left(bboxA\cap window\right)/area\left(bboxA\right)$`
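The overlap ratio can be worked through for a single box with the base MATLAB function `rectint`, which returns the intersection area of two `[x y width height]` rectangles. This is only a sketch: the box and window below are hypothetical, and the pixel-coordinate convention that `balanceBoxLabels` uses internally may differ from the continuous rectangles assumed here.

```matlab
bbox   = [40 40 20 20];   % hypothetical box, [x y width height]
window = [0 0 50 50];     % hypothetical block window, same format

% area(bbox ∩ window) / area(bbox): a 10-by-10 intersection over a 20-by-20 box
overlap = rectint(bbox, window) / (bbox(3)*bbox(4));   % 100/400 = 0.25

threshold = 0.5;
isKept = overlap > threshold;   % false: this box would fall below the threshold
```

With a threshold of 0.5, this box overlaps the window by only 25% of its area, so under the rule described above it would be discarded rather than clipped.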

Display progress information, specified as a numeric or logical `1` (`true`) or `0` (`false`). Set this argument to `true` to display progress information.

## Output Arguments

Balanced box labels, returned as a `blockLocationSet` object. The object contains `numObservations` locations of balanced blocks, each of size `blockSize`.

## Algorithms

### Balancing Box Labels

To balance box labels, the function oversamples classes that are less represented in the blocked image or big image. The box labels are counted across the dataset and sorted based on each class count. Each image is split into several quadrants, based on the `blockSize` input value. The algorithm randomly picks several blocks within each quadrant with less-represented classes. Blocks without any objects are discarded. The balancing stops once the specified number of blocks is selected.

### Checking for Balance

You can check the success of balancing by comparing the histograms of label counts before and after balancing. You can also check the coefficient of variation. For best results, the value after balancing should be less than the original value. For more information, see Coefficient of Variation on the National Institute of Standards and Technology (NIST) website.
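For reference, the coefficient of variation computed in the example as `std(count)/mean(count)` can be written out as follows. The symbols are introduced here for illustration: $n_k$ is the number of boxes in class $k$ and $K$ is the number of classes. The $K-1$ normalization reflects the default behavior of the MATLAB `std` function.

```latex
c_v = \frac{s}{\bar{n}}, \qquad
\bar{n} = \frac{1}{K}\sum_{k=1}^{K} n_k, \qquad
s = \sqrt{\frac{1}{K-1}\sum_{k=1}^{K}\left(n_k - \bar{n}\right)^2}
```

In the example, this value drops from 1.5746 before balancing to 0.4588 after balancing.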

## Compatibility Considerations

Not recommended starting in R2021a

Introduced in R2020a