Hi,
I have collected bounding-box data for multiclass object detection (2 classes, let's say 'cats' and 'dogs') with Image Labeler, and I am going to collect more. I have labeled the majority of the object instances in my images, and I ended up with twice as many 'cats' instances as 'dogs' instances. Some images contain only 'cats', some contain only 'dogs', and some contain both. My question is how to handle this class imbalance in practice in MATLAB before training a CNN detector.
The most obvious (and easiest) option to me is the Hard Sampling approach, i.e. a) removing images that contain only 'cats', b) deleting 'cats' boxes from images, and c) labeling only 'dogs' in newly collected images. I can do all of these by hand in Image Labeler.
Another approach would be the same as above, but with random selection/deletion instead of manual choices. How can this be accomplished in MATLAB, given a groundTruth object for the imbalanced dataset above?
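To make the random-deletion idea concrete, here is a rough sketch of what I have in mind. It is not tested on my real data. I am assuming the labels are in a table laid out like the one objectDetectorTrainingData(gTruth) returns (one row per image, one column of M-by-4 [x y w h] boxes per class). The toy table, the variable names and the greedy stopping rule are all made up for illustration.

```matlab
% Toy stand-in (made-up data) for the table from objectDetectorTrainingData(gTruth).
imageFilename = {'img1.png'; 'img2.png'; 'img3.png'; 'img4.png'; 'img5.png'};
cats = {[10 10 50 50; 80 20 40 40]; [5 5 30 30]; []; [20 20 60 60]; [15 15 25 25; 70 70 30 30]};
dogs = {[]; []; [40 40 50 50]; [100 100 40 40]; []};
trainingData = table(imageFilename, cats, dogs);

% Number of boxes per image for each class (an empty cell counts as 0).
nCats = cellfun(@(b) size(b, 1), trainingData.cats);
nDogs = cellfun(@(b) size(b, 1), trainingData.dogs);

rng(0);                                      % fixed seed so the selection is repeatable
catOnly = find(nCats > 0 & nDogs == 0);      % candidate images: only 'cats' boxes
surplus = sum(nCats) - sum(nDogs);           % how many 'cats' boxes are in excess
order   = catOnly(randperm(numel(catOnly))); % visit the candidates in random order

drop = false(height(trainingData), 1);
for k = order(:)'
    if surplus <= 0
        break;
    end
    if nCats(k) <= surplus                   % skip images that would overshoot
        drop(k) = true;
        surplus = surplus - nCats(k);
    end
end

balanced = trainingData(~drop, :);           % undersampled training table
```

This only removes whole cat-only images, so the classes may still end up not exactly equal. Is this a reasonable way to do it, or is there a built-in function for it?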
Finally, I am thinking of applying augmentation only to the under-represented class ('dogs'), so that I end up with a balanced dataset. Again, how can this be accomplished in MATLAB, given a groundTruth object for the imbalanced dataset above?
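For the augmentation idea, since 'dogs' is the under-represented class, my rough plan would be to first duplicate randomly chosen dog-only images until the box counts match, and then augment those duplicates during training. Below is an untested sketch of the duplication step only. As before, I am assuming a table in the layout returned by objectDetectorTrainingData(gTruth), and the toy data and variable names are invented.

```matlab
% Toy stand-in (made-up data) for the table from objectDetectorTrainingData(gTruth).
imageFilename = {'img1.png'; 'img2.png'; 'img3.png'; 'img4.png'; 'img5.png'};
cats = {[10 10 50 50; 80 20 40 40]; [5 5 30 30]; []; [20 20 60 60]; [15 15 25 25; 70 70 30 30]};
dogs = {[]; []; [40 40 50 50]; [100 100 40 40]; []};
trainingData = table(imageFilename, cats, dogs);

% Number of boxes per image for each class (an empty cell counts as 0).
nCats = cellfun(@(b) size(b, 1), trainingData.cats);
nDogs = cellfun(@(b) size(b, 1), trainingData.dogs);

rng(0);                                   % fixed seed so the selection is repeatable
dogOnly = find(nDogs > 0 & nCats == 0);   % images that contain only 'dogs' boxes
deficit = sum(nCats) - sum(nDogs);        % how many 'dogs' boxes are missing

extra = zeros(0, 1);                      % row indices of the images to duplicate
while deficit > 0 && ~isempty(dogOnly)
    k = dogOnly(randi(numel(dogOnly)));   % pick a dog-only image with replacement
    extra(end+1, 1) = k;                  %#ok<SAGROW>
    deficit = deficit - nDogs(k);
end

% Original rows plus the duplicated dog-only rows.
oversampled = [trainingData; trainingData(extra, :)];
```

The duplicated rows are exact copies at this point, so they would still need random augmentation (flips, scaling, and so on, applied to both image and boxes) to be useful. That is the part I do not know how to restrict to these rows only.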
Any help would be appreciated. I am using R2022b, but I can install a newer release if needed.
C.