How can I handle an imbalanced dataset of image folders for training?

hume on 11 Feb 2023
Answered: Vinayak Choyyan on 13 Feb 2023
I have an image dataset with two classes, stored in folder 1 and folder 2. I want to apply random oversampling to the minority class so that it matches the size of the majority class, and save the resampled images to a new folder. I used the method in the code below, but it produces a MATLAB error. Can you suggest how to modify my code, or is there a simpler technique I could use?
% Load the images from folder1
folder1 = 'C:\Users\Degebassa\Documents\MATLAB\train\train_normal';
files1 = dir(fullfile(folder1,'*.png'));
numFiles1 = numel(files1);
images1 = cell(numFiles1,1);
for i = 1:numFiles1
    filename = fullfile(folder1,files1(i).name);
    images1{i} = imread(filename);
end
% Load the images from folder2
folder2 = 'C:\Users\Degebassa\Documents\MATLAB\train\train_abnormal';
files2 = dir(fullfile(folder2,'*.png'));
numFiles2 = numel(files2);
images2 = cell(numFiles2,1);
for i = 1:numFiles2
    filename = fullfile(folder2,files2(i).name);
    images2{i} = imread(filename);
end
% Find the size of the minority class and determine the number of samples to add
minClass = min(numFiles1, numFiles2);
numToAdd = numFiles1 - minClass;
% Use the resample function to randomly oversample the minority class
indices = resample(1:numFiles2, numToAdd, 'Replace', true);
% Use the indices to add the new samples to the minority class array
images2 = images2(indices);
% Combine the two class arrays back into one data matrix
images = [images1; images2];
% Define the output folder
outputFolder = 'C:\Users\Degebassa\Documents\MATLAB\resampled images';
% Save the balanced images to disk
for i = 1:numel(images)
    imwrite(images{i}, fullfile(outputFolder, sprintf('image%d.png', i)));
end
Here is the error it displays:
Undefined function 'resample' for input arguments of type 'double'.
Error in untitled (line 24)
indices = resample(1:numFiles2, numToAdd, 'Replace', true);

Answers (1)

Vinayak Choyyan on 13 Feb 2023
Hello hume,
As I understand it, you have an imbalanced dataset with two classes and you would like to balance it. There are a few ways to achieve this.
  • Oversampling with repetition or undersampling by deletion
As you tried to do in the code you provided, you can simply repeat images from the minority class when feeding them to the network (or, conversely, drop images from the majority class). Example code for oversampling can be found here: Oversampling for deep learning: classification example - File Exchange - MATLAB Central (mathworks.com)
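As a rough illustration of the oversampling idea, here is a minimal sketch that balances the two folders by copying files instead of holding every image in memory. The error in the question most likely occurs because resample is a Signal Processing Toolbox function for changing a signal's sample rate, not for random sampling, so this sketch draws indices with randi from base MATLAB. The folder paths come from the question; the class sub-folder names under the output folder and the _dup file suffix are assumptions made for this example.

```matlab
% Sketch: random oversampling of the minority class by copying files.
% Paths are taken from the question; sub-folder names are assumptions.
folder1 = 'C:\Users\Degebassa\Documents\MATLAB\train\train_normal';
folder2 = 'C:\Users\Degebassa\Documents\MATLAB\train\train_abnormal';
outputFolder = 'C:\Users\Degebassa\Documents\MATLAB\resampled images';

files1 = dir(fullfile(folder1, '*.png'));
files2 = dir(fullfile(folder2, '*.png'));
assert(~isempty(files1) && ~isempty(files2), 'Both folders must contain PNG files.');

% Work out which class is the minority instead of assuming it is folder2.
if numel(files1) >= numel(files2)
    majFiles = files1; majName = 'normal';
    minFiles = files2; minName = 'abnormal';
else
    majFiles = files2; majName = 'abnormal';
    minFiles = files1; minName = 'normal';
end
numToAdd = numel(majFiles) - numel(minFiles);

% Draw indices into the minority class with replacement.
extraIdx = randi(numel(minFiles), numToAdd, 1);

% Create one output sub-folder per class so the labels are preserved.
majOut = fullfile(outputFolder, majName);
minOut = fullfile(outputFolder, minName);
if ~exist(majOut, 'dir'), mkdir(majOut); end
if ~exist(minOut, 'dir'), mkdir(minOut); end

% Copy every original image once.
for i = 1:numel(majFiles)
    copyfile(fullfile(majFiles(i).folder, majFiles(i).name), ...
             fullfile(majOut, majFiles(i).name));
end
for i = 1:numel(minFiles)
    copyfile(fullfile(minFiles(i).folder, minFiles(i).name), ...
             fullfile(minOut, minFiles(i).name));
end

% Add the randomly chosen duplicates under new names so nothing is overwritten.
for k = 1:numToAdd
    src = minFiles(extraIdx(k));
    [~, base, ext] = fileparts(src.name);
    copyfile(fullfile(src.folder, src.name), ...
             fullfile(minOut, sprintf('%s_dup%d%s', base, k, ext)));
end
```

Note that the code in the question replaces the minority images with the sampled ones and assumes the output folder already exists; the sketch appends the duplicates to the originals and creates the folders first. If you also hold out a validation or test set, it is usually safer to split the data before oversampling so that copies of the same image do not end up on both sides of the split.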
  • Weighted Class Approach
Each class can be assigned a weight based on how often it occurs, typically larger for the rarer class. The weights compensate for the imbalance in the number of samples by giving more importance to the minority class during training. You can read more about this approach here: Handle Imbalanced Data or Unequal Misclassification Costs in Classification Ensembles - MATLAB & Simulink - MathWorks India
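To make the weighting idea concrete, here is a small sketch that computes inverse-frequency weights and passes them to a classification output layer. The class counts are hypothetical, and the last line assumes Deep Learning Toolbox R2020b or later, where classificationLayer accepts the 'Classes' and 'ClassWeights' options. It is one possible way to apply weights to an image network, not the method from the linked ensemble documentation.

```matlab
% Sketch: inverse-frequency class weights (counts below are hypothetical;
% in practice they could come from countEachLabel on an imageDatastore).
classNames  = ["normal" "abnormal"];
classCounts = [900 100];            % hypothetical number of images per class

% weight_c = N / (K * n_c), so the rarer class receives the larger weight.
classWeights = sum(classCounts) ./ (numel(classCounts) .* classCounts);
disp(classWeights)

% Assumption: Deep Learning Toolbox R2020b or later.
outputLayer = classificationLayer('Classes', classNames, ...
                                  'ClassWeights', classWeights);
```

With these counts the abnormal class is weighted nine times more heavily than the normal class, which mirrors the 9:1 imbalance.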
  • Augmentation approach
I also notice that you are reading your images one by one. It would be more efficient to use an ‘imageDatastore’, which loads the images in batches sized to your system's memory. That way you do not need to hold the entire dataset in memory, where it would take up a lot of space or might not fit at all.
Another advantage of ‘imageDatastore’ is that you can combine it with ‘augmentedImageDatastore’ to augment your images, so that the model does not see exactly the same images again and again. In MATLAB, these functions do not oversample, but the augmentation helps achieve better training results. You can read more about ‘augmentedImageDatastore’ here: Transform batches to augment image data - MATLAB - MathWorks India, and about ‘imageDatastore’ here: Datastore for image data - MATLAB - MathWorks India
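A minimal sketch of this datastore workflow might look like the following. It assumes Deep Learning Toolbox is installed and that the two class folders from the question sit under one parent folder; the 224-by-224 output size and the augmentation ranges are placeholders to adapt to your network and data.

```matlab
% Sketch: read both class folders lazily and augment images on the fly.
% The parent path comes from the question; sizes and ranges are placeholders.
trainFolder = 'C:\Users\Degebassa\Documents\MATLAB\train';

imds = imageDatastore(trainFolder, ...
    'IncludeSubfolders', true, ...
    'FileExtensions', '.png', ...
    'LabelSource', 'foldernames');   % labels: train_normal, train_abnormal

% Show how many images each class has, i.e. how imbalanced the data is.
disp(countEachLabel(imds))

% Random flips, small rotations and shifts applied each time a batch is read.
augmenter = imageDataAugmenter( ...
    'RandXReflection', true, ...
    'RandRotation', [-10 10], ...
    'RandXTranslation', [-5 5], ...
    'RandYTranslation', [-5 5]);

% Resize to the network input size and apply the augmentations.
augimds = augmentedImageDatastore([224 224], imds, ...
    'DataAugmentation', augmenter);
```

The resulting augimds can be passed to trainNetwork in place of an in-memory image array. The class counts stay the same, so this is typically combined with oversampling or class weights rather than used instead of them.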
  • Using SMOTE
The Synthetic Minority Over-sampling Technique, more commonly known as SMOTE, can be used to oversample data points by generating new synthetic samples for the minority class. This method is usually not applied to images, but if you would like to read more about SMOTE, please check out the following example: Oversampling Imbalanced Data: SMOTE related algorithms - File Exchange - MATLAB Central (mathworks.com).
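For intuition only, here is a simplified sketch of the core SMOTE step on feature vectors (not raw images): each synthetic point is placed at a random position on the line between a minority sample and one of its k nearest minority neighbours. The data, k, and the number of synthetic samples are hypothetical, and this is not the implementation from the linked File Exchange submission.

```matlab
% Simplified sketch of the SMOTE idea using base MATLAB only.
% X is hypothetical data: each row is one minority-class feature vector.
rng(0);                      % fixed seed so the example is repeatable
X = randn(20, 4);            % 20 minority samples with 4 features each
k = 5;                       % number of nearest neighbours to consider
numSynthetic = 30;           % number of synthetic samples to create

n = size(X, 1);

% Pairwise squared Euclidean distances within the minority class.
sq = sum(X.^2, 2);
D = sq + sq' - 2 * (X * X');
D(1:n+1:end) = Inf;          % a point must not be its own neighbour

% Indices of the k nearest minority neighbours of every sample.
[~, order] = sort(D, 2);
neighbours = order(:, 1:k);

synthetic = zeros(numSynthetic, size(X, 2));
for s = 1:numSynthetic
    i = randi(n);                    % pick a random minority sample
    j = neighbours(i, randi(k));     % pick one of its k nearest neighbours
    gap = rand;                      % interpolation factor between 0 and 1
    synthetic(s, :) = X(i, :) + gap * (X(j, :) - X(i, :));
end
```

Interpolating raw pixel values in this way tends to produce blurry blends of two images, which is one reason SMOTE is more commonly applied to extracted features than to the images themselves.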
I also found this blog, which explains solutions to the issue you are facing. I hope these approaches help resolve it.
