Data partitioning for Machine learning

Akshita Gupta, 30 March 2019
Answered: Gagan Agarwal, 30 May 2024
What does the warning that the training set does not contain points from all groups mean when partitioning the data? And how can it be removed?

Answers (1)

Gagan Agarwal, 30 May 2024
Hi Akshita
This warning typically appears when you split a dataset into training and testing (or validation) sets and the training set ends up with no data points from at least one of the groups or categories present in the original dataset. The same problem can affect the testing or validation set.
This situation can lead to several issues, including:
  • Biased Model Training: The model may not learn to generalize well across all groups since it hasn't seen examples from each group during training.
  • Inaccurate Evaluation: The testing or validation set may not accurately represent the performance of the model across all groups if it lacks data from some of them.
The warning can be removed by considering the following possibilities and techniques:
  1. Check for Small or Rare Groups: Look for any groups that have very few samples and consider merging them with similar groups or using oversampling techniques to increase their representation.
  2. If you're using stratified splitting, ensure that your stratification strategy accounts for the size and distribution of all groups. A group with only one point, for example, cannot appear in both the training set and the test set.
  3. Implement custom logic for splitting the dataset that ensures all groups are represented in each split.
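As a rough illustration of points 1 and 3, here is a minimal Python sketch using only the standard library. It is not the code that produces the warning, and the names `rare_groups`, `stratified_split`, `test_fraction` and `min_count` are made up for this example. It assumes the group of each data point is available as a simple list of labels.

```python
import random
from collections import Counter, defaultdict


def rare_groups(labels, min_count=2):
    """Return the groups that have fewer than min_count points."""
    counts = Counter(labels)
    return {g: n for g, n in counts.items() if n < min_count}


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Split indices group by group so every group stays in training.

    Each group sends roughly test_fraction of its points to the test set,
    but at least one point per group is always kept for training.
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for i, g in enumerate(labels):
        by_group[g].append(i)

    train_idx, test_idx = [], []
    for idx in by_group.values():
        rng.shuffle(idx)
        # Never move the last remaining point of a group to the test set.
        n_test = min(round(len(idx) * test_fraction), len(idx) - 1)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical labels: group "c" has a single point.
labels = ["a"] * 10 + ["b"] * 5 + ["c"]
print(rare_groups(labels))
train_idx, test_idx = stratified_split(labels)
print({labels[i] for i in train_idx} == set(labels))
```

With this approach a very small group may be missing from the test set instead, so it is still worth checking for rare groups first and deciding whether to merge or oversample them.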
I hope it helps!
