Random Test Dataset With More Values Of One Type

Harel Harel Shattenstein on 22 May 2019
Answered: the cyclist on 22 May 2019
I have a dataset whose y values are either 0 or 1.
Most of my data has y = 0, so when I split the data 80/20 into training and test sets, most of the test data has y = 0.
I want the randomly chosen test data to contain more observations with y = 1. How can this be done?

Answers (1)

the cyclist on 22 May 2019
One simple approach would be:
  1. split your dataset into the y = 0 set and the y = 1 set
  2. do the 80/20 training/test split on the y = 0 set and the y = 1 set separately
  3. combine the two training sets and the two test sets
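The three steps above can be sketched in base MATLAB roughly as follows. This is only a minimal illustration, not code from the original answer: the data (X, y), the variable names, and the fixed random seed are all assumptions made up for the example.

```matlab
% Hypothetical example data: 1000 observations, roughly 10% with y = 1.
rng(0);                           % fix the seed so the split is reproducible
X = randn(1000, 3);               % placeholder predictors (assumption)
y = double(rand(1000, 1) < 0.1);  % placeholder 0/1 labels (assumption)

testFraction = 0.2;               % 80/20 training/test split
isTest = false(size(y));          % logical mask marking the test rows

% Steps 1 and 2: handle each class separately and split it 80/20.
for c = [0 1]
    idx = find(y == c);                        % rows belonging to this class
    idx = idx(randperm(numel(idx)));           % shuffle those rows
    nTest = round(testFraction * numel(idx));  % size of this class's test part
    isTest(idx(1:nTest)) = true;               % first 20% go to the test set
end

% Step 3: combine the per-class pieces into one training and one test set.
XTrain = X(~isTest, :);  yTrain = y(~isTest);
XTest  = X(isTest, :);   yTest  = y(isTest);

fprintf('Share of y = 1: train %.3f, test %.3f\n', mean(yTrain), mean(yTest));
```

With this split the test set has about the same share of y = 1 as the full dataset. If the test set should hold a larger share of y = 1 than that, one option is to use a larger test fraction for the y = 1 class than for the y = 0 class inside the loop. If the Statistics and Machine Learning Toolbox is available, cvpartition(y,'HoldOut',0.2) should give a comparable stratified split.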
Bear in mind that highly imbalanced datasets like this have pitfalls for analysis. It's too much to describe here, but if you search keywords like imbalanced dataset machine learning, you'll be able to read about the problems, and some potential solutions.
