splitting dataset into training and testing

Salma fathi on 9 Aug 2022
Answered: Manas Shivakumar on 9 Aug 2022
Hello,
We are trying to split some data into training and testing datasets, 80% and 20% respectively, and we are having the following issue:
  • If the splitting is done using a method such as "cvpartition" or any similar method, it splits the data randomly, point by point. Our data is more like time series data, and it is preferable to keep all the rows related to a certain date and time together, not split between training and testing.
We thought about giving the rows that are related to a certain date and time a specific number as an index or label and then splitting these numbers randomly, but this will not guarantee that the training-to-testing ratio is 80 to 20, because the data for different dates are not all the same size.
I have attached a sample of the data above, in case anyone can help.
Thanks in advance.

Answers (1)

Manas Shivakumar on 9 Aug 2022
There are a couple of functions for splitting a dataset into training and testing sets. They include:
  • cvpartition
  • crossvalind
Since you don't know the size of each partition beforehand, I suggest you first come up with a labelling system that groups the rows by date according to some criterion. You can then separate the groups precisely with getsamples(timeseries,ind). As for the proportions, you could simply try out the possible combinations of groups, with different numbers of groups merged into the training set, and pick the one that comes closest to the split you need.
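The idea above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive implementation: the variables keys and vals are made-up example data standing in for the date-time column and the measurements, and instead of enumerating every combination of groups (which grows exponentially with the number of groups) the sketch tries a fixed number of random group orderings and keeps the one whose training share of rows is closest to 80%.

```matlab
% Hypothetical example data: keys stands in for the date-time of each row,
% vals for the measurements. Replace both with your own columns.
rng(0);                                   % fix the seed so the split is reproducible
keys = [1 1 1 2 2 3 3 3 3 4 5 5 5 5 5 6 6 7 7 7]';   % 20 rows, 7 groups
vals = (1:numel(keys))';

% Label every row with the index of its date-time group.
[~, ~, groupId] = unique(keys);
groupId   = groupId(:);                   % make sure it is a column vector
nGroups   = max(groupId);
groupSize = accumarray(groupId, 1);       % number of rows in each group

targetFrac = 0.8;                         % desired share of rows in training
nTrials    = 1000;                        % random group orderings to try (assumption)
bestErr    = inf;
bestTrain  = [];

for t = 1:nTrials
    order   = randperm(nGroups);          % random ordering of whole groups
    cumFrac = cumsum(groupSize(order)) / numel(groupId);
    [err, k] = min(abs(cumFrac - targetFrac));   % best cut point for this ordering
    if err < bestErr
        bestErr   = err;
        bestTrain = order(1:k);           % groups assigned to training
    end
end

% Row-level mask: every row of a group lands on the same side of the split.
isTrain   = ismember(groupId, bestTrain);
trainVals = vals(isTrain);
testVals  = vals(~isTrain);

fprintf('Training share of rows: %.3f\n', mean(isTrain));
```

Because whole groups are assigned to one side, the achieved ratio is only as close to 80/20 as the group sizes allow. Increasing nTrials widens the search at the cost of run time. If the data is in a table, findgroups on the date-time variable is another way to obtain groupId.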

Release

R2021b
