Specify Indices for Training, Validation and Testing

Question

Hello,

I am trying to evaluate a standard set of regression and classification problems as part of a team at work. My normal workflow is:

Use the cvpartition function to automatically create training and validation dataset partitions.
Train my model (e.g. via fitrensemble, fitcensemble, etc.)
Use the trained model on a "test" data set to evaluate the predictive performance of the trained model.

This is hopefully a pretty standard use case and workflow. Here is my problem:

I want to be able to manually specify the data partition indices for the "training" and "validation" split myself. It's like using holdout validation, except that instead of letting MATLAB randomly sample the indices, I would specify which points to use for training and which to use for validation. Unfortunately, I can't figure out a way to provide these partitions myself without having MATLAB control the random sampling.

For perspective: when using a shallow neural network, we can do this by simply using the 'divideind' option and then providing the vector of indices for each split directly.

net.divideFcn = 'divideind';
net.divideParam.trainInd = trainInd;
net.divideParam.valInd = valInd;
net.divideParam.testInd = testInd;

This is somewhat important to my problem because I am working with a team of other scientists who use different software packages and approaches. For consistency and to allow comparison between models, we like to use the same data sets for training, validation and testing. However, I am currently not able to use the same validation splits in MATLAB.

Am I missing a function or option in MATLAB to do this? Any suggestions are welcome!

Thank you.

Comments (2 of 4 shown)

Poonpat Poonnoy, 19 May 2020

Hi there,

I might have a solution which I used in my work.

First, I prepared the data for training (2000 samples), validation (500 samples) and testing (500 samples) in separate matrices. I noted the size of each matrix, as the sizes are used to calculate the beginning and end points of each group.

Second, I concatenated those matrices into a new matrix, e.g. "p", whose number of columns should be the sum of the samples in each group. Please be careful about whether your data is stored by column or by row.

Third, use [trainInd,valInd,testInd] = divideind(size(p,2),1:2000,2001:2500,2501:3000); (the first argument of divideind is the total number of samples, here 3000).

In this case the "train data" will be indexed from columns 1 to 2000 of the p matrix, the "val data" from columns 2001 to 2500, and the rest is for the test.

I hope this helps.
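The steps above can be sketched roughly as follows. This is only an illustration under assumptions: the three matrices are small random stand-ins (the names trainData, valData and testData are made up here), samples are stored one per column, and divideind needs Deep Learning Toolbox. Note that divideind expects the total number of samples as its first input.

```matlab
% Synthetic stand-ins for the three prepared matrices (one sample per column).
% Sizes are shrunk from 2000/500/500 to keep the sketch quick to run.
trainData = rand(4,20);
valData   = rand(4,5);
testData  = rand(4,5);

% Concatenate along columns so each group occupies a contiguous block.
p = [trainData, valData, testData];
nTrain = size(trainData,2);
nVal   = size(valData,2);
nTotal = size(p,2);

% First argument is the total number of samples, followed by the
% index vectors for the training, validation and test groups.
[trainInd,valInd,testInd] = divideind(nTotal, ...
    1:nTrain, nTrain+1:nTrain+nVal, nTrain+nVal+1:nTotal);

% Recover each group from the combined matrix.
pTrain = p(:,trainInd);
pVal   = p(:,valInd);
pTest  = p(:,testInd);
```

The returned index vectors can then be assigned to net.divideParam or used to slice the data for any other model.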

Ross, 6 November 2020

Seth, did you find a solution? My take on Poonpat's response is that yes, it splits the data, but there does not seem to be a way to feed that into fitrensemble. Obviously you can separate off the validation dataset and use the generated model.

Answer 1

Drew, 2 October 2024

Custom partitions were introduced to cvpartition in R2023b. See https://www.mathworks.com/help/stats/cvpartition.html, and in particular the new cvpartition creation syntax c = cvpartition("CustomPartition",testSets). With this functionality, users can create cvpartition objects that meet their partition requirements, for example keeping certain observations grouped together because they come from the same underlying physical sample.
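As a rough sketch of how this could look for a holdout-style split, assuming R2023b or later and Statistics and Machine Learning Toolbox: the data below is synthetic, and valInd is a hypothetical list of rows agreed on with the rest of the team. A logical vector marks which observations go to the held-out set.

```matlab
% Synthetic regression data, for illustration only.
rng(0);
X = rand(100,3);
Y = X*[1;2;3] + 0.1*randn(100,1);

% Hypothetical, manually chosen validation rows.
valInd = 81:100;
isVal = false(size(X,1),1);
isVal(valInd) = true;        % true = observation belongs to the held-out set

% Custom holdout partition built from the logical vector.
c = cvpartition("CustomPartition",isVal);

% Train on training(c) and score on test(c) via the partitioned model.
cvMdl = fitrensemble(X,Y,"CVPartition",c);
valMSE = kfoldLoss(cvMdl);   % loss on the held-out observations
```

Here training(c) and test(c) return logical masks, so the same split can also be exported for comparison with models built in other packages.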

In addition, it is always possible to control the datasets completely by calling training, validation, and test functions explicitly with the desired data sets.
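A minimal sketch of that explicit approach, assuming synthetic data and hypothetical index vectors (trainInd, valInd, testInd) shared across the team; it works without any partition object and does not depend on R2023b:

```matlab
% Synthetic regression data, for illustration only.
rng(0);
X = rand(100,3);
Y = X*[1;2;3] + 0.1*randn(100,1);

% Hypothetical indices fixed in advance so every tool uses the same split.
trainInd = 1:60;
valInd   = 61:80;
testInd  = 81:100;

% Fit on the training rows only.
mdl = fitrensemble(X(trainInd,:),Y(trainInd));

% Evaluate mean squared error on the validation and test rows.
valMSE  = loss(mdl,X(valInd,:),Y(valInd));
testMSE = loss(mdl,X(testInd,:),Y(testInd));
```

The same pattern should carry over to fitcensemble for classification, where loss returns a classification error instead of a mean squared error.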

If this answer helps you, please remember to accept the answer.
