MATLAB Answers

What is the difference between test data and training data?

Asked by syikin md radi on 12 May 2015




2 Answers

Answer by Thomas Koelen on 12 May 2015

In a dataset, the training set is used to build up a model, while the test (or validation) set is used to validate the model that was built. Data points in the training set are excluded from the test (validation) set. Usually a dataset is divided into a training set and a validation set (some people use 'test set' instead) in each iteration, or divided into a training set, a validation set and a test set in each iteration.
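
For a concrete picture, a minimal hold-out split in MATLAB might look like the sketch below. It assumes the Statistics and Machine Learning Toolbox and uses the bundled fisheriris example data purely as a stand-in dataset; the 30% hold-out fraction is arbitrary.

% Minimal sketch: split a dataset into training rows and test rows.
load fisheriris                                 % meas: 150x4 features, species: 150x1 labels
c = cvpartition(size(meas,1), 'HoldOut', 0.3);  % reserve 30% of the rows for testing

Xtrain = meas(training(c), :);   % rows used to build the model
ytrain = species(training(c));
Xtest  = meas(test(c), :);       % rows held back to validate the model
ytest  = species(test(c));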



Answer by Walter Roberson on 12 May 2015

To expand on this a small bit:
You run calculations on the training set to determine various coefficients.
You can then use the testing set to check how well the predictions do on a wider set of data, and that gives you information about false positives and false negatives.
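As a rough illustration of that loop (not Walter's own code, just a sketch assuming the Statistics and Machine Learning Toolbox, the bundled fisheriris data, and a k-nearest-neighbour classifier as a stand-in model):

% Fit on the training rows only, then check the fit on rows the model never saw.
load fisheriris
c = cvpartition(species, 'HoldOut', 0.3);                   % stratified 70/30 split

mdl  = fitcknn(meas(training(c),:), species(training(c)));  % coefficients come from the training set alone
yhat = predict(mdl, meas(test(c),:));                       % predictions on the held-out test set

C   = confusionmat(species(test(c)), yhat)    % off-diagonal entries count the misclassifications
acc = mean(strcmp(yhat, species(test(c))))    % overall accuracy on unseen data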
You can use those accuracy figures to go back and re-train. You do not need to use the same division of training and test data each time: there is a common technique called "leave one out" where you deliberately drop one item at a time from the training set and re-calculate, in case that one item was an outlier preventing a good overall result.
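One common way to implement that resampling in MATLAB is a leave-one-out partition; the sketch below is only illustrative (same toolbox and example data as above, with kNN standing in for whatever model is actually being trained):

% Refit n times, each time leaving out a single observation and predicting it.
load fisheriris
n = size(meas, 1);
c = cvpartition(n, 'LeaveOut');       % n folds, each holding out one row

correct = false(n, 1);
for i = 1:n
    mdl        = fitcknn(meas(training(c,i),:), species(training(c,i)));  % fit without row i
    yhat       = predict(mdl, meas(test(c,i),:));                         % predict the left-out row
    correct(i) = strcmp(yhat, species(test(c,i)));
end
looAccuracy = mean(correct)           % leave-one-out estimate of accuracy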
There is a nasty problem in doing classification called "overtraining": the calculations might fit the data you have on hand extremely well but be useless for anything else. Dividing into training and testing reduces this risk: if the algorithm has not seen a bunch of data in its calculations, then it is not going to adjust itself to be exactly right for that data and bad for everything else. Using all of your data to train with is therefore not a good idea.
After the program has gone back and forth on training sets and validation sets and has decided on the best coefficients (the stage where the data was allowed to affect the algorithm), it is time to run it on the remaining data and produce a report. The remaining data might or might not have known classifications. If the classifications are known, then when the programmer looks at the report the programmer might decide it is time to change the program, or might not. The report is the kind of thing that gets written up in a paper: we did this and that, and with only a limited subset of the data used to train and test, we did this well on real data. Or perhaps you send it to the people designing the equipment and experiments so they can see what needs to be improved on their end. Eventually you publish the paper or write a report or the like, and other people read it and want to use your program too. But they are not going to do that if you have not established evidence that it is not over-training on the particular data you gave it, and seeing how well it did on data that was not used to design the details of the algorithm is that evidence.

1 Comment

I've just read your answer; can I ask for advice/help or ask a question? I've come across the sentence: "quality of prediction was estimated to be good if the difference between the training and test dataset was <5 and acceptable if it was <10%". My question is, how did the person choose this difference to be good or acceptable, respectively? Is that the difference one always takes, or is there a rule? A reference to relate to? Advice would be much appreciated. Isabel



