Predictive modelling and normc

Paul on 13 Jun 2015
Commented: Star Strider on 15 Jun 2015
I'm working on some predictive modelling problems using the knn function. I'm concerned that my use of normc is introducing future-state information into the model, which is obviously wrong. I'll try my best to explain the problem; for simplicity of explanation I'll reduce it to smaller numbers.
My training inputs to the model are 3 days' worth of data; each day has 2 variables.
inputs =
1 2
3 4
8 9
I use normc on this, because in reality the inputs vary quite drastically.
ans =
0.1162 0.1990
0.3487 0.3980
0.9300 0.8955
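To make the column scaling concrete, here is a minimal Python sketch of what normc does to these inputs. The thread itself uses MATLAB; the `normc` helper below is a hypothetical re-implementation for illustration, not the toolbox function, and it assumes no column is all zeros. It divides each column by its Euclidean norm and should reproduce the table above.

```python
import math

def normc(rows):
    """Divide each column of a row-major matrix by its Euclidean norm.
    Hypothetical sketch of MATLAB's normc; assumes no all-zero column."""
    n_cols = len(rows[0])
    norms = [math.sqrt(sum(r[j] ** 2 for r in rows)) for j in range(n_cols)]
    return [[r[j] / norms[j] for j in range(n_cols)] for r in rows]

inputs = [[1, 2], [3, 4], [8, 9]]  # 3 days x 2 variables, from the question
for row in normc(inputs):
    print(" ".join(f"{v:.4f}" for v in row))
```

Note that every scale factor depends on all rows of its column, which is the root of the concern raised below.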
The model is then created and will be used to classify future days. The next day of data is
inputs2 =
7 3
This data as is (correct me if I'm wrong) is no good to the model, because it hasn't been transformed yet. Yet applying normc to it just returns ones, since each column contains only a single value. So I append the 4th day's data to the first 3 days, apply normc, and get the result below.
ans =
1 2
3 4
8 9
7 3
normc =
0.0902 0.1907
0.2705 0.3814
0.7213 0.8581
0.6312 0.2860
As you can see, the data have now all changed. Logically this is supposed to happen with normc, but it means the first day's inputs have been modified by the 2nd and 3rd days' inputs. This is a fundamental no-no in predictive modelling: the first day's data is used to predict the 2nd day's output, but the first day's data already "knows" about the 2nd day's data, which is impossible in real life.
So how are you supposed to transform your data without causing this conundrum?
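The effect described here can be checked directly. The Python sketch below (again with a hypothetical `normc` helper re-implementing column normalisation) normalises days 1 to 3 with and without day 4 and prints the first three rows both times; the printed values should match the two tables above, showing that appending day 4 changes every earlier row.

```python
import math

def normc(rows):
    """Divide each column by its Euclidean norm (hypothetical sketch of normc)."""
    norms = [math.sqrt(sum(r[j] ** 2 for r in rows)) for j in range(len(rows[0]))]
    return [[r[j] / norms[j] for j in range(len(norms))] for r in rows]

train = [[1, 2], [3, 4], [8, 9]]   # days 1-3
with_day4 = train + [[7, 3]]       # days 1-4

before = normc(train)
after = normc(with_day4)
# zip stops after the 3 training days, so only days 1-3 are compared
for day, (b, a) in enumerate(zip(before, after), start=1):
    print(f"day {day}: {b[0]:.4f} {b[1]:.4f} -> {a[0]:.4f} {a[1]:.4f}")
```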
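One common way around this, which is not proposed in this thread, is to compute any scaling parameters from the training days only and then reuse those same parameters, unchanged, for every later day. The Python sketch below does this with normc-style column norms purely to stay close to the question; the helper names are hypothetical. It also shows that normalising the new day by its own column norms collapses it to ones.

```python
import math

def fit_column_norms(train_rows):
    """Euclidean norm of each column, computed from the training rows only."""
    return [math.sqrt(sum(r[j] ** 2 for r in train_rows))
            for j in range(len(train_rows[0]))]

def apply_column_norms(rows, norms):
    """Scale rows by previously fitted norms; these rows do not change the norms."""
    return [[r[j] / norms[j] for j in range(len(norms))] for r in rows]

train = [[1, 2], [3, 4], [8, 9]]   # days 1-3 from the question
new_day = [[7, 3]]                 # day 4

norms = fit_column_norms(train)                  # fitted once, on days 1-3
train_scaled = apply_column_norms(train, norms)
new_scaled = apply_column_norms(new_day, norms)  # same factors reused

# Normalising the new day by its own column norms loses all information:
print(apply_column_norms(new_day, fit_column_norms(new_day)))  # [[1.0, 1.0]]
print(new_scaled)
```

With this approach the training rows are scaled once and never revisited, so day 1 carries no information about later days. New values are not guaranteed to fall within the range seen in training.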
Star Strider on 13 Jun 2015
You didn’t post your code, but your Question suggests that you may be a bit confused.
The ‘knn’ (k-nearest neighbours) functions are for classification. They have nothing to do with neural networks, so they will classify your data, but not predict anything.
The normc function normalises the columns of your data so that each has a length (in the Euclidean sense) equal to 1.
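Written as a formula (the symbols are introduced here for illustration), for an m-by-n matrix X, normc divides each entry by the Euclidean norm of its column. For a single row (m = 1) the denominator is just the absolute value of the entry, which is why one day on its own becomes all ones.

```latex
\hat{x}_{ij} = \frac{x_{ij}}{\sqrt{\sum_{k=1}^{m} x_{kj}^{2}}},
\qquad \sum_{i=1}^{m} \hat{x}_{ij}^{2} = 1 \quad \text{for each column } j

% First column of the question's 3-day data:
\sqrt{1^{2} + 3^{2} + 8^{2}} = \sqrt{74} \approx 8.602,
\qquad \hat{x}_{11} = 1 / 8.602 \approx 0.1162
```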
Paul on 14 Jun 2015
I didn't mention anything about neural networks. But I don't see why knn can't be used for prediction. My code is below. Let's say Train_Input is day 1 to day 10 of data, and Train_Target is the class for day 1 to day 10. Then Test_Input is day 11 to day 15; this assigns classes to days 11 to 15, so in effect it is predicting days 11 to 15. If I'm missing something here, please let me know.
predictions=knnclassify(Test_Input, Train_Input, Train_Target,10,'euclidean','nearest');
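For readers unfamiliar with how a call like this labels new rows, here is a minimal Python sketch of k-nearest-neighbour classification with Euclidean distance and majority voting. The function name `knn_classify` and the toy data are hypothetical, and the tie-break (among tied classes, the one with the closest member wins) only approximates knnclassify's 'nearest' rule; it is not MathWorks' code.

```python
import math
from collections import Counter

def knn_classify(sample, train_inputs, train_targets, k):
    """Label one sample by majority vote among its k nearest training rows
    (Euclidean distance). Ties between classes go to the tied class whose
    member is nearest, loosely following knnclassify's 'nearest' rule."""
    ranked = sorted(range(len(train_inputs)),
                    key=lambda i: math.dist(sample, train_inputs[i]))
    nearest = ranked[:k]
    votes = Counter(train_targets[i] for i in nearest)
    top = max(votes.values())
    tied = {label for label, count in votes.items() if count == top}
    # 'nearest' is sorted by distance, so the first tied label found is closest
    for i in nearest:
        if train_targets[i] in tied:
            return train_targets[i]

# Hypothetical toy data: two features per day, classes 'A' and 'B'
train_inputs = [[1, 2], [2, 1], [8, 9], [9, 8]]
train_targets = ['A', 'A', 'B', 'B']
print(knn_classify([7, 7], train_inputs, train_targets, k=3))
```

Nothing is learned beyond storing the labelled rows; each new row simply receives the label of its neighbourhood, and the distances depend directly on how the features are scaled.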


Answers (1)

Star Strider on 14 Jun 2015
The knn classifier cannot predict because it is a classifier!
It uses the training inputs (known classes) to classify subsequent data. In the documentation for ‘Train a k-Nearest Neighbor Classifier Using a Custom Distance Metric’, the code refers to the measured prototype data as ‘Predictors’, so that may be the source of the confusion. Those are the prototype vectors of known classes that the routine uses to classify subsequent unknown data vectors. I’ve never previously seen them referred to as ‘predictors’, though. They’re simply feature vectors of known classes.
The knn classifier uses its known feature vectors (and associated known classes) to classify new unknown data into the classes the classifier has to compare them with.
If ‘prediction’ means forecasting the next element in a sequence, for instance, then the knn classifier does not predict anything.
Paul on 15 Jun 2015
OK, so we're getting tied up in wording here: in my head, 'classify subsequent unknown data vectors' was the same as making a prediction. Regardless, my original question still stands. I am using knn to classify a day's data into a group, and I have concerns about the way I'm using the normc function to prep the data.
Star Strider on 15 Jun 2015
I have concerns about using the normc function to prep the data for a knn classifier as well. Since it rescales each column to unit Euclidean length (the squares of its elements sum to one), it would also distort them. Consider:
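x = [0.1; 0.3; 0.6];   % column vector whose elements sum to 1
ncx = normc(x);        % rescaled so that sum(ncx.^2) equals 1
disp(' x normc(x)')    % column header
disp([x ncx])          % original and normalised values side by side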
x normc(x)
0.1 0.14744
0.3 0.44233
0.6 0.88465
There should be no reason to ‘prep’ the data for a classifier, since the data themselves should already be in the range corresponding to one of the previously defined classes. (The normc function is part of the Neural Network Toolbox. Some neural net designs require such normalised data because of the way the weights are calculated. The knn classifier does not.)
If you want to normalise the individual training vectors so that their elements sum to one without distorting them, simply divide each element of the vector by the sum of the elements of the vector. That is not necessary for a knn classifier, and I do not recommend it, but if you want to do it, go ahead.
Also, with your data as you presented them, I would normalise the rows rather than the columns. You are entering each day as an individual row, so that makes more sense than normalising the columns (normalising the columns of a single day's data vector produces a vector of ones, losing all the information).
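The sum normalisation described here takes one line per vector. A minimal Python sketch follows; the helper name is hypothetical, and it assumes the elements are positive so that the sum is non-zero.

```python
def normalise_by_sum(vector):
    """Divide each element by the vector's sum so the elements sum to one.
    Assumes a non-zero sum (e.g. all elements positive)."""
    total = sum(vector)
    return [v / total for v in vector]

day = [7, 3]                    # the new day from the question
print(normalise_by_sum(day))    # [0.7, 0.3]
```

Because each vector is scaled by its own sum, normalising one day never changes another day's values.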
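Row normalisation in this sense (analogous to the normr function that accompanies normc) scales each day by its own Euclidean length, so appending a new day cannot change the earlier ones. A minimal Python sketch with a hypothetical helper, assuming no all-zero row:

```python
import math

def normalise_rows(rows):
    """Scale each row (one day) to unit Euclidean length using only that
    row's own values. Assumes no row is all zeros."""
    out = []
    for r in rows:
        length = math.sqrt(sum(v * v for v in r))
        out.append([v / length for v in r])
    return out

days = [[1, 2], [3, 4], [8, 9]]
with_day4 = days + [[7, 3]]
# Rows 1-3 come out identical whether or not day 4 is present
print(normalise_rows(days) == normalise_rows(with_day4)[:3])  # True
```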
