predictive modelling and normc
I'm working on some predictive modelling problems using the knn function. I am concerned that my use of normc is introducing future-state information into the model, which is obviously wrong. I'll try my best to explain the problem. For simplicity of explanation, I will reduce it to smaller numbers.
My training inputs to the model are 3 days worth of data, each day has 2 variables.
inputs =
1 2
3 4
8 9
I use normc on this, because in reality the inputs vary quite drastically.
ans =
0.1162 0.1990
0.3487 0.3980
0.9300 0.8955
The model is then created and will be used to classify future days. The next day of data is
inputs2 =
7 3
This data as is, and correct me if I'm wrong, is no good to the model because it hasn't been transformed yet. Yet using normc on this single row just produces ones. So I append the 4th day's data to the first 3 days, apply normc, and get the result below.
ans =
1 2
3 4
8 9
7 3
normc =
0.0902 0.1907
0.2705 0.3814
0.7213 0.8581
0.6312 0.2860
As you can see, all of the data has now changed. Logically this is supposed to happen with normc, but it means the first day's inputs have been modified by the inputs of later days. This is a fundamental no-no in predictive modelling, i.e. the first day's data is used to predict the 2nd day's output, but the first day's data already knows about the 2nd day's data, which is impossible in real life.
So how are you supposed to transform your data without causing this conundrum?
Comments: 2
Star Strider
13 Jun 2015
You didn’t post your code, but your Question suggests that you may be a bit confused.
The ‘knn’ (k-nearest-neighbour) functions are for classification. They have nothing to do with neural networks; they will classify your data, but will not predict anything.
The normc function normalises the columns of your data so that they have a length (in the Euclidean sense) equaling 1.
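To make that concrete, here is a minimal sketch that reproduces the column normalisation by hand on the data from the Question, without calling normc itself. It assumes that dividing each column by its Euclidean length is all normc does for this data, and it uses bsxfun so that it runs on older releases.

```matlab
% Training inputs from the Question: 3 days (rows), 2 variables (columns).
inputs = [1 2; 3 4; 8 9];

% Euclidean length of each column (a 1-by-2 row vector).
colLen = sqrt(sum(inputs.^2, 1));

% Divide every column by its own length (assumed equivalent to normc here).
normalised = bsxfun(@rdivide, inputs, colLen);
disp(normalised)

% A single row has only one element per column, so each element is divided
% by its own magnitude and a row of positive values becomes all ones.
inputs2 = [7 3];
single = bsxfun(@rdivide, inputs2, sqrt(sum(inputs2.^2, 1)));
disp(single)
```

The first result should agree with the normc output posted in the Question to four decimal places, and the second shows why a lone row collapses to ones.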
Answers (1)
Star Strider
14 Jun 2015
The knn classifier cannot predict because it is a classifier!
It is using the training input (known classes) to classify subsequent data. In the documentation for ‘Train a k-Nearest Neighbor Classifier Using a Custom Distance Metric’, the code refers to the measured prototype data as ‘Predictors’, so that may be the confusion. Those are the prototype vectors of known classes that the routine uses to classify subsequent unknown data vectors. I’ve never previously seen them referred to as ‘predictors’ though. They’re simply feature vectors of known classes.
The knn classifier uses its known feature vectors (and associated known classes) to classify new unknown data into the classes the classifier has to compare them with.
In the sense that ‘prediction’ forecasts the next element in a sequence, for instance, the knn classifier does not predict anything.
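As a rough illustration of what that classification step amounts to, here is a hand-written 1-nearest-neighbour sketch. It is not the toolbox implementation, and the class labels are hypothetical, invented only so the example has something to assign.

```matlab
% Known feature vectors (rows), taken from the Question.
X = [1 2; 3 4; 8 9];
% Hypothetical class labels, made up purely for illustration.
labels = [1; 1; 2];

% New, unlabelled feature vector to classify.
xNew = [7 3];

% Euclidean distance from xNew to every known feature vector.
d = sqrt(sum(bsxfun(@minus, X, xNew).^2, 2));

% 1-nearest neighbour: take the label of the closest known vector.
[~, idx] = min(d);
assignedClass = labels(idx);
disp(assignedClass)
```

The new vector is simply given the class of whichever known vector it lies closest to; nothing about the next element of a sequence is forecast.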
Comments: 2
Star Strider
15 Jun 2015
I have concerns with using the normc function to prep the data for a knn classifier as well. Since it rescales each column to unit Euclidean length, it also distorts the original values. Consider:
x = [0.1; 0.3; 0.6];
ncx = normc(x);
disp(' x normc(x)')
disp([x ncx])
x normc(x)
0.1 0.14744
0.3 0.44233
0.6 0.88465
There should be no reason to ‘prep’ the data for a classifier, since the data themselves should be in the range that they would correspond to one of the previously-defined classes. (The normc function is part of the Neural Network Toolbox. Some neural net designs require such normalised data because of the way the weights are calculated. The knn classifier does not.)
If you want to normalise the individual training vectors so that they sum to one without distorting them, simply divide each element of the vector by the sum of the elements of the vector. That is not necessary for a knn classifier, and I do not recommend it, but if you want to do it, go ahead.
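A minimal sketch of that division, using an illustrative vector that is not from the Question:

```matlab
% Illustrative feature vector (values chosen only for this example).
v = [1 3 6];

% Divide each element by the sum of the elements.
vs = v / sum(v);

disp(vs)        % the scaled vector
disp(sum(vs))   % its elements now sum to one
```

The ratios between the elements are unchanged; only the overall scale differs.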
Also, with your data as you presented them, I would normalise the rows rather than the columns. You are entering them as individual rows, so that makes more sense than normalising the columns, which for a single data vector produces a vector of ones and loses all the information.
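Here is a sketch of row-wise normalisation on the data from the Question. It assumes that "normalise the rows" means scaling each row to unit Euclidean length, and the helper name rowNorm is hypothetical. Because each row is scaled using only its own elements, appending a later day leaves the earlier days untouched.

```matlab
% Data from the Question: three training days and one new day.
train  = [1 2; 3 4; 8 9];
newDay = [7 3];

% Hypothetical helper: scale each row to unit Euclidean length.
rowNorm = @(A) bsxfun(@rdivide, A, sqrt(sum(A.^2, 2)));

trainN = rowNorm(train);            % first 3 days on their own
allN   = rowNorm([train; newDay]);  % all 4 days together

% Largest difference between the two versions of the first 3 rows.
maxDiff = max(max(abs(allN(1:3, :) - trainN)));
disp(maxDiff)
```

The difference comes out as zero, so no information from the 4th day reaches the first three, and the new day no longer collapses to a row of ones.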