predictive modelling and normc

Question

Paul, 13 June 2015

https://kr.mathworks.com/matlabcentral/answers/223667-predictive-modelling-and-normc

Commented: Star Strider, 15 June 2015

I'm working on some predictive modelling problems using the knn function. I'm concerned that my use of normc is introducing future state information into the model, which is obviously wrong. I'll try my best to explain the problem. For simplicity of explanation, I will reduce it to smaller numbers.

My training inputs to the model are 3 days' worth of data; each day has 2 variables.

inputs =
     1     2
     3     4
     8     9

I use normc on this, because in reality the inputs vary quite drastically.

ans =
    0.1162    0.1990
    0.3487    0.3980
    0.9300    0.8955

The model is then created and will be used to classify future days. The next day of data is

inputs2 =
     7     3

This data as is (and correct me if I'm wrong) is no good to the model because it hasn't been transformed yet. Yet using normc on this single row just creates ones. So I append the 4th day's data to the first 3 days, use normc on all of it, and get the below.

ans =
       1     2
       3     4
       8     9
       7     3
normc =
    0.0902    0.1907
    0.2705    0.3814
    0.7213    0.8581
    0.6312    0.2860

As you can see, the data have now all changed. Logically this is supposed to happen with normc, but it means the earlier days' inputs have been modified by the later days' inputs. This is a fundamental no in predictive modelling, i.e. the first day's data is used to predict the 2nd day's output, but the first day's data already knows about the 2nd day's data, which is impossible in real life.

So how are you supposed to transform your data without causing this conundrum?
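One common way around this kind of leakage (a sketch, not something proposed in this thread) is to compute the scaling factors from the training days only and then reuse those same factors for every later day. The example below uses the numbers from the question; the variable names trainInputs, newDay, and colNorms are made up for illustration, and base MATLAB is used in place of normc.

```matlab
% Days 1-3 are the training set; day 4 arrives later.
trainInputs = [1 2; 3 4; 8 9];
newDay      = [7 3];

% Column lengths computed from the training rows only
% (the same quantity normc divides each column by).
colNorms = sqrt(sum(trainInputs.^2, 1));

% Apply the same fixed scale to the old and the new data.
trainScaled  = bsxfun(@rdivide, trainInputs, colNorms);
newDayScaled = bsxfun(@rdivide, newDay, colNorms);

disp(trainScaled)    % same values as normc on days 1-3; day 4 does not affect them
disp(newDayScaled)   % day 4 expressed on the training scale
```

With this arrangement the training rows never change when a new day arrives, because the new day plays no part in computing colNorms.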

Comments: 2

Star Strider, 13 June 2015

You didn’t post your code, but your Question suggests that you may be a bit confused.

The ‘knn’ (k-nearest-neighbour) functions are for classification. They have nothing to do with neural networks, so they will classify your data, but not predict anything.

The normc function normalises the columns of your data so that they have a length (in the Euclidean sense) equaling 1.
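For readers without the toolbox, the column normalisation described above can be reproduced in base MATLAB. This is only a sketch of the same arithmetic, not the toolbox implementation, and it assumes no column is all zeros (that would divide by zero). The names X, colNorms, and Xn are illustrative.

```matlab
% Divide each column by its Euclidean length.
X = [1 2; 3 4; 8 9];
colNorms = sqrt(sum(X.^2, 1));          % length of each column
Xn = bsxfun(@rdivide, X, colNorms);     % scale every column by its own length

disp(Xn)
disp(sqrt(sum(Xn.^2, 1)))               % each column now has length 1
```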

Paul, 14 June 2015

I didn't mention anything about neural networks. But I don't see why knn cannot be used for prediction. My code is below. Let's say Train_Input is day 1 to day 10 of data, and Train_Target is day 1 to day 10 of the class. Then Test_Input is day 11 to day 15, and the call assigns a class to each of days 11 to 15. Hence it's predicting days 11 to 15. If I'm missing something here, please let me know.

predictions=knnclassify(Test_Input, Train_Input, Train_Target,10,'euclidean','nearest');


Answer 1

Star Strider, 14 June 2015

https://kr.mathworks.com/matlabcentral/answers/223667-predictive-modelling-and-normc#answer_182632

The knn classifier cannot predict because it is a classifier!

It is using the training input (known classes) to classify subsequent data. In the documentation for ‘Train a k-Nearest Neighbor Classifier Using a Custom Distance Metric’, the code refers to the measured prototype data as ‘Predictors’, so that may be the confusion. Those are the prototype vectors of known classes that the routine uses to classify subsequent unknown data vectors. I’ve never previously seen them referred to as ‘predictors’ though. They’re simply feature vectors of known classes.

The knn classifier uses its known feature vectors (and associated known classes) to classify new unknown data into the classes the classifier has to compare them with.
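To make that description concrete, here is a minimal k-nearest-neighbour sketch in base MATLAB. It is not the code behind knnclassify; the training rows, the class labels, and the names trainX, trainY, newX, and k are all invented for illustration.

```matlab
% Known feature vectors (one per row) and their known classes.
trainX = [1 2; 3 4; 8 9; 9 7];
trainY = [1; 1; 2; 2];
newX   = [7 8];      % unknown vector to classify
k      = 3;

% Euclidean distance from the new vector to every known vector.
d = sqrt(sum(bsxfun(@minus, trainX, newX).^2, 2));

% Take the k closest known vectors and let their classes vote.
[~, order] = sort(d);
predictedClass = mode(trainY(order(1:k)));   % mode returns the smallest label on a tie

disp(predictedClass)
```

Here the three closest known vectors carry classes 2, 2, and 1, so the new vector is assigned class 2. Because the result depends directly on these distances, any rescaling of the columns changes which neighbours count as closest.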

In the sense that ‘prediction’ forecasts the next element in a sequence, for instance, the knn classifier does not predict anything.

Comments: 2

Paul, 15 June 2015

OK, so we're getting tied up in wording here. In my head, 'classify subsequent unknown data vectors' was the same as making a prediction. Regardless, my original question still stands: I am using knn to classify a day's data into a group, and I have concerns about the way I'm using the normc function to prep the data.

Star Strider, 15 June 2015

I have concerns with using the normc function to prep the data for a knn classifier as well. Since it normalises each column to unit Euclidean length (the squares of the elements sum to one, not the elements themselves), it would also distort them. Consider:

x = [0.1; 0.3; 0.6];
ncx = normc(x);
disp('           x       normc(x)')
disp([x ncx])

Output:

           x       normc(x)
          0.1      0.14744
          0.3      0.44233
          0.6      0.88465

There should be no reason to ‘prep’ the data for a classifier, since the data themselves should be in the range that they would correspond to one of the previously-defined classes. (The normc function is part of the Neural Network Toolbox. Some neural net designs require such normalised data because of the way the weights are calculated. The knn classifier does not.)

If you want to normalise the individual training vectors so that they sum to one without distorting them, simply divide each element of the vector by the sum of the elements of the vector. That is not necessary for a knn classifier, and I do not recommend it, but if you want to do it, go ahead.
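The sum normalisation just described can be sketched in one line of base MATLAB. The vector below is made up, and the sketch assumes the elements do not sum to zero.

```matlab
x  = [1; 3; 6];
xs = x / sum(x);     % divide every element by the sum of the elements

disp(xs)             % 0.1, 0.3, 0.6
disp(sum(xs))        % the elements now sum to one
```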

Also, with your data as you presented them, I would normalise the rows rather than the columns. You are entering them as individual rows, so that makes more sense than normalising the columns (that for each data vector would produce a vector of ones, losing all the information).
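The difference between the two directions can be seen with the numbers from the question. This is a base-MATLAB sketch under the assumption that each row is one day; the names X, newDay, Xrows, newDayRow, and newDayCol are illustrative.

```matlab
X      = [1 2; 3 4; 8 9];   % one day per row
newDay = [7 3];

% Row-wise: each day is scaled by its own length, so no other day is involved.
rowNorms  = sqrt(sum(X.^2, 2));
Xrows     = bsxfun(@rdivide, X, rowNorms);
newDayRow = newDay / norm(newDay);

% Column-wise on a single row: each entry is divided by its own magnitude.
newDayCol = newDay ./ sqrt(sum(newDay.^2, 1));

disp(Xrows)
disp(newDayRow)
disp(newDayCol)             % all ones, so the information is lost
```

Note that row-wise normalisation keeps only the ratio between the variables within a day: the rows [1 2] and [2 4] would become identical.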
