MATLAB: Split and plot the training and test data sets
Hi everyone,
Thanks for any help.
I need a solution for this exercise.
First, divide the data into two parts: training data (Train) and test data. Use 30% of the data for the test set and 70% for the training set.
Second, this split should be completely random, with no duplicated data. In other words, none of the randomly selected test-set data should appear in the training set, and the test set itself should contain no duplicates. (Use efficient MATLAB functions for this purpose.)
--> Perform polynomial regression for every degree from 1 to 100 and display the results of these 100 experiments in a plot like the one below.
[Figure: example plot of the MSE versus polynomial degree for the training set and the test set]
--> This example plot shows the MSE for each polynomial degree for both the training set and the test set.
I generated one-dimensional data using the following code:
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Adam Danz
7 May 2021
Not sure what your question is. What part of the assignment are you stuck on? What specifically do you need help with?
arash Moha
7 May 2021
Adam Danz
7 May 2021
Do you have any data to work with? Surely your assignment wasn't to just make up data that looks like those curves.
arash Moha
7 May 2021
Answers (1)
Adam Danz
7 May 2021
I'll point you in the right direction toward the tools you need to complete your assignment.
>First, divide the data into two parts: training data (Train) and test data. Consider 30% of the data for the test set and 70% as the training set. Second, This segmentation should be completely random without duplicate data. In other words, none of the randomly selected test set data should be present in the training set and also the data in the test set should not be duplicate. (Use efficient functions in MATLAB for this purpose)
I suggest using cvpartition to split the data into training and testing sets. It's not the most straightforward function, but hopefully the documentation page and the examples on that page will help you learn it.
Alternatively, you can use randperm to create a vector of indices that can be used to split the data.
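For the cvpartition route, here is a minimal sketch, assuming the Statistics and Machine Learning Toolbox is installed and reusing the question's data-generation code (temp holds the x-values, x holds the y-values). The names trainMask, testMask, xTrain, yTrain, xTest and yTest are illustrative choices, not anything the assignment prescribes.

```matlab
% Requires the Statistics and Machine Learning Toolbox (cvpartition).
rng(914163506);                         % same seed as in the question
temp = 0:.15:2*pi;                      % x-values
x = sin(temp) + .2*randn(size(temp));   % noisy y-values

n = numel(x);
c = cvpartition(n, 'Holdout', 0.3);     % random hold-out split: ~30% test, ~70% training
trainMask = training(c);                % logical mask of the training observations
testMask  = test(c);                    % logical mask of the test observations

% Apply the same mask to both variables so each (x, y) pair stays together
xTrain = temp(trainMask);  yTrain = x(trainMask);
xTest  = temp(testMask);   yTest  = x(testMask);
```

Because both masks come from a single hold-out partition, no observation can land in both sets, and no observation appears twice in the test set.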
> Perform the regression for the polynomial degree from degree 1 to degree 100 and display the results of these 100 experiments in the plot below.
arash Moha
10 May 2021
Adam Danz
10 May 2021
x is a 1x42 vector, so size(x,1) returns 1. You need either size(x,2) or numel(x).
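A quick way to see the difference with the question's data: size(x,1) counts rows, size(x,2) counts columns, and numel counts all elements regardless of orientation, so numel would still give the right count if x were ever a column vector. The variable names nRows, nCols and nElem are illustrative.

```matlab
temp = 0:.15:2*pi;                      % 0, 0.15, ..., 6.15: 42 values
x = sin(temp) + .2*randn(size(temp));   % same shape as temp: a 1x42 row vector

nRows = size(x,1);   % 1  (number of rows)
nCols = size(x,2);   % 42 (number of columns)
nElem = numel(x);    % 42 (total number of elements, any orientation)
```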
arash Moha
11 May 2021
Edited: arash Moha
11 May 2021
arash Moha
11 May 2021
Edited: arash Moha
11 May 2021
Adam Danz
11 May 2021
No, these two lines are very wrong:
Train=Data(:,1:round(j*0.70));
Test=Data(:,1:round(j*0.70)+1:end);
Here's the concept you need to understand:
idx = randperm(numel(x));
In the line above, idx contains the values 1 to n, where n is the number of values in x. These are indices, in random order and without repeats.
You can use this vector to split up the data. For example, the first (approximately) 30% is data(idx(1:floor(n*.3))), where n is the number of values in x, and the remaining 70% is data(idx(floor(n*.3)+1:n)).
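Putting that together, a minimal sketch of a randperm-based split, assuming the question's variables (temp as the x-values, x as the y-values). The key point is that the same index vectors are applied to both variables, so every (x, y) pair stays together; names such as nTest, idxTest and idxTrain are illustrative.

```matlab
rng(914163506);
temp = 0:.15:2*pi;                      % x-values
x = sin(temp) + .2*randn(size(temp));   % y-values

n = numel(x);                           % 42 points
idx = randperm(n);                      % 1..n in random order, no repeats
nTest = floor(n*.3);                    % about 30% for testing

idxTest  = idx(1:nTest);                % first ~30% of the shuffled indices
idxTrain = idx(nTest+1:n);              % remaining ~70%

% Index both variables with the same indices so (x, y) pairs stay together
xTest  = temp(idxTest);   yTest  = x(idxTest);
xTrain = temp(idxTrain);  yTrain = x(idxTrain);
```

Since idx is a permutation, the two index sets cannot overlap and neither can contain a repeated index, which is exactly the "no duplicates" requirement.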
arash Moha
11 May 2021
Edited: arash Moha
11 May 2021
Adam Danz
11 May 2021
If x is a vector, numel(x) is better to use than size(x,2).
I don't see where you're using the Train and Test data. I'm guessing that your instructor wants you to fit the training data and then use the coefficient estimates to compute the error on both the training set and the test set. So you'll need to compute the mean squared error of the residuals for each partition of the data. That will give you 2 values for each polynomial degree.
The loop will look like this:
degrees = 1:100;
MSE = nan(numel(degrees),2); % 2 columns: one for training and one for testing set
for i = 1:numel(degrees)
% 1) Fit the data here using polyfit, the polynomial degree
% should change on each iteration.
% 2) Compute the MSE for the test set and training set
MSE(i,1) = ___
MSE(i,2) = ___
end
Now you'll have a 100x2 matrix of MSE. All you have to do is plot each column.
See your textbook or other online resources if you need a reminder of how to compute MSE. You'll use the coefficient estimates (the first output of polyfit) and the actual y-values for both data sets.
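One way the completed loop could look, as a sketch assuming a 70/30 randperm split and the question's variable names (temp as the x-values, x as the y-values). The polynomial is fitted to the training set only, and the same coefficients p are evaluated on both sets. For high degrees, and certainly once the degree reaches the number of training points, MATLAB's polyfit will typically warn that the polynomial is badly conditioned or not unique; that is expected in this experiment.

```matlab
% Data from the question: temp holds the x-values, x holds the y-values
rng(914163506);
temp = 0:.15:2*pi;
x = sin(temp) + .2*randn(size(temp));

% Random 70/30 split with no overlap, using randperm
n = numel(x);
idx = randperm(n);
nTest = floor(n*.3);
xTest  = temp(idx(1:nTest));      yTest  = x(idx(1:nTest));
xTrain = temp(idx(nTest+1:end));  yTrain = x(idx(nTest+1:end));

degrees = 1:100;
MSE = nan(numel(degrees), 2);     % column 1: training set, column 2: test set
for i = 1:numel(degrees)
    % Fit ONLY the training data, using the current polynomial degree
    p = polyfit(xTrain, yTrain, degrees(i));
    % Evaluate the same coefficients on both sets and compare with the actual y-values
    MSE(i,1) = mean((yTrain - polyval(p, xTrain)).^2);
    MSE(i,2) = mean((yTest  - polyval(p, xTest)).^2);
end

% Plot outside the loop, one line per column of MSE
figure
semilogy(degrees, MSE(:,1), 'b.-', degrees, MSE(:,2), 'r.-')
xlabel('Polynomial degree')
ylabel('MSE')
legend('Training set', 'Test set')
```

The sketch uses semilogy instead of plot because the test-set MSE at high degrees can be many orders of magnitude larger than at low degrees; that is a presentation choice, not something the assignment specifies.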
arash Moha
11 May 2021
Edited: arash Moha
11 May 2021
Adam Danz
11 May 2021
In the three lines below, I assume temp holds the x-values and x holds the y-values, but I don't know what Data is supposed to be.
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Data=randn(numel(x));
Then, in the two lines below, you're generating even more random data, so I'm really lost.
Train=Data(idx(1:floor(j*.3)+1:end));
y_Train=sin(Train + .2*randn(size(Train)));
You're not understanding the logic behind this. idx should be used to extract data from the x and y variables so that they remain paired. For example, say your raw data are (x,y) coordinates stored in variables X and Y:
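n = numel(X);
idx = randperm(n);              % random order of 1..n with no repeats
idxTrain = idx(1:floor(n*.7));  % first ~70% of the shuffled indices
xTrain = X(idxTrain);           % same indices for X and Y keep the pairs together
yTrain = Y(idxTrain);
% Then repeat that process for the test data, using the remaining indices idx(floor(n*.7)+1:n).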
This line needs to use the loop variable to change the polynomial degree; right now you're always using 1! Also, make sure the first two inputs both come from the training data.
p=polyfit(Train,y_Train,1);
arash Moha
11 May 2021
Edited: arash Moha
11 May 2021
Adam Danz
11 May 2021
> How do I plot the test error and training error so they look like the following figure?
With the command plot(x,y), x will be the degrees vector and y will be the vector of MSE values.
arash Moha
11 May 2021
You're fitting the polynomial using both the training and the test data sets (see the section of your code below).
for i=1:numel(Degrees)
p=polyfit(xTrain,yTrain,1); % Fit training data
pval=polyval(p,xTrain);
MSE(i,1) = mean((xTrain - pval).^2);
MSE(i,2) = mean((xTrain - pval).^2);
hold on % why is this here?
p=polyfit(xTest,yTest,1); % Fit testing data
pval=polyval(p,xTest);
MSE(i,1) = mean((xTest - pval).^2);
MSE(i,2) = mean((xTest - pval).^2);
end
I believe your assignment is to fit the data with the training set and to compute the MSE on both the training and test sets using the same coefficients returned by the training-set fit. That's what cross-validation is. If you fit the test set separately, that's not cross-validation.
arash Moha
12 May 2021
Edited: arash Moha
12 May 2021
arash Moha
12 May 2021
Edited: arash Moha
12 May 2021
1. Is this the actual data you're supposed to be fitting?
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
plot(temp,x, '-o')
idxTrain = idx(1:floor(n*.7));
idxTest = idx(1:floor(n*.3));
3. This is not correct, although it's close. It shows that you don't yet understand the concept of mean squared error: the error is the difference between the estimated y-values and the actual y-values.
MSE(i,1) = immse(pval,xTrain);
4. You also need the mean squared error for the test set, using the same polynomial coefficients you obtained from the training data (p evaluated at the x-test values).
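To make points 3 and 4 concrete: the MSE compares the y-values predicted by the fitted polynomial with the observed y-values, MSE = (1/n) Σ (y_i - ŷ_i)². Both inputs are y-values; the x-values never enter the subtraction. For the training set the predictions would be polyval(p, xTrain) compared with yTrain, and for the test set polyval(p, xTest) compared with yTest, with the same p. (immse from the Image Processing Toolbox computes the same quantity for two equally sized arrays.) The numbers below are made up purely so the arithmetic is easy to check.

```matlab
% Hypothetical values, chosen only to make the arithmetic easy to follow
yActual    = [1.0 2.0 3.0 4.0];   % observed y-values
yEstimated = [1.0 2.5 3.0 3.0];   % what polyval(p, xValues) might return

residuals = yActual - yEstimated; % [0 -0.5 0 1]
mseValue  = mean(residuals.^2);   % (0 + 0.25 + 0 + 1)/4 = 0.3125
```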
Lastly, here is what the first 28 polynomial fits look like when I fit the entire data set (not the training/test sets). Note how the fits become better at first, then start adapting too much to the data, and at the end become really bad. The text label shows the loop number.
[Figure: the first 28 polynomial fits to the full data set, each labeled with its loop number]
The plot was created using
plot(temp, x, 'bo')
hold on
And then within the loop,
plot(temp, pval); % after fitting *all* of the data
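A sketch that reproduces this kind of figure, assuming the question's data and that each curve is labeled at its right-hand end with text; where the labels were placed in the original figure is not known. Expect polyfit conditioning warnings for the higher degrees, since this fits all 42 points without centering or scaling.

```matlab
rng(914163506);
temp = 0:.15:2*pi;
x = sin(temp) + .2*randn(size(temp));

figure
plot(temp, x, 'bo')                        % raw data
hold on
for d = 1:28
    p = polyfit(temp, x, d);               % fit *all* of the data with degree d
    pval = polyval(p, temp);
    plot(temp, pval)
    text(temp(end), pval(end), num2str(d)) % label each curve with its loop number
end
hold off
```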
arash Moha
13 May 2021
Edited: arash Moha
13 May 2021
arash Moha
13 May 2021
arash Moha
13 May 2021
Adam Danz
13 May 2021
Everything looks ok but you're not plotting the results correctly.
plot(Degrees,MSE(i,1), 'b.-', 'LineWidth', 3);
% ^
This only plots the last row of results. Instead, you want MSE(:,1), and then repeat for the second column.
The results will not look like the example in your image. Those lines must be from a different dataset.
arash Moha
13 May 2021
Adam Danz
13 May 2021
No, the plotting should stay outside the loop.
To plot column z of matrix m, use plot(m(:,z)).