MATLAB: Split and plot the training and test data sets
arash Moha
7 May 2021
Hi everyone,
Thanks for any help.
I need a solution for this exercise.
First, divide the data into two parts: training data (Train) and test data. Use 30% of the data for the test set and 70% for the training set.
Second, the split must be completely random, with no duplicate data. In other words, none of the randomly selected test-set data may appear in the training set, and the test set itself must not contain duplicates. (Use efficient MATLAB functions for this purpose.)
--> Perform polynomial regression for degrees 1 through 100 and display the results of these 100 experiments in the plot below.
--> This example plot shows the MSE for each polynomial degree, for both the training set and the test set.
I have generated one-dimensional data using the following code:
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Comments: 4
Adam Danz
7 May 2021
Not sure what your question is. What part of the assignment are you stuck on? What specifically do you need help with?
arash Moha
7 May 2021
I need output like this picture, but I am new to MATLAB and have no idea how to produce it.
This picture:
Adam Danz
7 May 2021
Do you have any data to work with? Surely your assignment wasn't to just make up data that looks like those curves.
arash Moha
7 May 2021
This picture is just an example of the shape of the output I want.
I have generated one-dimensional data using the following code:
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Only this code may be used to generate the data for this exercise.
Answers (1)
Adam Danz
7 May 2021
I'll point you in the right direction toward the tools you need to complete your assignment.
> First, divide the data into two parts: training data (Train) and test data. Use 30% of the data for the test set and 70% for the training set. Second, the split must be completely random, with no duplicate data. In other words, none of the randomly selected test-set data may appear in the training set, and the test set itself must not contain duplicates. (Use efficient MATLAB functions for this purpose.)
I suggest using cvpartition to split the data into training and test sets. It's not the most straightforward function, but hopefully the documentation page and the examples on it will help you learn it.
Alternatively, you could use randperm to create a vector of indices that can be used to split the data.
> Perform polynomial regression for degrees 1 through 100 and display the results of these 100 experiments in the plot below.
Comments: 22
arash Moha
10 May 2021
I used this code to split the training and test data sets, but when I run the program it shows this error:
Error using cvpartition (line 160)
The number of observations must be a positive integer greater than one.
Error in Code (line 8)
c = cvpartition(size(x,1),'HoldOut',0.3);
What's the problem?
This is my code:
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
c = cvpartition(size(x,1),'HoldOut',0.3);
idx= c.test;
Train= x(~idx,:);
Test= x(idx,:);
Adam Danz
10 May 2021
x is a 1x42 vector, so size(x,1) returns 1. You need either size(x,2) or numel(x).
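A small check on the question's own data makes the difference concrete. The variable names nRows, nCols and nObs are illustrative only, and the corrected cvpartition call is left commented out because it needs the Statistics and Machine Learning Toolbox.

```matlab
% Same data-generation code as in the question
rng(914163506);
temp = 0:.15:2*pi;
x = sin(temp) + .2*randn(size(temp));

nRows = size(x,1);   % number of rows: 1, because x is a row vector
nCols = size(x,2);   % number of columns: one per observation here
nObs  = numel(x);    % total number of elements, independent of orientation

% Corrected call (requires the Statistics and Machine Learning Toolbox):
% c = cvpartition(nObs,'HoldOut',0.3);
```

numel(x) is the safer choice because it gives the same count whether x is stored as a row or a column vector.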
arash Moha
2021년 5월 11일
Edited: arash Moha
11 May 2021
Is this the right way to split the data, given this requirement: "This segmentation should be completely random without duplicate data (no overlap). In other words, none of the randomly selected test-set data should be present in the training set, and the test set should not contain duplicates"?
If the code is incorrect, please send me the corrected code.
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Data=randn(size(x));
[i,j]=size(Data);
idx=randperm(j);
Train=Data(:,1:round(j*0.70));
Test=Data(:,1:round(j*0.70)+1:end);
arash Moha
11 May 2021
Edited: arash Moha
11 May 2021
And what should I do for the part you mentioned?
Using the code above, how do I plot the training and test sets so that the x-axis is the polynomial degree from 1 to 100 and the y-axis is the MSE?
Please help me with the code for this part. Thank you very much.
Adam Danz
11 May 2021
No, these two lines are very wrong:
Train=Data(:,1:round(j*0.70));
Test=Data(:,1:round(j*0.70)+1:end);
Here's the concept you need to understand:
idx = randperm(numel(x));
In the line above, idx contains the values 1 to n, where n is the number of elements in x. These are indices, in random order, with no repeats.
You can use this vector to split the data. For example, the first ~30% is data(idx(1:floor(n*.3))), where n is the number of elements in x, and the remaining ~70% is data(idx(floor(n*.3)+1:n)).
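A minimal sketch of that split on the question's data, taking the test set first as described. The names testIdx, trainIdx, testVals and trainVals are illustrative. Because randperm returns each index exactly once, the two index ranges cannot overlap and neither can contain a repeat.

```matlab
rng(914163506);
temp = 0:.15:2*pi;
x = sin(temp) + .2*randn(size(temp));

n   = numel(x);
idx = randperm(n);            % 1..n in random order, no repeats

nTest    = floor(n*.3);       % about 30% for the test set
testIdx  = idx(1:nTest);
trainIdx = idx(nTest+1:n);    % the remaining ~70%

testVals  = x(testIdx);
trainVals = x(trainIdx);
```

The same index vectors can be applied to any other array with the same number of elements, which keeps paired values together.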
arash Moha
11 May 2021
Edited: arash Moha
11 May 2021
Should I put j instead of n in your example? Is this part of the code correct so far?
clc;
close all;
clear;
workspace;
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Data=randn(size(x));
[i,j]=size(Data);
idx=randperm(numel(x));
Train=Data(idx(1:floor(j*.3)+1:j));
Test=Data(idx(1:floor(j*.3)));
p=polyfit(temp,x,1);
k=p(1)*temp+p(2);
Thanks a lot.
And what should I do for the part you mentioned?
Using the code above, how do I plot the training and test sets so that the x-axis is the polynomial degree from 1 to 100 and the y-axis is the MSE?
Please help me with the code for this part. Thank you very much.
Adam Danz
11 May 2021
If x is a vector, numel(x) is better to use than size(x,2).
I don't see where you're using the Train and Test data. I'm guessing your instructor wants you to fit the training data and then use the coefficient estimates to compute the error on both the training set and the test set. So you'll need to compute the mean squared error of the residuals for each partition of the data. That will give you 2 values for each polynomial degree.
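Written out, assuming temp holds the predictor values and x the responses (the symbols below are introduced here for illustration): let $t_i$ = temp(i), $y_i$ = x(i), $\mathcal{T}$ and $\mathcal{V}$ the training and test index sets, and $p_d$ the degree-$d$ polynomial fitted to the training pairs only. The two values per degree are then

$$
\mathrm{MSE}_{\mathrm{train}}(d) = \frac{1}{|\mathcal{T}|}\sum_{i \in \mathcal{T}} \bigl(y_i - p_d(t_i)\bigr)^2,
\qquad
\mathrm{MSE}_{\mathrm{test}}(d) = \frac{1}{|\mathcal{V}|}\sum_{i \in \mathcal{V}} \bigl(y_i - p_d(t_i)\bigr)^2
$$

Both use the same $p_d$; only the set of points it is evaluated on changes.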
The loop will look like this.
degrees = 1:100;
MSE = nan(numel(degrees),2); % 2 columns: one for training and one for testing set
for i = 1:numel(degrees)
% 1) Fit the data here using polyfit, the polynomial degree
% should change on each iteration.
% 2) Compute the MSE for the test set and training set
MSE(i,1) = ___
MSE(i,2) = ___
end
Now you'll have a 100x2 matrix of MSE. All you have to do is plot each column.
See your textbook or other online resources if you need a reminder of how to compute MSE. You'll use the coefficient estimates (the 1st output of polyfit) and the actual y-values for both data sets.
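A minimal sketch of one way the blanks in the loop could be filled. It assumes, as in the replies, that temp is the predictor and x the response, and it uses a randperm-based 70/30 split. The [p,~,mu] form of polyfit, which centers and scales the x-values before fitting, is a choice made here to keep the high-degree fits better conditioned; it is not part of the assignment.

```matlab
rng(914163506);
temp = 0:.15:2*pi;                        % predictor values
x    = sin(temp) + .2*randn(size(temp));  % noisy responses

% Random 70/30 split with no overlap
n        = numel(x);
idx      = randperm(n);
nTrain   = floor(n*.7);
idxTrain = idx(1:nTrain);
idxTest  = idx(nTrain+1:end);

xTrain = temp(idxTrain);  yTrain = x(idxTrain);
xTest  = temp(idxTest);   yTest  = x(idxTest);

degrees = 1:100;
MSE = nan(numel(degrees),2);   % column 1: training set, column 2: test set
for i = 1:numel(degrees)
    % Fit ONLY the training data; the degree changes on each iteration
    [p,~,mu] = polyfit(xTrain, yTrain, degrees(i));
    % Evaluate the same fit on both sets
    yHatTrain = polyval(p, xTrain, [], mu);
    yHatTest  = polyval(p, xTest,  [], mu);
    % MSE = mean of squared residuals (actual y minus predicted y)
    MSE(i,1) = mean((yTrain - yHatTrain).^2);
    MSE(i,2) = mean((yTest  - yHatTest ).^2);
end
```

With only 29 training points, polyfit will likely print warnings for the higher degrees, where the polynomial has about as many coefficients as there are points or more; those fits are exactly the overfitting the exercise is meant to show.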
arash Moha
11 May 2021
Edited: arash Moha
11 May 2021
I'm sorry, I am new to MATLAB.
Is this code for fitting the data with polyfit and computing the MSE wrong?
clc;
close all;
clear;
workspace;
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Data=randn(numel(x));
[i,j]=size(Data);
idx=randperm(numel(x));
Train=Data(idx(1:floor(j*.3)+1:end));
y_Train=sin(Train + .2*randn(size(Train)));
Test=Data(idx(1:floor(j*.3)));
y_Test=sin(Test + .2*randn(size(Test)));
Degrees = 1:100;
MSE = nan(numel(Degrees),2);
for i=1:numel(Degrees)
p=polyfit(Train,y_Train,1);
pval=polyval(p,Train);
MSE(i,1) = mean((Train - pval).^2);
MSE(i,2) = mean((Train - pval).^2);
end
Adam Danz
11 May 2021
In these 3 lines below, I assume temp is the x-values, x are the y-values, but I don't know what Data is supposed to be.
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
Data=randn(numel(x));
Then, in these 2 lines you're generating even more random data, so I'm really lost.
Train=Data(idx(1:floor(j*.3)+1:end));
y_Train=sin(Train + .2*randn(size(Train)));
You're not understanding the logic behind this. idx should be used to extract data from both the x and y variables so that they remain paired. For example, say your raw data are (x,y) coordinates stored in variables X and Y:
n = numel(X);
idx = randperm(n);
idxTrain = idx(1:floor(n*.7));
xTrain = X(idxTrain);
yTrain = Y(idxTrain);
% Then repeat that process for the test data.
This line needs to use the loop variable to change the polynomial degree; right now you're always using 1! Also, make sure the first two inputs both come from the training data.
p=polyfit(Train,y_Train,1);
arash Moha
11 May 2021
Edited: arash Moha
11 May 2021
Thank you very much for your effort. Finally, since I have already bothered you a lot: how do I plot the test error and training error so they look like the following figure, i.e., with the polynomial degree from 1 to 100 on the x-axis and the MSE on the y-axis?
I think this is the correct code according to your explanation, but what should I put in polyfit instead of 1? I tried i and Degrees, but I get an error.
clc;
close all;
clear;
workspace;
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
idx=randperm(numel(x));
n = numel(x);
idxTrain = idx(1:floor(n*.7));
xTrain = temp(idxTrain);
yTrain = x(idxTrain);
idxTest = idx(1:floor(n*.3));
xTest = temp(idxTest);
yTest = x(idxTest);
Degrees = 1:100;
MSE = nan(numel(Degrees),2);
for i=1:numel(Degrees)
p=polyfit(xTrain,yTrain,1);
pval=polyval(p,xTrain);
MSE(i,1) = mean((xTrain - pval).^2);
MSE(i,2) = mean((xTrain - pval).^2);
hold on
p=polyfit(xTest,yTest,1);
pval=polyval(p,xTest);
MSE(i,1) = mean((xTest - pval).^2);
MSE(i,2) = mean((xTest - pval).^2);
end
Adam Danz
11 May 2021
> how is the plot test error and train error to look like the following figure
With the command plot(x,y), x will be the degrees vector and y will be the vector of MSE values.
arash Moha
11 May 2021
Exactly, this state and shape is impossible.
Is the code above correct?
Thank you so much for your help.
Adam Danz
11 May 2021
Edited: Adam Danz
11 May 2021
You're fitting the polynomial to both the training and the test data sets (see the section of your code below).
for i=1:numel(Degrees)
p=polyfit(xTrain,yTrain,1); % Fit training data
pval=polyval(p,xTrain);
MSE(i,1) = mean((xTrain - pval).^2);
MSE(i,2) = mean((xTrain - pval).^2);
hold on % why is this here?
p=polyfit(xTest,yTest,1); % Fit testing data
pval=polyval(p,xTest);
MSE(i,1) = mean((xTest - pval).^2);
MSE(i,2) = mean((xTest - pval).^2);
end
I believe your assignment is to fit the data with the training set and to compute the MSE on both the training and test sets using the same coefficients returned by the training fit. That's what cross-validation is. If you fit the test set separately, that's not cross-validation.
arash Moha
12 May 2021
Edited: arash Moha
12 May 2021
Exactly, you are right: 30% of the data should be randomly set aside as the test set, and the fit should be done on the remaining 70% (the training set).
This is all the code I wrote, but I don't know exactly where the problem is. I would really appreciate it if you could tell me the correct code and how MSE and polyfit should be calculated here; I'm very confused.
clc;
close all;
clear;
workspace;
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
idx=randperm(numel(x));
n = numel(x);
idxTrain = idx(1:floor(n*.7));
xTrain = temp(idxTrain);
yTrain = x(idxTrain);
idxTest = idx(1:floor(n*.3));
xTest = temp(idxTest);
yTest = x(idxTest);
Degrees = 1:100;
MSE = nan(numel(Degrees),2);
for i=1:numel(Degrees)
p=polyfit(xTrain,yTrain,i);
pval=polyval(p,xTrain);
MSE(i,1) = immse(pval,xTrain);
%MSE(i,2) = mean((idxTest - pval).^2);
end
plot(Degrees,MSE(i,1), 'b.-', 'LineWidth', 3);
hold on
%plot(Degrees,MSE(i,2), 'r.-', 'LineWidth', 3);
%title('x as a function of index', 'FontSize', 18);
xlabel('Regression', 'FontSize', 15);
ylabel('MSE', 'FontSize', 15);
grid on;
arash Moha
12 May 2021
Edited: arash Moha
12 May 2021
Please help me, I don't have much time left to finish the exercise. Thanks a lot.
Adam Danz
13 May 2021
1. Is this the actual data you're supposed to be fitting?
rng(914163506);
temp=0:.15:2*pi;
x = sin(temp)+.2*randn(size(temp));
plot(temp,x, '-o')
idxTrain = idx(1:floor(n*.7));
idxTest = idx(1:floor(n*.3));
3. This is not correct, although it's close. It shows that you don't understand the concept of mean squared error, though: the error is the difference between the estimated y-values and the actual y-values.
MSE(i,1) = immse(pval,xTrain);
4. You also need the mean squared error for the test set, computed with the same polynomial fit you used with the training data (p with the x-test values).
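Points 3 and 4 together, for a single degree, as a sketch. It assumes temp is the predictor and x the response; d = 3 is a hypothetical example degree. The residual compares actual y-values with predicted y-values, and the test residuals reuse the training coefficients p.

```matlab
rng(914163506);
temp = 0:.15:2*pi;
x    = sin(temp) + .2*randn(size(temp));

n        = numel(x);
idx      = randperm(n);
nTrain   = floor(n*.7);
idxTrain = idx(1:nTrain);
idxTest  = idx(nTrain+1:end);
xTrain = temp(idxTrain);  yTrain = x(idxTrain);
xTest  = temp(idxTest);   yTest  = x(idxTest);

d = 3;                                  % hypothetical example degree
p = polyfit(xTrain, yTrain, d);         % coefficients from the training set only

% Error = actual y-values minus predicted y-values
resTrain = yTrain - polyval(p, xTrain);
resTest  = yTest  - polyval(p, xTest);  % same p, evaluated at the test x-values

mseTrain = mean(resTrain.^2);
mseTest  = mean(resTest.^2);
```

Comparing the predictions with xTrain instead of yTrain, as in the quoted line, measures the distance to the inputs rather than the fitting error.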
Lastly, here is what the first 28 polynomial fits look like when I fit the entire data set (not the training/test sets). Note how the fits first get better, then start adapting too much to the data, and at the end become really bad. The text label shows the loop number.
The plot was created using
plot(temp, x, 'bo')
hold on
And then within the loop,
plot(temp, pval); % after fitting *all* of the data
arash Moha
13 May 2021
Edited: arash Moha
13 May 2021
- No, no, I have generated the data in one dimension; you should just plot x without temp. x is my actual data.
- What is the correct expression of this code? You mentioned in a few previous messages that it should be written like this.
- What is the correct expression of this code?
- How do I fit the test set using the same polynomial fit? Do you mean this: pvall = polyval(p,xTest);
Note: The data are generated in one dimension; x is my actual data, without temp.
Note: None of the randomly selected test-set data should be present in the training set, and the test set should not contain duplicates.
arash Moha
13 May 2021
Note: In this exercise, we intend to examine the effect of the polynomial degree on regression and on overfitting the training samples.
arash Moha
13 May 2021
Thank you very much for your time. Could you please send me the changes needed to correct the code I sent? I do not have much time left to hand in the exercise. Please help, I beg you.
Adam Danz
13 May 2021
Everything looks OK, but you're not plotting the results correctly.
plot(Degrees,MSE(i,1), 'b.-', 'LineWidth', 3);
%                ^
This only plots the last row of results. Instead, you want MSE(:,1), and then repeat for the second column.
The results will not look like the example in your image; those lines must come from a different dataset.
arash Moha
13 May 2021
Is this code correct? If there is a problem somewhere in the code, please tell me the correct code. Thank you very much.
clc;
close all;
clear;
workspace;
rng(914163506);
temp=0:.15:2*pi;
Data = sin(temp)+.2*randn(size(temp));
[i,j]=size(Data);
p=0.30;
idx=randperm(j);
Data_Trainset=Data(:,idx(1:round(p*j)));
Data_Testset=Data(:,idx(round(p*j)+1:end));
Data_Trainset_y=randn(size(Data_Trainset));
Data_Testset_y=randn(size(Data_Testset));
l1=[];
l2=[];
for i1=1:100
p=polyfit(Data_Trainset,Data_Trainset_y,i1);
pval= polyval(p,Data_Trainset_y);
pvall=polyval(p,Data_Testset_y);
MSE_Trainset=immse(pval,Data_Trainset);
MSE_Testset=immse(pvall,Data_Testset);
g=inv(MSE_Trainset);
g1=inv(MSE_Testset);
l1=[l1,g];
l2=[l2,g1];
plot(i1,g,'b.-', 'LineWidth', 3);
legend('Train error','Test error');
hold on
plot(i1,g1,'r.-', 'LineWidth', 3);
legend('Train error','Test error');
hold on
end
xlabel('Regression', 'FontSize', 15);
ylabel('MSE', 'FontSize', 15);
grid on;
Adam Danz
13 May 2021
No, the plotting should stay outside the loop.
To plot column z of matrix m, use plot(m(:,z)).
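Putting the replies together, a sketch of a complete script in which the loop only fills MSE and all plotting happens once, afterwards, using the whole columns MSE(:,1) and MSE(:,2). It assumes temp is the predictor and x the response and uses a randperm split. semilogy is a choice made here, not something from the thread: the test error can grow by many orders of magnitude at high degrees, which would flatten a linear-axis plot. As noted above, the curves will probably not match the example image.

```matlab
rng(914163506);
temp = 0:.15:2*pi;
x    = sin(temp) + .2*randn(size(temp));

n        = numel(x);
idx      = randperm(n);
nTrain   = floor(n*.7);
idxTrain = idx(1:nTrain);
idxTest  = idx(nTrain+1:end);
xTrain = temp(idxTrain);  yTrain = x(idxTrain);
xTest  = temp(idxTest);   yTest  = x(idxTest);

degrees = 1:100;
MSE = nan(numel(degrees),2);          % [train, test]
for i = 1:numel(degrees)
    % Fit on the training set only, then score both sets with the same fit
    [p,~,mu] = polyfit(xTrain, yTrain, degrees(i));
    MSE(i,1) = mean((yTrain - polyval(p, xTrain, [], mu)).^2);
    MSE(i,2) = mean((yTest  - polyval(p, xTest,  [], mu)).^2);
end

% Plot once, after the loop, using whole columns of MSE
figure
semilogy(degrees, MSE(:,1), 'b.-', degrees, MSE(:,2), 'r.-', 'LineWidth', 1.5)
legend('Train error', 'Test error')
xlabel('Polynomial degree', 'FontSize', 15)
ylabel('MSE', 'FontSize', 15)
grid on
```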