Extracting testing and training data from a single dataset

Rahul Gulia on 28 Oct 2022
Answered: Rahul Gulia on 31 Oct 2022
I have a dataset of size 14400 x 14, where the first 2 columns represent a user's x- and y-position, each ranging from 1 to 121.
Example:
first_col  second_col  . . .
1          1
1          2
1          3
(and so on, up to 121)
2          1
2          2
(and so on, up to 121)
3          (and so on, up to 121)
.          .
(and so on, up to 121 in both columns)
I want to separate out the testing data based on the user location: the rows where first_col is in 1:30 and second_col is in 1:30.
I am using a for loop, but it takes a long time.
I would really appreciate any kind of suggestions on this issue.
Thank You
Comments: 2
Rahul Gulia on 28 Oct 2022
Edited: Rahul Gulia on 28 Oct 2022
I also want to be able to separate the dataset into training and testing parts, and later combine both parts into one dataset again for further use.
I guess we can use the index values for this one.
Khushboo on 31 Oct 2022
Hi Rahul,
I am sorry, I did not fully understand what you want your test data to look like. Could you kindly elaborate using an example? From what I understand, slicing should work for your use case.


Accepted Answer

Rahul Gulia on 31 Oct 2022
I was able to solve this issue. It comes down to a simple example: join 2 matrices and order the rows of the result by the values in their 1st column.
Example code:
**************************************************************
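xx = [1 7 8; 4 9 10; 5 11 12];
yy = [2 13 14; 3 15 16; 6 17 18];
zz = [xx; yy]    % stack the two matrices; column 1 holds the keys 1..6
ww = [];
n = size(zz, 1); % number of rows (length() would return the largest dimension)
for pp = 1:n                        % key to look for
    for qq = 1:n                    % scan every row for that key
        if pp == zz(qq,1)
            ww = [ww; zz(qq,:)];    % append the matching row
        end
    end
end
ww    % rows of zz ordered by their first-column value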
*****************************************************************
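The same ordering can also be obtained without the nested loops. This is only a minimal sketch, assuming the goal is simply to order the stacked rows by their first-column value; it uses MATLAB's built-in sortrows on the matrices from the example:

```matlab
xx = [1 7 8; 4 9 10; 5 11 12];
yy = [2 13 14; 3 15 16; 6 17 18];
zz = [xx; yy];            % stack the two matrices

% Sort all rows in ascending order of the value in column 1
ww = sortrows(zz, 1);
```

Unlike the loop, which looks for the keys 1 to N one by one, sortrows does not require the first-column values to be consecutive integers.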

More Answers (2)

Rajeev on 31 Oct 2022
Hi Rahul,
Logical Indexing can be used to extract the required data from the array.
Assuming that the name of the matrix is "location", to extract only the user locations ranging from 1 to 30, one can proceed in the following way:
% logical indexing: mark the rows whose coordinate in each column is in range
first_col_index = location(:,1) <= 30;
second_col_index = location(:,2) <= 30;
% the logical & (and) operation marks the rows where both coordinates are less than or equal to 30
location_index = first_col_index & second_col_index;
% assuming each row of "location" is one observation, the logical index array selects the required rows
location_new = location(location_index,:);
Here is the documentation for logical indexing: Matrix Indexing in MATLAB - MATLAB & Simulink (mathworks.com)
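To check the idea on data with a known answer, here is a small sketch on a synthetic grid. The 121-by-121 grid and the names test_mask and test_rows are assumptions made for illustration; the real dataset has 14 columns and 14400 rows.

```matlab
% Synthetic stand-in: every (x, y) pair on a 121-by-121 grid,
% ordered as 1 1, 1 2, 1 3, ... like the example in the question
[x, y] = meshgrid(1:121);
location = [x(:), y(:)];

% true for rows whose x and y are both at most 30
test_mask = location(:,1) <= 30 & location(:,2) <= 30;
test_rows = location(test_mask, :);

num_test = size(test_rows, 1);   % 30 * 30 = 900 rows on this grid
```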

Rahul Gulia on 31 Oct 2022
I figured out a way to create the training and testing data based on the location of the users. Here is how I did it.
My DatasetTmp_14 looks like this. (Note: the first column contains the index terms of each row)
1 0 0.5 40.36 43.05 0 1 60 0 54.5 0.5 1 15 5 2301
2 0 1 40.02 42.74 0 1 60 0 54 1 1 15 5 2336
3 0 1.5 39.69 42.43 0 1 60 0 53.5 1.5 1 15 5 2311
4 0 2 39.37 42.13 0 1 60 0 53 2 1 15 5 2327
5 0 2.5 39.05 41.83 0 1 60 0 52.5 2.5 1 15 5 2318
DatasetTmp_14 size = 13310x15.
Now,
*****************************************************
idx1 = (1:size(DatasetTmp_13,1))';       % original row numbers
DatasetTmp_14 = [idx1 DatasetTmp_13];    % prepend them as column 1
quadrant_data_test = [];
quadrant_data_train = [];
for pp = 1:size(DatasetTmp_14,1) % Takes too long to execute
    % test region: x (column 2) <= 30 and y (column 3) <= 27.5
    if (DatasetTmp_14(pp,2)<=30 && DatasetTmp_14(pp,3)<=27.5)
        tmp1 = DatasetTmp_14(pp,1:15);
        quadrant_data_test = [quadrant_data_test; tmp1];
    else
        quadrant_data_train = [quadrant_data_train; DatasetTmp_14(pp,1:15)];
    end
end
*****************************************************
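The loop above grows both matrices by one row per iteration, which is a common reason for this kind of code being slow. A logical mask can do the same split in a single step. The sketch below is only an illustration: the 4-row matrix is made up to stand in for DatasetTmp_13, and is_test is a name introduced here.

```matlab
% Made-up stand-in for DatasetTmp_13 (x in column 1, y in column 2)
DatasetTmp_13 = [ 0  0.5 10;
                 35  1.0 20;
                 10 27.5 30;
                 20 40.0 40];

% Prepend the original row number so the split can be undone later
idx1 = (1:size(DatasetTmp_13, 1))';
DatasetTmp_14 = [idx1 DatasetTmp_13];

% true for rows inside the test region (same thresholds as the loop)
is_test = DatasetTmp_14(:,2) <= 30 & DatasetTmp_14(:,3) <= 27.5;

quadrant_data_test  = DatasetTmp_14(is_test, :);    % rows inside the region
quadrant_data_train = DatasetTmp_14(~is_test, :);   % all remaining rows
```

Because the mask and its complement together cover every row exactly once, each row ends up in exactly one of the two sets.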
Now I would like to combine the two datasets again based on their index values, which I tried as shown below. This is where I am stuck right now: the new matrix is not created in the proper sequence. Kindly let me know if you have any suggestions on my code.
*****************************************************
test_heatmap_data_tmp = [quadrant_data_test; quadrant_data_train];
recreated_dataset = [];
for pp = 1:length(test_heatmap_data_tmp)
    for qq = 1:length(test_heatmap_data_tmp)
        if (pp == test_heatmap_data_tmp(qq,1))
            tmp = test_heatmap_data_tmp(pp,:);
            recreated_dataset = [recreated_dataset; tmp];
        end
    end
end
*****************************************************
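A probable cause of the wrong sequence is that the inner statement copies row pp, although the matching index was found in row qq; the loop in the accepted answer copies row qq instead. Apart from that, since column 1 stores the original row number, every row can be written straight back to its original position. The sketch below assumes the index values are exactly 1 to N, each occurring once, and uses small made-up matrices; combined is a name introduced here.

```matlab
% Made-up stand-ins; column 1 is the original row number
quadrant_data_test  = [1  0  0.5; 3 10 27.5];
quadrant_data_train = [2 35  1.0; 4 20 40.0];

combined = [quadrant_data_test; quadrant_data_train];

% Place every row at the position named by its index column
recreated_dataset = zeros(size(combined));
recreated_dataset(combined(:,1), :) = combined;
```

If the index values are not guaranteed to be consecutive, sortrows(combined, 1) restores the order by index without relying on that assumption.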
For reference, this is what the recreated and original images should look like.
