How to split a sequence based on values from one variable

Matteo Soldini, 3 May 2020
Commented: Ameer Hamza, 4 May 2020
Good evening,
I can't figure out how to solve the following problem.
Assuming that I have a dataset as in the picture, I would like to divide it into many smaller datasets using the variable "State" and keeping the sequence. Actually the real dataset has more than 200000 observations so I can't know when the variable State changes from NORMAL to RECOVERY and vice versa, but I would like to split the dataset into many mini sequences where each one has the same State variable for all the observations.
Then, I would need to divide the variables into a Predictors set (variables Sensor 1, Sensor 2, Sensor 3) and a Response set (variable State).
If we take, as an example, the image, at the end of the problem I would like to have for the Predictors a cell array of size Nx1 (N equal to the number of mini sequences) with the first cell of size 3x2 (the three features and the first two observations), the second cell of size 3x2, the third cell of size 3x1 and so on. Correspondingly, for the Response I would like to have an Nx1 cell array where the first cell is of dimension 1x2, the second is 1x2, the third is 1x1 and so on.
The problem is that with a dataset of 200000 observations I don't know what kind of loop to use and how to use it.
Thank you!

Accepted Answer

Ameer Hamza, 3 May 2020
See the following example.
First create an example table
data = {1, 2, 3, 'norm'; 2, 3, 4, 'norm';
2, 3, 1, 'rec' ; 4, 4, 2, 'rec';
1, 2, 3, 'norm'; 2, 3, 4, 'rec';
2, 3, 1, 'rec' ; 4, 4, 2, 'rec'};
t = cell2table(data, 'VariableNames', ...
{'sen1', 'sen2', 'sen3', 'state'}); % an example table
Result
t =
  8×4 table
    sen1    sen2    sen3     state
    ____    ____    ____    ________
    1.00    2.00    3.00    {'norm'}
    2.00    3.00    4.00    {'norm'}
    2.00    3.00    1.00    {'rec' }
    4.00    4.00    2.00    {'rec' }
    1.00    2.00    3.00    {'norm'}
    2.00    3.00    4.00    {'rec' }
    2.00    3.00    1.00    {'rec' }
    4.00    4.00    2.00    {'rec' }
Then run the following code to split the data
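idx = findgroups(t.state); % numeric label for each distinct state
n = height(t); % number of observations
% Rows where a new run starts, closed by an edge one past the last row
% so that a run starting on the last row still gets its own bin
partition_idx = [1; find(diff(idx)~=0)+1; n+1];
% Map every row number to the index of the run it falls in
partition_idx = discretize(1:n, partition_idx);
sensor_val = splitapply(@(x) {x}, table2cell(t(:,1:3)), partition_idx.');
state_val = splitapply(@(x) {x}, table2cell(t(:,4)), partition_idx.');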
sensor_val and state_val are cell arrays containing the required values, with one cell per run of consecutive rows that share the same state.
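The question asks for numeric cells with the features in rows (3x2, 3x2, 3x1 and so on), while the cells produced above hold one observation per row. A minimal sketch of one way to get the requested orientation, rebuilding the same example table so it runs on its own: the names run_id, predictors and responses are made up here, and converting state to categorical is an assumption of this sketch rather than part of the answer. It uses no explicit loop, so it should carry over to the full 200000-row dataset.

```matlab
% Same example table as in the answer
data = {1, 2, 3, 'norm'; 2, 3, 4, 'norm';
        2, 3, 1, 'rec' ; 4, 4, 2, 'rec';
        1, 2, 3, 'norm'; 2, 3, 4, 'rec';
        2, 3, 1, 'rec' ; 4, 4, 2, 'rec'};
t = cell2table(data, 'VariableNames', {'sen1', 'sen2', 'sen3', 'state'});

% Assumption: treat the state labels as categorical so rows can be compared
state = categorical(t.state);

% A new run starts on row 1 and wherever the state differs from the row above;
% the cumulative sum of those start flags numbers the runs 1, 2, 3, ...
run_id = cumsum([true; state(2:end) ~= state(1:end-1)]);

% One cell per run: predictors are 3-by-k numeric (features in rows),
% responses are 1-by-k categorical, where k is the length of the run
predictors = splitapply(@(x) {x.'}, t{:, 1:3}, run_id);
responses  = splitapply(@(s) {s.'}, state, run_id);
```

For this table, predictors and responses are both 4x1 cell arrays, and the third cell of predictors is the 3x1 column [1; 2; 3] from the single 'norm' row in the middle.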

Release: R2019b
