Help reorganizing data in a table

조회 수: 5 (최근 30일)
ANDREW Feenan
ANDREW Feenan 2021년 8월 5일
편집: Konrad 2021년 8월 6일
Hi,
I have an excel sheet with lots of data based on recordings and measurements. These recordings and measurements are displayed in the excel sheet in a certain way. I need to reorganise this data and display it in another way. I have attached an example excel sheet with two days worth of data. The attached excel sheet also shows how I need the data to be displayed.
I have been trying to use MATLAB for nearly a week to try to get this working but I don't think I am any closer to figuring it out. I am new to MATLAB and don't have much experience using it for something like this.
I would really appreciate it if someone could explain to me in simple terms how to do this. I have lots of data to work with so doing this manually is not an option.
Andrew
  댓글 수: 2
the cyclist
the cyclist 2021년 8월 5일
@Konrad beat me to posting the elegant unstack solution. I'll just comment here that in general these types of operations are known as "pivoting" the table. You'll see various terminology out there (when considering other languages, such as R and python) such as making the table "long" or "wide", etc.
One other comment. Are you really sure that you want the new format? The reason I ask is that the original seems to me to be closer to the "tidy" format (see, e.g. this article) that is typically best for analysis. But I have to admit I did not think deeply about this yet. Maybe the new format is actually tidier.
Regardless, you may find the article informative. I like to evangelize about tidy data, as I find it to be a powerful concept.
Peter Perkins
Peter Perkins 2021년 8월 5일
Andrew, this is essentially the same question you asked earlier: unstacking multiple variables. I recommend you look at the answer there.

댓글을 달려면 로그인하십시오.

채택된 답변

Cris LaPierre
Cris LaPierre 2021년 8월 5일
I like Konrad's approach. You can easily reorder the results to be what you want, and it's easy to understand. I went down a rabbit hole trying to get the table to be closer to what you showed. I share it only because I already have the working code. It splits the data into separate tables, then joins them together.
% Load the data
opts = detectImportOptions("example_sheet.xlsx");
opts = setvartype(opts,["Name","Day","Category"],"categorical");
opts = setvaropts(opts,"Name","Ordinal",true);
data = readtable('example_sheet.xlsx',opts);
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
data = rmmissing(data,'DataVariables',"Name");
data.Name = reordercats(data.Name,"Person "+(1:14));
% Split data into subtables by activity/measurement
a1M1 = data(ismember(data.Category,"Activity 1") & ~isnan(data.Measurement1),[1:3 5:7]);
a1M1.Properties.VariableNames(4:6) = "Activity 1: "+a1M1.Properties.VariableNames(4:6);
a1M2 = data(ismember(data.Category,"Activity 1") & ~isnan(data.Measurement2),[1:3 5:6 8]);
a1M2.Properties.VariableNames(4:6) = "Activity 1: "+a1M2.Properties.VariableNames(4:6);
a2 = data(ismember(data.Category,"Activity 2"),[1:3 5:6]);
a2.Properties.VariableNames(4:5) = "Activity 2: "+a2.Properties.VariableNames(4:5);
% Join tables back into a single table
joinAct1 = outerjoin(a1M1,a1M2,"Keys",["Name","Day","Date"],...
"MergeKeys",true);
joinAct2 = outerjoin(joinAct1,a2,"Keys",["Name","Day","Date"],...
"MergeKeys",true);
joinedData = sortrows(joinAct2,["Date","Name"])
joinedData = 28×11 table
Name Day Date Activity 1: Duration_a1M1 Activity 1: Rating_a1M1 Activity 1: Measurement1 Activity 1: Duration_a1M2 Activity 1: Rating_a1M2 Activity 1: Measurement2 Activity 2: Duration Activity 2: Rating _________ _______ ___________ _________________________ _______________________ ________________________ _________________________ _______________________ ________________________ ____________________ __________________ Person 1 Monday 01-Mar-2021 NaN NaN NaN 100 4 5000 45 6 Person 2 Monday 01-Mar-2021 NaN NaN NaN 85 4 5000 60 7 Person 3 Monday 01-Mar-2021 90 3 5000 90 3 5000 50 6 Person 4 Monday 01-Mar-2021 100 4 5000 105 3 5000 60 5 Person 5 Monday 01-Mar-2021 90 4 5000 110 4 5000 60 8 Person 6 Monday 01-Mar-2021 NaN NaN NaN NaN NaN NaN NaN NaN Person 7 Monday 01-Mar-2021 NaN NaN NaN 100 5 5000 60 6 Person 8 Monday 01-Mar-2021 85 5 5000 95 7 5000 15 2 Person 9 Monday 01-Mar-2021 110 4 5000 110 5 5000 75 7 Person 10 Monday 01-Mar-2021 90 3 5000 110 4 5000 NaN NaN Person 11 Monday 01-Mar-2021 NaN NaN NaN NaN NaN NaN NaN NaN Person 12 Monday 01-Mar-2021 NaN NaN NaN 105 3 5000 NaN NaN Person 13 Monday 01-Mar-2021 95 5 5000 NaN NaN NaN NaN NaN Person 14 Monday 01-Mar-2021 95 4 5000 80 7 4000 60 7 Person 1 Tuesday 02-Mar-2021 90 3 5000 95 4 5000 55 5 Person 2 Tuesday 02-Mar-2021 95 3 5000 95 5 5000 NaN NaN
  댓글 수: 1
ANDREW Feenan
ANDREW Feenan 2021년 8월 5일
Hi Cris,
Thank you for your help.
Andrew

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

Konrad
Konrad 2021년 8월 5일
편집: Konrad 2021년 8월 6일
Hi Andrew,
I think unstack() is what you're looking for:
T=readtable('example_sheet.xlsx');
uT = unstack(T,{'Duration' 'Rating' 'Measurement1' 'Measurement2'},'Category');
Best, Konrad
  댓글 수: 5
Cris LaPierre
Cris LaPierre 2021년 8월 5일
I think it's the same approach, but you could just use indexing.
T=readtable('example_sheet.xlsx');
Warning: Column headers from the file were modified to make them valid MATLAB identifiers before creating variable names for the table. The original column headers are saved in the VariableDescriptions property.
Set 'VariableNamingRule' to 'preserve' to use the original column headers as table variable names.
newT = unstack(T,{'Duration' 'Rating' 'Measurement1' 'Measurement2'},'Category');
Warning: Table variable names that were not valid MATLAB identifiers have been modified. Since table variable names must be unique, any table variable names that happened to match the new identifiers also have been modified.
To use the original INDVAR values as table variable names, set 'VariableNamingRule' to 'preserve'.
newT = newT(:,[1:4 6 8 4 6 10 5 7]);
% name variables
newT.Properties.VariableNames(4:6) = "Activity 1: "+T.Properties.VariableNames(5:7);
newT.Properties.VariableNames(7:9) = "Activity 1b: "+T.Properties.VariableNames(5:7);
newT.Properties.VariableNames(10:11) = "Activity 2: "+T.Properties.VariableNames(5:6)
newT = 28×11 table
Name Day Date Activity 1: Duration Activity 1: Rating Activity 1: Measurement1 Activity 1b: Duration Activity 1b: Rating Activity 1b: Measurement1 Activity 2: Duration Activity 2: Rating _____________ ___________ ___________ ____________________ __________________ ________________________ _____________________ ___________________ _________________________ ____________________ __________________ {'Person 1' } {'Monday' } 01-Mar-2021 NaN NaN NaN NaN NaN NaN 45 6 {'Person 2' } {'Monday' } 01-Mar-2021 NaN NaN NaN NaN NaN NaN 60 7 {'Person 3' } {'Monday' } 01-Mar-2021 180 6 NaN 180 6 NaN 50 6 {'Person 4' } {'Monday' } 01-Mar-2021 205 7 NaN 205 7 NaN 60 5 {'Person 5' } {'Monday' } 01-Mar-2021 200 8 NaN 200 8 NaN 60 8 {'Person 6' } {'Monday' } 01-Mar-2021 NaN NaN NaN NaN NaN NaN NaN NaN {'Person 7' } {'Monday' } 01-Mar-2021 NaN NaN NaN NaN NaN NaN 60 6 {'Person 8' } {'Monday' } 01-Mar-2021 180 12 NaN 180 12 NaN 15 2 {'Person 9' } {'Monday' } 01-Mar-2021 220 9 NaN 220 9 NaN 75 7 {'Person 10'} {'Monday' } 01-Mar-2021 200 7 NaN 200 7 NaN NaN NaN {'Person 11'} {'Monday' } 01-Mar-2021 NaN NaN NaN NaN NaN NaN NaN NaN {'Person 12'} {'Monday' } 01-Mar-2021 NaN NaN NaN NaN NaN NaN NaN NaN {'Person 13'} {'Monday' } 01-Mar-2021 NaN NaN NaN NaN NaN NaN NaN NaN {'Person 14'} {'Monday' } 01-Mar-2021 175 11 NaN 175 11 NaN 60 7 {'Person 1' } {'Tuesday'} 02-Mar-2021 185 7 NaN 185 7 NaN 55 5 {'Person 2' } {'Tuesday'} 02-Mar-2021 190 8 NaN 190 8 NaN NaN NaN
Konrad
Konrad 2021년 8월 6일
Just to promote regular expressions (and because I love regexp, I admit):
The desired column names can be achieved in one line:
uT.Properties.VariableNames = regexprep(uT.Properties.VariableNames, '(^[^_]+)_([^_]+)$', '$2: $1')

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Data Import from MATLAB에 대해 자세히 알아보기

제품


릴리스

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by