Table transform - rows to variable number of columns
조회 수: 2 (최근 30일)
이전 댓글 표시
Trying to transform a table that is one ID to variable number of rows, to one that is one ID to variable number columns:
Original Table: Y x 3 table
ID State Item
A1 MD Pencil
A1 NJ Pen
B2 MD Pen
...
Z26 MD Marker
Z26 NJ Crayon
…
Z26 DE Pencil
Desired Table: length(unique(ID) x (1+2*N) table
ID State1 Item1 State2 Item2 … StateN ItemN
A1 MD Pencil NJ Pen
B2 MD Pen
...
Z26 MD Marker NJ Crayon … DE Pencil
without a painful for loop... Thank you for suggestions!
댓글 수: 2
the cyclist
2021년 6월 15일
I don't know if there is a slick way to do this without loops, but I am really curious why you think this is a good idea. Your data are in what is known as the tidy format, and this is typically ideal for later analytics.
Do you mind explaining what your next step with the reformatted table is? Maybe there is a way to handle it with the data in their current layout.
답변 (1개)
Pramil
2024년 5월 24일
편집: Pramil
2024년 5월 24일
Hi Lauren,
To achieve the said transformation without using the “for” loop, you can do the following in MATLAB R2021b:
data = {'A1', 'MD', 'Pencil'; 'A1', 'NJ', 'Pen'; 'B2', 'MD', 'Pen'; 'Z26', 'MD', 'Marker'; 'Z26', 'NJ', 'Crayon'; 'Z26', 'DE', 'Pencil'};
originalTable = cell2table(data, 'VariableNames', {'ID', 'State', 'Item'});
- Let us assume the data is stored in “OriginalTable” variable. Count the occurrences of each “ID” using “groupcounts” method and find the count of “ID” having maximum occurrence using “max” function.
% Count the occurrences of each ID
idCounts = groupcounts(originalTable.ID);
% Find the maximum count to determine the number of columns needed
maxItems = max(idCounts);
The value “maxItems” dictates the number of “state” and “item” columns needed in the final table.
- Group the original table by “ID” and apply a custom function “groupFun” to each group. This function will organize the grouped data into a format that aligns with the desired table structure. Use “splitapply” for this step, passing “maxItems” as an additional argument to “groupFun”.
% Group by ID and apply the function
groupedData = splitapply(@(varargin) groupFun(varargin, maxItems), originalTable, findgroups(originalTable.ID));
- Define the custom function “groupFun”.
function out = groupFun(varargin)
maxLength = 2*varargin{2}+1;
currLength = 2*length(varargin{1}{1})+1;
% Preallocate cell array for the maximum number of items
out = cell(1, maxLength);
out(1) = varargin{1}{1}(1);
out(2:2:currLength) = varargin{1}{2}';
out(3:2:currLength) = varargin{1}{3}';
end
- After applying the custom function to each group, the “groupedData” needs to be structured into a table. Define the variable names for this table, considering the “ID” and the dynamic number of “State” and “Item” columns based on “maxItems”.
% Convert grouped data into a table
varNames = cell(1,2*maxItems+1);
varNames(1) = {'ID'};
varNames(2:2:2*maxItems+1) = cellstr(strcat('State', string(1:maxItems)));
varNames(3:2:2*maxItems+1) = cellstr(strcat('Item', string(1:maxItems)));
desiredTable = cell2table(groupedData, 'VariableNames', varNames)
You can check out the functions “groupcounts”, “findgroups” and “splitapply” through the following links to know more about their usage:
Hope it helps.
댓글 수: 0
참고 항목
카테고리
Help Center 및 File Exchange에서 Data Preprocessing에 대해 자세히 알아보기
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!