Table with a few identical column names

Question

tpolgar 2017년 5월 12일

1
링크

이 질문에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/339955-table-with-a-few-identical-column-names

댓글: tpolgar 2017년 5월 15일

Hello Matlab Community,

I got a not-so-regular data table (I converted it to .csv) containing identically named columns, and I have to calculate mean values both horizontally and vertically.

The schame is the following: The first cells of each row is a string, the others are doubles.

a b b b c c [...]
x  
x  
y  
z  
z  
z  
[...]

The output table should be:

a b c [...]  
x   
y  
z  
[...]

Where the values are the mean values, like the x row is the means of

-all the x rows (like the 'grup by' command in sql) and

-all the b, c, ... values

I know that in standard database management I would have to re-structure and normalize the table, but is there any solution in Matlab to solve this?

All I could try is to group the rows:

function [ C ] = CsvAvg( in_csv )
T = readtable(in_csv);
C = table;
C(1,1)=T(1,1);
for i=2:height(T)-1 
           if strcmp(strjoin(table2cell(T(i,1))),strjoin(table2cell(T(i-1,1))))==0
            C(i,1)=T(i,1);
            for j=2:width(T)
               % 
            end
           else C(i,1)=cell2table(cellstr(' '));
           end
end
end

I noticed that Matlab re-name the identical column names (which is correct in most cases).

Thank you in advance and Im sorry for my lack of English language skills. Im a beginner in Matlab programming too.

댓글 수: 2
없음 표시없음 숨기기

Andrei Bobrov 2017년 5월 12일

Please attach your csv-file (or small example of your csv-file).

tpolgar 2017년 5월 12일

test.csv

The "table" is huge, i cut a small portion of it. I had to change the row and column names, sorry!

댓글을 달려면 로그인하십시오.

이 질문에 답변하려면 로그인하십시오.

Answer 1

Guillaume 2017년 5월 12일

2
링크

이 답변에 대한 바로 가기 링크

https://kr.mathworks.com/matlabcentral/answers/339955-table-with-a-few-identical-column-names#answer_266741

편집: Guillaume 2017년 5월 12일

MATLAB Online에서 열기

Grouping the rows is trivially achieved with varfun in just one line:

newT = varfun(@mean, T, 'GroupingVariables', 1)

The tricky bit is indeed the grouping of the columns as it's not something that's really supported by tables. Probably the easiest way is to use a custom function with rowfun. This can do the row grouping at the same time. Something like:

function varargout = groupingfunction(columngroups, columns)
  %this function to be used with rowfun with the option 'SeparateInputs', false
  %columngroup: row vector of integers from 1:n indicating how to group the column together (all 1 columns together, all 2 columns together, etc.)
  %columns contains the content of the columns of the table
  columngroups = repmat(columngroups, size(columns, 1), 1);  %replicate grouping for each row
  varargout = num2cell(accumarray(columngroups(:), columns(:), [], @mean));
end

You can then use that with rowfun:

columgroups = [1 1 1 2 2 3 3 3 4 4 4 4 5 6 6 6];  %generate that however you want. Indicates how to group the columns together, ignoring the first column
columnames = {'a', 'b', 'c', 'd', 'e', 'f'};  %as many as there are unique values in columngroups
newT = rowfun(@(cols) groupingfunction(columngroups, cols), T, 'GroupingVariables', 1,'SeparateInputs', false, 'NumOutputs', numel(unique(columngroups)), 'OutputVariableNames', columnnames);

A possible way of generating columngroups and columnnames, assuming they're all named prefix_number

prefixes = regexp(T.Properties.VariableNames(2:end), '.*?(?=(_\d+)?$)', 'match', 'once');
[columnnames, ~, columngroups] = unique(prefixes);

edit, now that you've posted some demo data: The above works without issue on your demo data, as long as you transpose the columngroups vector returned by unique:

T = readtable('test.csv');
prefixes = regexp(T.Properties.VariableNames(2:end), '.*?(?=(_\d+)?$)', 'match', 'once');
[columnnames, ~, columngroups] = unique(prefixes);
newT = rowfun(@(cols) groupingfunction(columngroups.', cols), T, 'GroupingVariables', 1,'SeparateInputs', false, 'NumOutputs', numel(unique(columngroups)), 'OutputVariableNames', columnnames);

댓글 수: 2
없음 표시없음 숨기기

Peter Perkins 2017년 5월 12일

Nicely done.

tpolgar 2017년 5월 15일

Awesome, thank you very much!

댓글을 달려면 로그인하십시오.

Table with a few identical column names

댓글 수: 2
없음 표시없음 숨기기

채택된 답변

댓글 수: 2
없음 표시없음 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

Community Treasure Hunt

Table with a few identical column names

댓글 수: 2 없음 표시없음 숨기기

채택된 답변

댓글 수: 2 없음 표시없음 숨기기

추가 답변 (0개)

참고 항목

카테고리

태그

Community Treasure Hunt

댓글 수: 2
없음 표시없음 숨기기

댓글 수: 2
없음 표시없음 숨기기