Duplicate x,y in table; get min/max of variables to new table

조회 수: 1 (최근 30일)
newbie9
newbie9 2019년 11월 2일
댓글: newbie9 2019년 11월 2일
I have a large dataset, small subset attached. There are duplicate x,y values. I need a cleaned table with "var1_max, var1_min, var2_max, var2_min" etc. for every duplicate "x,y"
I'm just not sure where to start setting up the problem. Thanks for any pointers to get going. Using Matlab R2018a, no fancy add-ons/packages
file_in = 'example.csv';
data_in = readtable(file_in);
%%table_out.Properties.VariableNames = {'id', 'x', 'y', 'var1_max', 'var1_min', ...
%% 'var2_max', 'var2_min', 'var3_max', 'var3_min', 'var4_max', 'var4_min', ...
%% 'var5_max', 'var5_min', 'var6_max', 'var6_min', 'var7_max', 'var7_min'};
  댓글 수: 3
Cris LaPierre
Cris LaPierre 2019년 11월 2일
Not sure I understand completely. I have the same questions a dpb. It looks like you've highlighted groups, but rows 12-14 only have duplicate x values, not x and y.
For the min/max values you want, do you want the overall min/max or just the min/max within the group (those that have the same x,y values)?
For example, rows 2 and 3 above do not have values for var1-3. What values should they have?
Rows 6-10 do have values there. What should the min/max be?
And finally, 12-14 have same x but different y. How should this be handled?
newbie9
newbie9 2019년 11월 2일
편집: newbie9 2019년 11월 2일
Please disregard highlight on rows 12-14, it is in error.
I think I should add "var1_mid" ... "var7_mid" to handle cases where x,y are unique. Example in row 1:
  • x = 31.74053 | y = 27.18408 | var1_max = NaN | var1_min = NaN | var1_mid = 90 | var2_max = NaN | var2_min = NaN | var2_mid = 54 | ...etc.
Answers to other questions below:
For the min/max values you want, do you want the overall min/max or just the min/max within the group (those that have the same x,y values)?
  • just the min/max within the group (those that have the same x,y values)
For example, rows 2 and 3 above do not have values for var1-3. What values should they have?
  • NaN
Rows 6-10 do have values there. What should the min/max be?
  • x = 31.79403 | y = 27.26837 | var1_max = 79 | var1_min = 68 | var1_mid = NaN | var2_max = 251 | var2_min = 193 | var2_mid = NaN | ...etc.

댓글을 달려면 로그인하십시오.

채택된 답변

Cris LaPierre
Cris LaPierre 2019년 11월 2일
One potential solution that is simple to implement is to use groupsummary. You can have your data grouped by x and y, and have it return the within-group min, mean, and max for table variables you specify.
However, you prescribe wanting NaN for mid-value if there are min/max values. This won't do that .It will provide values for all 3 statistics. Also note that group summary will only return one row for each group. You can use the join function if you want to merge the two together.
newData = groupsummary(data_in,{'x','y'},{'min','mean','max'},["var1","var2","var3","var4","var5","var6","var7"])
  댓글 수: 1
newbie9
newbie9 2019년 11월 2일
This is perfect, thank you. The groupsummary returns a GroupCount, so I can just use that such that if > 1 then mean = NaN

댓글을 달려면 로그인하십시오.

추가 답변 (0개)

카테고리

Help CenterFile Exchange에서 Tables에 대해 자세히 알아보기

태그

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by