Find duplicate entries and sum up their associated values then put everything back in a table

조회 수: 19 (최근 30일)
Hello everyone,
I have an excel sheet with two columns. The first column is a list of names, and the second column is the score of each person. Some names have multiple score and I want to find duplicate entries and sum up their associated values then put everything back in a table. Basically clean up the duplicate but want the sum of the scores. The excel file is attached as well
I would really appreciate if you have a script or a function can achieve the results.
Here where I’m at
Data = readtable("Data.xlsx")
unique(Data.Names)
[C,ia,idx] = unique(Data(:,1),'stable');
val = accumarray(idx,Data(:,2),[],@sum);
mat = [C val]

채택된 답변

dpb
dpb 2021년 7월 8일
tData = readtable("Data.xlsx");
tData.Properties.VariableNames={'Name','Score'}; % set a known set of variable names
tSum=rowfun(@sum,tData,'InputVariables','Score','GrouptingVariables','Name');
writetable(tSum,'PickAnOutputFileName.xls')
  댓글 수: 2
LeoAiE
LeoAiE 2021년 7월 8일
@dpb Thank you for your help but it didn't run corretly for some reason it giving the follwoing message
LeoAiE
LeoAiE 2021년 7월 8일
@dpb Never mind it was grouping Variables misspelled. Thank you for your help I really appreciate it

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

Cris LaPierre
Cris LaPierre 2021년 7월 8일
It gets a bit trickly since you don't want to change the order of the names, but I think you could do this with groupsummary.
% Load the data and add a column for keeping track of the original order
Data = readtable("AladdinData.xlsx");
Data.Order = (1:height(Data))'
Data = 50×3 table
Names Scores Order _____________________ ______ _____ {'Daniel Nelson' } 78 1 {'Henry Cunningham' } 28 2 {'Adison Dixon' } 41 3 {'Haris Carroll' } 15 4 {'Gianna Jones' } 3 5 {'Ted Alexander' } 64 6 {'Connie Hill' } 43 7 {'Abigail Clark' } 7 8 {'Nicole Brooks' } 2 9 {'Heather Tucker' } 67 10 {'Alen Crawford' } 97 11 {'Dexter Carroll' } 32 12 {'Elian Thompson' } 9 13 {'Nicholas Campbell'} 35 14 {'Alina Davis' } 72 15 {'Adelaide Gibson' } 57 16
% Use groupsummary to find sum and min of scores and order for each unique name
sumTbl = groupsummary(Data,"Names",["sum","min"],["Scores" "Order"])
sumTbl = 18×6 table
Names GroupCount sum_Scores min_Scores sum_Order min_Order _____________________ __________ __________ __________ _________ _________ {'Abigail Clark' } 3 135 7 61 8 {'Adelaide Gibson' } 3 160 45 103 16 {'Adison Dixon' } 4 242 41 107 3 {'Alen Crawford' } 2 101 4 43 11 {'Alina Davis' } 2 76 4 51 15 {'Connie Hill' } 4 81 4 107 7 {'Daniel Nelson' } 2 103 25 44 1 {'Dexter Carroll' } 2 89 32 45 12 {'Elian Thompson' } 2 95 9 47 13 {'Elian Wells' } 2 130 45 55 17 {'Gianna Jones' } 4 122 3 115 5 {'Haris Carroll' } 4 181 15 111 4 {'Heather Tucker' } 3 112 13 67 10 {'Henry Cunningham' } 2 128 28 46 2 {'Leonardo Elliott' } 2 46 9 57 18 {'Nicholas Campbell'} 2 74 35 49 14
% sort the results, which are alphabetical, so they are back in the original order
sumTbl = sortrows(sumTbl,"min_Order");
% create a new table with just Names and total scores
finalData = sumTbl(:,["Names","sum_Scores"]);
% rename 'sum_Scores' back to 'Scores'
finalData.Properties.VariableNames(2) = "Scores"
finalData = 18×2 table
Names Scores _____________________ ______ {'Daniel Nelson' } 103 {'Henry Cunningham' } 128 {'Adison Dixon' } 242 {'Haris Carroll' } 181 {'Gianna Jones' } 122 {'Ted Alexander' } 190 {'Connie Hill' } 81 {'Abigail Clark' } 135 {'Nicole Brooks' } 105 {'Heather Tucker' } 112 {'Alen Crawford' } 101 {'Dexter Carroll' } 89 {'Elian Thompson' } 95 {'Nicholas Campbell'} 74 {'Alina Davis' } 76 {'Adelaide Gibson' } 160
  댓글 수: 3
dpb
dpb 2021년 7월 8일
I didn't think of groupsummary, Chris. Good thinking. Was keeping the order an input requirement? I missed it if it was. I've done same as you before several times; have one humongous spreadsheet that has to get updated piecemeal -- it has a (hidden) column that contains =ROW() precisely for the purpose.
Cris LaPierre
Cris LaPierre 2021년 7월 8일
I had interpreted it as having to be in the same order, but on a re-read, that is not stated. If not, it gets much simpler.
% Load the data and add a column for keeping track of the original order
Data = readtable("AladdinData.xlsx");
% Use groupsummary to find sum scores for each unique name
finalData = groupsummary(Data,"Names","sum")
This keeps the GroupCount information, but that's not bad info to have either.

댓글을 달려면 로그인하십시오.

카테고리

Help CenterFile Exchange에서 Spreadsheets에 대해 자세히 알아보기

제품


릴리스

R2020b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by