multiple condition sumif without loop

조회 수: 18 (최근 30일)
ahmed920
ahmed920 2022년 5월 26일
답변: Seth Furman 2022년 6월 2일
I have the following table and want to sum over unique dates and Accounts to get the daily changes per account
TransactionID = {'102001';'102002';'102003';'102004';'102005';'102006';'102007';'102008';'102009';'102010'};
Date = datetime({'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-09';'2021-02-09';'2021-02-09';'2021-02-17'},"InputFormat","uuuu-MM-dd");
AmountDebited = [3000; 0; 0; 3000; 4443; 2.02877864; 430; 50; 200; 0.297598984636403];
AccountDebited = ["TD-CAD";"LC-CAD";"LC-CAD";"SI-CAD";"SP-CAD";"LC-USD";"SP-CAD";"SP-CAD";"SP-CAD";"LC-USD"];
AmountCredited = [3000; 0; 0; 3000; 2.03048665; 4450; 0.18713085; 0.02177471; 0.0869854; 0.295498984636403];
AccountCredited = ["SP-CAD"; "MM-LPT"; "KN-DIR"; "SP-CAD"; "LC-USD"; "SI-CAD"; "LC-USD"; "LC-USD"; "LC-USD"; "MM-USD"];
X = table(TransactionID, Date, AmountDebited, AccountDebited, AmountCredited, AccountCredited);
Right Now I can do it using the following for loop to compute the result but My actual table has thousands of records and the calculation takes forever so have been trying to get it to work using indexes but had no luck so far.
UniqueDebit = unique(X.AccountDebited);
UniqueCredit = unique(X.AccountCredited);
UniqueConc = unique([UniqueDebit;UniqueCredit]);
UniqueDates = unique(X.Date);
for i = 1:length(UniqueDates)
for j = 1:height(UniqueConc)
RowsCredited = find(X.Date == UniqueDates(i) & X.AccountCredited == UniqueConc(j));
RowsDebited = find(X.Date == UniqueDates(i) & X.AccountDebited == UniqueConc(j));
DailyCredited = sum(X.AmountCredited(RowsCredited));
DailyDebited = sum(X.AmountDebited(RowsDebited));
DailyChanges = DailyCredited - DailyDebited;
XMatrix(i,j) = DailyChanges;
end
end
XTable=array2table(XMatrix);
XTable.Properties.VariableNames = UniqueConc;
XTable.Properties.RowNames = string(UniqueDates);

채택된 답변

Jan
Jan 2022년 5월 26일
편집: Jan 2022년 5월 26일
Start with avoiding repeated work:
XMatrix = zeros(length(UniqueDates), height(UniqueConc)); % Pre-allocate!!!
xCredit = X.AccountCredited; % Abbreviation
xDebit = X.AccountDebited;
for i = 1:length(UniqueDates)
tmp = (X.Date == UniqueDates(i)); % Move out of the inner loop
for j = 1:height(UniqueConc)
RowsCredited = (tmp & xCredited == UniqueConc(j));
RowsDebited = (tmp & xDebited == UniqueConc(j));
DailyCredited = sum(X.AmountCredited(RowsCredited)); % [EDITED]
DailyDebited = sum(X.AmountDebited(RowsDebited)); % [EDITED]
XMatrix(i,j) = DailyCredited - DailyDebited;
end
end
Avoid the find(), because logical indexing is faster.
Is XMatrix preallocated?
  댓글 수: 3
Jan
Jan 2022년 5월 26일
편집: Jan 2022년 5월 26일
It is hard to optimize code without having realistic input data. I cannot estimate, if an accumarray approach is faster here. Are you sure, that the loop is the bottleneck?
Another idea is to use ismember once, such that Matlab can search in numerical indices instead of string arrays:
xAccDebit = X.AccountDebited;
xAccCredit = X.AccountCredited;
xAmDebit = X.AmountDebited;
xAmCredit = X.AmountCredited;
uDebit = unique(xAccDebit);
uCredit = unique(xAccCredit);
uConc = unique([uDebit; uCredit]);
[uDates, ~, uDateInd] = unique(X.Date);
[~, creditInd] = ismember(xAccCredit, uConc);
[~, debitInd] = ismember(xAccDebit, uConc);
X = zeros(numel(uDates), numel(uConc)); % Pre-allocate
for i = 1:numel(uDates)
tmp = (uDateInd == i); % same as: (X.Date == uDates(i));
mCreditInd = creditInd .* tmp; % mask FALSE in tmp as 0 in vector
mDebitInd = debitInd .* tmp; %
for j = 1:numel(uConc)
RowsCredited = (mCreditInd == j);
RowsDebited = (mDebitInd == j);
DailyCredited = sum(xAmCredit(RowsCredited));
DailyDebited = sum(xAmDebit(RowsDebited));
X(i,j) = DailyCredited - DailyDebited;
end
end
Is numel(uDates) much larger than numel(uConc)? Maybe swapping the loops is faster.
I've edited the code in my first answer: I've abbreviated "X.AccountDebited" and "X.AmountDebited" both by "xDebited".
Are you working with less than 65536 elements? Then it might save some time, if you convert mCreditInd and mDebitInd to UINT16 and the loop e.g. to:
for j = uint16(1):uint16(numel(uConc))
ahmed920
ahmed920 2022년 5월 27일
편집: ahmed920 2022년 5월 27일
yeah number uDates is 4 times larger than numel uConc. for now yeah less than 65536 elements. Thanks this works it further reduced my time by a factor of 4.

댓글을 달려면 로그인하십시오.

추가 답변 (1개)

Seth Furman
Seth Furman 2022년 6월 2일
It's worth noting that a lot of these computations can be expressed more succinctly using groupsummary, outerjoin, and fillmissing.
TransactionID = {'102001';'102002';'102003';'102004';'102005';'102006';'102007';'102008';'102009';'102010'};
Date = datetime({'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-06';'2021-02-09';'2021-02-09';'2021-02-09';'2021-02-17'},"InputFormat","uuuu-MM-dd");
AmountDebited = [3000; 0; 0; 3000; 4443; 2.02877864; 430; 50; 200; 0.297598984636403];
AccountDebited = ["TD-CAD";"LC-CAD";"LC-CAD";"SI-CAD";"SP-CAD";"LC-USD";"SP-CAD";"SP-CAD";"SP-CAD";"LC-USD"];
AmountCredited = [3000; 0; 0; 3000; 2.03048665; 4450; 0.18713085; 0.02177471; 0.0869854; 0.295498984636403];
AccountCredited = ["SP-CAD"; "MM-LPT"; "KN-DIR"; "SP-CAD"; "LC-USD"; "SI-CAD"; "LC-USD"; "LC-USD"; "LC-USD"; "MM-USD"];
X = table(TransactionID, Date, AmountDebited, AccountDebited, AmountCredited, AccountCredited);
Find total amounts credited/debited for each date-account pair
credited = groupsummary(X,["Date","AccountCredited"],"sum",["AmountCredited"])
credited = 7×4 table
Date AccountCredited GroupCount sum_AmountCredited ___________ _______________ __________ __________________ 06-Feb-2021 "KN-DIR" 1 0 06-Feb-2021 "LC-USD" 1 2.0305 06-Feb-2021 "MM-LPT" 1 0 06-Feb-2021 "SI-CAD" 1 4450 06-Feb-2021 "SP-CAD" 2 6000 09-Feb-2021 "LC-USD" 3 0.29589 17-Feb-2021 "MM-USD" 1 0.2955
debited = groupsummary(X,["Date","AccountDebited"],"sum",["AmountDebited"])
debited = 7×4 table
Date AccountDebited GroupCount sum_AmountDebited ___________ ______________ __________ _________________ 06-Feb-2021 "LC-CAD" 2 0 06-Feb-2021 "LC-USD" 1 2.0288 06-Feb-2021 "SI-CAD" 1 3000 06-Feb-2021 "SP-CAD" 1 4443 06-Feb-2021 "TD-CAD" 1 3000 09-Feb-2021 "SP-CAD" 3 680 17-Feb-2021 "LC-USD" 1 0.2976
Match total amounts credited/debited by date-account pair
credited.GroupCount = [];
debited.GroupCount = [];
sums = outerjoin(credited,debited,LeftKeys=["Date","AccountCredited"],RightKeys=["Date","AccountDebited"],MergeKeys=true)
sums = 11×4 table
Date AccountCredited_AccountDebited sum_AmountCredited sum_AmountDebited ___________ ______________________________ __________________ _________________ 06-Feb-2021 "KN-DIR" 0 NaN 06-Feb-2021 "LC-CAD" NaN 0 06-Feb-2021 "LC-USD" 2.0305 2.0288 06-Feb-2021 "MM-LPT" 0 NaN 06-Feb-2021 "SI-CAD" 4450 3000 06-Feb-2021 "SP-CAD" 6000 4443 06-Feb-2021 "TD-CAD" NaN 3000 09-Feb-2021 "LC-USD" 0.29589 NaN 09-Feb-2021 "SP-CAD" NaN 680 17-Feb-2021 "LC-USD" NaN 0.2976 17-Feb-2021 "MM-USD" 0.2955 NaN
Fill missing data
sums = fillmissing(sums,"constant",0,DataVariables=[3 4])
sums = 11×4 table
Date AccountCredited_AccountDebited sum_AmountCredited sum_AmountDebited ___________ ______________________________ __________________ _________________ 06-Feb-2021 "KN-DIR" 0 0 06-Feb-2021 "LC-CAD" 0 0 06-Feb-2021 "LC-USD" 2.0305 2.0288 06-Feb-2021 "MM-LPT" 0 0 06-Feb-2021 "SI-CAD" 4450 3000 06-Feb-2021 "SP-CAD" 6000 4443 06-Feb-2021 "TD-CAD" 0 3000 09-Feb-2021 "LC-USD" 0.29589 0 09-Feb-2021 "SP-CAD" 0 680 17-Feb-2021 "LC-USD" 0 0.2976 17-Feb-2021 "MM-USD" 0.2955 0
Compute net change by date-account pair
sums.NetChange = sums.sum_AmountCredited - sums.sum_AmountDebited
sums = 11×5 table
Date AccountCredited_AccountDebited sum_AmountCredited sum_AmountDebited NetChange ___________ ______________________________ __________________ _________________ _________ 06-Feb-2021 "KN-DIR" 0 0 0 06-Feb-2021 "LC-CAD" 0 0 0 06-Feb-2021 "LC-USD" 2.0305 2.0288 0.001708 06-Feb-2021 "MM-LPT" 0 0 0 06-Feb-2021 "SI-CAD" 4450 3000 1450 06-Feb-2021 "SP-CAD" 6000 4443 1557 06-Feb-2021 "TD-CAD" 0 3000 -3000 09-Feb-2021 "LC-USD" 0.29589 0 0.29589 09-Feb-2021 "SP-CAD" 0 680 -680 17-Feb-2021 "LC-USD" 0 0.2976 -0.2976 17-Feb-2021 "MM-USD" 0.2955 0 0.2955

카테고리

Help CenterFile Exchange에서 Data Type Identification에 대해 자세히 알아보기

제품


릴리스

R2022a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by