Consecutive count of values based on multiple conditions

조회 수: 4 (최근 30일)
mtango
mtango 2020년 11월 19일
답변: Setsuna Yuuki. 2020년 11월 19일
I have a very big table consisting of approximately 3 million rows. The table consists of four columns [ID period amount pbehind], see the following screenshot including the output column I want.
For every ID, I want to count the consecutive amount of times of behind payment (pbehind). But there are a few conditions to this:
  1. when pbehind <= 0.5, then output = 0
  2. when pbehind > 1.5, but value in previous period is 0, then output = 0;
  3. when pbehind > 0.5, but amount in next period (which is last period) is zero, then output=0.
There are a couple of things to note:
  1. period does not have to start at 1 and increase by 1 (see ID = 2). However it is ordered by ID and period using sortrows(table,[1,2]).
  2. An ID can make multiple payments for a certain period, however pbehind will always be equal to for that period (see ID=3).
I know for the first condition that I can just do the following:
output = pbehind;
output(output<=0.5)=0;
But I cannot figure out how to efficiently implement the other conditions. Does anyone know how to do this?

답변 (2개)

Peter Perkins
Peter Perkins 2020년 11월 19일
Put your data in a table, and use rowfun with ID as the grouping variable. Write a function that does all the logic you need to enforce. Your function will be applied to one group of rows at a time, and should return a column vector of output values. Here's a simple example:
>> t = table([1;1;1;2;2],rand(5,1),rand(5,1),'VariableNames',{'ID' 'X' 'Y'})
t =
5×3 table
ID X Y
__ ________ ________
1 0.82202 0.26854
1 0.041591 0.63908
1 0.91635 0.031734
2 0.17678 0.33395
2 0.92236 0.8908
>> t2 = rowfun(@(x,y) x - mean(y),t,'GroupingVariable','ID')
t2 =
5×3 table
ID GroupCount Var3
__ __________ ________
1 3 0.50891
1 3 -0.27152
1 3 0.60323
2 2 -0.4356
2 2 0.30998
>> t.Output = t2.Var3
t =
5×4 table
ID X Y Output
__ ________ ________ ________
1 0.82202 0.26854 0.50891
1 0.041591 0.63908 -0.27152
1 0.91635 0.031734 0.60323
2 0.17678 0.33395 -0.4356
2 0.92236 0.8908 0.30998
Obviously you will need to write your own function.
  댓글 수: 1
mtango
mtango 2020년 11월 19일
The problem is that I cannot figure out how to create code to count the consecutive amount of times of behind payment (pbehind) for each ID...

댓글을 달려면 로그인하십시오.


Setsuna Yuuki.
Setsuna Yuuki. 2020년 11월 19일
you should try with a series of if..else. For example:
for i = length(output)
if(pbehind(i) > 1.5 && pbehind(i-1) == 0)
output(i) =0;
elseif(condition)
...
end

카테고리

Help CenterFile Exchange에서 Tables에 대해 자세히 알아보기

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by