Conditional average (need help with speed)
이 질문을 팔로우합니다.
- 팔로우하는 게시물 피드에서 업데이트를 확인할 수 있습니다.
- 정보 수신 기본 설정에 따라 이메일을 받을 수 있습니다.
오류 발생
페이지가 변경되었기 때문에 동작을 완료할 수 없습니다. 업데이트된 상태를 보려면 페이지를 다시 불러오십시오.
이전 댓글 표시
0 개 추천
I have a table that looks like this:
country_id year M T average_T
1 2000 10 76 NaN
1 2001 5 39 Mean of 76 and 62
1 2002 NaN 37 Mean of 39 =39
1 2003 15 5 NaN
1 2004 10 28 Mean of 5 and 2
1 2005 10 8 Mean of 8=8
2 1999 15 1 NaN
2 2000 10 62 Mean of 1=1
2 2001 20 32 Mean of 76 and 62
2 2002 10 72 Mean of 32=32
2 2003 15 2 Mean of 5 and 2
I want to calculate the column average_T which is last year's average of the T values for the cases that have the same year and M value. (First entry for each id is NaN because we don't know past year's T for those entries)
I have written a code that can do this but it is impossible to run with my big data set:
mytable.average_T=NaN(N,1);
for k=2:N
if mytable{k,'country_id'} == mytable{k-1,'country_id'}
mytable.average_T(k,1)= mean(T(mytable.M==mytable.M(k-1)& ...
mytable.year==mytable.year(k-1)), 'omitNaN');
end
end
채택된 답변
Grouping variables and rowfun to the rescue...
tMeans=rowfun(@(x),mean(x,'omitnan'),mytable,'InputVariables','T','GroupingVariables',{'year','M'});
댓글 수: 11
Mia Dier
2021년 1월 17일
Thank you for your reply! The problem that I'm having with rowfun is that it sorts the result according to the grouping variables (year and M) but I want to get results ordered according to country_id and year. It is not possible to re-order the variables after getting the results with rowfun as I lose the country_id variable in the results.
Thank you!
tMeans=rowfun(@(x),mean(x,'omitnan'),mytable,'InputVariables','T','GroupingVariables',{'country_id','year','M'});
then.
Altho I now notice you had separated by country in the code snippet, I had just read the text of the question that doesn't say by country...
Hi again!
This again unfortunately doesn't solve the problem. I want the grouping variables to be year and M but I want the outcome to be ordered by country id first and then by year which is the original ordering of my dataset. I hope it is more clear now.
Thank you!
dpb
2021년 1월 17일
There is no such thing as a country order any longer once you've averaged over all country id, too.
Mia Dier
2021년 1월 17일
I know that we average over the country id too but I need to know which average_T belonged to which country_id. Please check the example table in my question. It should be easier to understand from the table.
Thank you again!
It makes no sense, no. You either compute average over each country ID as a group as well or you don't group countries -- if you keep the country id then that is the ID of the group; if you don't use countries as a grouping variable then there is no way to associate any given order of the contributing elements that made up that average to the average itself; that is gone.
As noted in the other Q? of the same subject, you could keep a set of which countries were include in the averaging, but that's all that is, there's no order to associate with the mean.
Or in a similar vein as in the other Q? comment you could assign an auxiliary variable that is the row in the table that is passed through the function and kept with the group that would identify the members of the group but again while that could be sorted, other than it is the identification of who is in the group, there's no meaning in the order in the computed mean.
BTW, this last id would just be the grouping index you could get from findgroups; it may be that the information contained from it is what you're actually looking for here, but the request as couched just doesn't make sense.
As for the previous year thing, you can simply associate the computed average of the year with the previous year after the fact or create another year variable that is the actual year+1 to use as the grouping variable instead.
Mia Dier
2021년 1월 17일
I'm not claiming that there is an order to associate with the mean. I'm just trying to sort the variables as they were in the first place. And keeping the 'original' order makes perfect sense if I have many other variables in that particular order and if I use them all together in a model later on.
Anyways, I found the solution to my problem my merging the rowfun results with the original data set.
Thank you.
dpb
2021년 1월 17일
"...merging the rowfun results with the original data set."
Well, yes, that's in line with what I was saying -- the result of rowfun with a merging/grouping operation are a totally separate output in number of elements/height of the output table.
If you replicate those back into the original set, then you can associate the mean with from whence it came, agreed; that's the same set of data as findgroups will tell you -- which group each element/row belongs to, and is in the original order.
tMeans=rowfun(@(x),mean(x,'omitnan'),mytable, ...
'InputVariables','T', ...
'GroupingVariables',{'country_id',year','M'}, ...
'OutputVariableNames',{'GroupMean'} );
mytable.GroupMeans=tMeans.GroupMean(findgroups(mytable.country_id,mytable.year,mytable.M));
should just populate the new column without any need to merge anything.
Now that we've defined the actual requirement, it's easier to solve the problem... :)
Mia Dier
2021년 1월 17일
Amazing thank you! :)
dpb
2021년 1월 17일
NB: You could do the same thing with findgroups and splitapply without building the output table from rowfun, too.
추가 답변 (0개)
카테고리
도움말 센터 및 File Exchange에서 Tables에 대해 자세히 알아보기
참고 항목
웹사이트 선택
번역된 콘텐츠를 보고 지역별 이벤트와 혜택을 살펴보려면 웹사이트를 선택하십시오. 현재 계신 지역에 따라 다음 웹사이트를 권장합니다:
또한 다음 목록에서 웹사이트를 선택하실 수도 있습니다.
사이트 성능 최적화 방법
최고의 사이트 성능을 위해 중국 사이트(중국어 또는 영어)를 선택하십시오. 현재 계신 지역에서는 다른 국가의 MathWorks 사이트 방문이 최적화되지 않았습니다.
미주
- América Latina (Español)
- Canada (English)
- United States (English)
유럽
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)
