Using accumarray to average several columns at a time?

Question

Osnofa on 29 Dec 2018

Votes: 0

https://kr.mathworks.com/matlabcentral/answers/437624-using-acummarray-to-average-several-columns-at-a-time

Commented: Razvan Carbunescu on 4 Jan 2019

Accepted Answer: Razvan Carbunescu

sampledata.xls


Hello

I have a data array (mat) with the following dimensions: 149016x93

The columns are

2001 | 1 | 1 | 0 | random numbers ...

... | ... | ... | ... | random numbers ...

2017 | 12 | 31 | 23 | random numbers ...

The data is random and it is what I want to average.

I found this example (MathWorks example) and it works fine; however, I've been struggling with how to run it over columns 5 to 93...

[ah,~,ch] = unique(mat(:,2:4),'rows');
hraverage = [ah,accumarray(ch,mat(:,5),[],@nanmean)];

My problem is that I can't get the 8784x93 array as output, only an 8784* x 4 one. I've tried loops, but I'm missing something that I am not aware of...

*The dataset has several years of data. I want the hourly average for each day of the year. So it's 366 days * 24 hours = 8784

For the sake of example, please feel free to consider a smaller array.

Thank you for the attention! Will keep digging on this...

Sample data in attachment, randomly generated:

The first 4 columns are year, month, day, hour, and columns 5 to 7 are data columns.

The final result should be an 8784x7 array.
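One way to extend the snippet in the question to every data column is to loop over columns 5 to end and call accumarray once per column. The following is only a minimal sketch: the small matrix is made up to stand in for the real 149016x93 one, and mean(x,'omitnan') is used as a stand-in for nanmean so that no toolbox is required.

```matlab
% Hypothetical stand-in for the real data: [year month day hour data1 data2]
mat = [2001 1 1 0 10   1;
       2002 1 1 0 20   3;
       2001 1 1 1  5 NaN;
       2002 1 1 1  7   4];

% One group per unique (month, day, hour) combination, as in the question
[ah,~,ch] = unique(mat(:,2:4),'rows');

dataCols = 5:size(mat,2);                    % every data column
nGroups  = size(ah,1);
avg      = zeros(nGroups, numel(dataCols));  % preallocate the result

% accumarray takes one value vector at a time, so call it per column
for k = 1:numel(dataCols)
    avg(:,k) = accumarray(ch, mat(:,dataCols(k)), [nGroups 1], ...
                          @(x) mean(x,'omitnan'));
end

hraverage = [ah, avg];   % [month day hour mean1 mean2]
```

For this toy input, hraverage has one row per (month, day, hour) group and one averaged column per data column, which is the shape the question asks for.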

3 Comments

Osnofa on 29 Dec 2018

Edited: Osnofa on 29 Dec 2018

Great question. I missed that explanation in the opening post.

The dataset has several years of data. I want the hourly average for each day of the year. So it's 366 days * 24 hours = 8784

That is why I use:

[ah,~,ch] = unique(mat(:,2:4),'rows');

Column 2 is the month, 3 is the day and 4 is the hour.

dpb on 29 Dec 2018

Convert to a timetable and use retime and/or the findgroups/splitapply pair.


Answer 1

Razvan Carbunescu on 4 Jan 2019

Votes: 4

https://kr.mathworks.com/matlabcentral/answers/437624-using-acummarray-to-average-several-columns-at-a-time#answer_355012

There's a simpler way of doing this with groupsummary.

I imported the sample data and made a table with 7 columns: Year,Month,Day,Hour,VarName5,VarName6,VarName7

Then used the following commands to take advantage of binning in groupsummary and of being able to include empty groups:

sampledata.Time = datetime(sampledata.Year,sampledata.Month,sampledata.Day)
result = groupsummary(sampledata,{'Time','Hour'},{'dayofyear','none'},'mean',{'VarName5','VarName6','VarName7'},'IncludeEmptyGroups',1)
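For a table with many data columns (such as the 89 in the original question), the variable names do not have to be typed out. The sketch below is self-contained and uses a tiny made-up table in place of the imported sampledata; it collects the data variable names before the Time column is added, so that 5:end covers only the data columns. The output variable name mean_VarName5 is what groupsummary is expected to generate for the 'mean' method.

```matlab
% Hypothetical miniature of the imported table (two years, two hours)
Year     = [2001; 2002; 2001; 2002];
Month    = [1; 1; 1; 1];
Day      = [1; 1; 1; 1];
Hour     = [0; 0; 1; 1];
VarName5 = [10; 20; 5; 7];
VarName6 = [1; 3; NaN; 4];
sampledata = table(Year, Month, Day, Hour, VarName5, VarName6);

% Grab the data variable names now, before Time is appended at the end
dataVars = sampledata.Properties.VariableNames(5:end);

% Same grouping as above: day of year (binned from Time) and Hour
sampledata.Time = datetime(sampledata.Year, sampledata.Month, sampledata.Day);
result = groupsummary(sampledata, {'Time','Hour'}, {'dayofyear','none'}, ...
                      'mean', dataVars);
```

Here result has one row per (day of year, hour) pair that occurs in the data; adding 'IncludeEmptyGroups',true as in the answer would also list the combinations that never occur.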

2 Comments

Image Analyst on 4 Jan 2019

I didn't know about this function. So is this kind of like grpstats() but it's in base MATLAB so you don't need the Stats toolbox, and it has additional computations?

Razvan Carbunescu on 4 Jan 2019

It's a fairly new function, in base MATLAB since R2018a. Yes, it is very similar to grpstats but should be nicer to use for tables.

The extra options it has relate to binning, missing data and empty groups.


Answer 2

dpb on 30 Dec 2018

Votes: 2

https://kr.mathworks.com/matlabcentral/answers/437624-using-acummarray-to-average-several-columns-at-a-time#answer_354443

t=readtable('sampledata.xls','ReadVariableNames',0);                                % read file to table
t.Properties.VariableNames={'Year','Month','Day','Hour','D1','D2','D3'};            % convenient names
tt=table2timetable(t(:,5:end),'RowTimes',datetime(t.Year,t.Month,t.Day,t.Hour,0,0));  % to timetable
mnDaily=retime(tt,'daily','mean');                                                  % averages by day

See what we gots...

>> mnDaily(1:4,:)
ans =
  4×3 timetable
            Time              D1        D2        D3  
    ____________________    ______    ______    ______
    01-Jan-2001 00:00:00       701     173.5     639.5
    02-Jan-2001 00:00:00       223       614       484
    03-Jan-2001 00:00:00    642.33       196    598.33
    04-Jan-2001 00:00:00       318    243.25     534.5
>> 

For a real case with a very large number of variables, rather than naming them all sequentially, I'd suggest reading the spreadsheet data as an array and building the table from it instead...

data=xlsread('sampledata.xls');
t=table(datetime(data(:,1),data(:,2),data(:,3),data(:,4),0,0),data(:,5:end));
t.Properties.VariableNames={'Date','Data'};
tt=table2timetable(t);
mnDaily=retime(tt,'daily','mean');

and will have the same result, except that the means will be an array instead of individual variables.

5 Comments

dpb on 2 Jan 2019

Edited: dpb on 3 Jan 2019

Ah...you beat me to it, Peter! It came to me while doing the other mind-numbing data cleanup task that I had taken a break from, and I just came back to comment on the "why"...

I didn't double-check for absolute certain whether it was one or none, but there were a lot of NaN elements in the sample data set and I suspect there was at least one subset that turned out empty, although the symptom also fits the one-row scenario.

Hmm....that's an interesting result...

mean([])

returns NaN, not []. What's the logic in that?

Answers Self... :)

Makes the PP example work in returning vector result...
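The two edge cases mentioned in this comment (an empty group and a group with a single row) can be checked in isolation. This is a small sketch with made-up inputs, not the code from the hidden comments; it shows why passing an explicit dimension to mean is a safer choice inside grouped operations.

```matlab
% Empty input: mean returns NaN rather than []
emptyMean = mean([]);

% Single-row group: without a dimension argument, mean works along the
% first non-singleton dimension, so a 1x3 row collapses to one number
oneRow    = [1 2 3];
collapsed = mean(oneRow);      % averages across the row
perColumn = mean(oneRow, 1);   % averages down each column, keeps 1x3
```

Because perColumn keeps one value per column even for a one-row group, using mean(x,1) in an anonymous function gives every group an output of the same width.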

Sean de Wolski on 3 Jan 2019

A little bit of shameless blog promotion for that exact use case:

https://blogs.mathworks.com/loren/2018/12/05/debugging-grouped-operations/


Answer 3

Image Analyst on 29 Dec 2018

Votes: 1

https://kr.mathworks.com/matlabcentral/answers/437624-using-acummarray-to-average-several-columns-at-a-time#answer_354358

Why not simply use grpstats() if you have the Statistics and Machine Learning Toolbox?

Attach your data if you need help.

7 Comments

dpb on 30 Dec 2018

As noted above, convert the date column data to datetimes, put them into a timetable, and then use the findgroups/splitapply pair or retime.

This sort of thing is precisely what they're for...
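The findgroups/splitapply route could look roughly like the sketch below. It assumes a small made-up numeric array in place of the spreadsheet, groups by (month, day, hour) as in the original question, and averages all data columns in one call. The variable names Data and MeanData are illustrative only.

```matlab
% Hypothetical stand-in for the spreadsheet: [year month day hour data...]
data = [2001 1 1 0 10   1;
        2002 1 1 0 20   3;
        2001 1 1 1  5 NaN;
        2002 1 1 1  7   4];

% Build a timetable with all data columns held in one matrix variable
t  = datetime(data(:,1), data(:,2), data(:,3), data(:,4), 0, 0);
tt = timetable(t, data(:,5:end), 'VariableNames', {'Data'});

% Group on month, day and hour so that the same hour of the same
% calendar day is pooled across years
rt = tt.Properties.RowTimes;
[G, mo, dy, hr] = findgroups(month(rt), day(rt), hour(rt));

% mean(x,1,...) keeps one value per column, even for one-row groups
avg = splitapply(@(x) mean(x, 1, 'omitnan'), tt.Data, G);

result = table(mo, dy, hr, avg, ...
               'VariableNames', {'Month','Day','Hour','MeanData'});
```

Unlike retime with 'daily', which averages within each calendar day of each year, this grouping pools matching hours across all years.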

Osnofa on 30 Dec 2018

Will take a look into it, didn't notice it yesterday. Thanks for the reminder.
