How to average over different length vectors without excessive for loops?

Hi there,
My problem involves running lots of different stochastic simulations (imagine some sort of Brownian motion) and then averaging over all of these different histories to compute quantities such as the mean, variance, etc.
At the moment, each run gives me an output vector that could be, e.g.,
X1 = [0 1 4 6 8]
where each new entry in the vector represents the position of a particle after a standard time increment. Here the vector has 5 elements, so there have been 4 time increments, although in practice the vectors would be much longer. The problem is that each run ends when a certain condition is met (say X = 8), and this generically happens after a different number of steps. This means the next run might be something like
X2 = [0 4 8]
which is only 3 elements long and thus covers only 2 time increments. I have done this for R runs. If each Xi vector had the same length, I know I could simply collect them in one object X like so:
X = [X1; X2; ... XR]
and then compute the mean using the mean function in the appropriate direction. However unfortunately this wouldn't work in this case as the vectors are of different lengths.
For example if all I had was X1 and X2 I want some process that would calculate the mean at each timestep like so
mean1 = (X1(1)+X2(1))/2; mean2 = (X1(2)+X2(2))/2; mean3 = (X1(3)+X2(3))/2; %data at each timestep for X1 and X2 runs so average over both
mean4 = X1(4); mean5 = X1(5); %no X2 data for these timesteps so only averaging over X1 run
meanX = [mean1 mean2 mean3 mean4 mean5]
But obviously in a way that is scalable, without doing this process thousands of times using lots of for loops. In my actual code I have several thousand runs, each with several hundred elements, so this needs to be reasonably scalable.
Thanks for any help people can offer and I'm obviously happy to try and clarify anything I have poorly explained

Accepted Answer

Adam Danz on 1 Dec 2020
Edited: Adam Danz on 1 Dec 2020
I suggest collecting all of the variable-length row vectors in a cell array, then arranging them as the rows of a matrix, using NaN to pad the missing values. You can then use the "omitnan" option of mean() to average each column while ignoring the NaNs.
Demo:
a{1} = [1 2 5];
a{2} = [5 1 3 5];
a{3} = [9 0 2 1 8];
a{4} = [4 2];
% Vertically concatenate, pad with NaNs
maxNumCol = max(cellfun(@(c) size(c,2), a)); % max number of columns
aMat = cell2mat(cellfun(@(c){padarray(c,[0,maxNumCol-size(c,2)],NaN,'Post')}, a)')
aMat = 4×5
     1     2     5   NaN   NaN
     5     1     3     5   NaN
     9     0     2     1     8
     4     2   NaN   NaN   NaN
colMeans = mean(aMat,1,'omitnan')
colMeans = 1×5
4.7500 1.2500 3.3333 3.0000 8.0000
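Since the question also mentions the variance, the same padded matrix can feed other NaN-aware statistics. The following is only a minimal sketch: the matrix is typed in directly so the snippet runs on its own, and the names nPerStep and colVars are illustrative, not part of the original answer.

```matlab
% Padded matrix from the demo above, entered directly so this runs standalone
aMat = [1 2 5   NaN NaN;
        5 1 3   5   NaN;
        9 0 2   1   8;
        4 2 NaN NaN NaN];

nPerStep = sum(~isnan(aMat), 1);        % number of runs contributing to each time step
colMeans = mean(aMat, 1, 'omitnan');    % same per-column means as the demo
colVars  = var(aMat, 0, 1, 'omitnan');  % sample variance (N-1 normalization) per column
```

Here nPerStep comes out as [4 4 3 2 1], which is worth keeping next to the means: the later time steps are averaged over fewer runs, so their estimates are noisier.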
Comments: 5
Ashfaq Ahmed on 4 Apr 2023
@Adam Danz this is a brilliant approach. Can you please help me to write the code as a function in a way that we only need to input the variables (of different lengths) and it will do the mean of them?
dpb on 4 Apr 2023
Edited: dpb on 5 Apr 2023
What do you want the footprint of the function to be -- any number of vectors of variable length?
If so, then use varargin and you'll have the cell array automagically. All you'll have to do is ensure they're all oriented the same direction first; Adam's solution above assumes they're row vectors--
function colMeans=avgVecs(varargin)
a=varargin; % use Adam's internal variable; could change a-->varargin
% Vertically concatenate, pad with NaNs
maxNumCol = max(cellfun(@(c) size(c,2), a)); % max number of columns
aMat = cell2mat(cellfun(@(c){[c nan(1,maxNumCol-numel(c))]}, a)');
colMeans = mean(aMat,1,'omitnan');
end
Locally, the above with the same input vectors as separate variables
>> avgVecs(a,b,c,d)
ans =
4.7500 1.2500 3.3333 3.0000 8.0000
>>
I don't have Image Processing TB so replaced padarray with base MATLAB code.
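If the inputs might be a mix of row and column vectors, the orientation check mentioned above can be folded into the function itself. The following is a hedged sketch rather than dpb's code: avgVecsAnyOrientation is a hypothetical name, it assumes at least one input and that every input is a numeric vector, and it fills a preallocated NaN matrix with a short loop instead of using cell2mat.

```matlab
function colMeans = avgVecsAnyOrientation(varargin)
% Hypothetical variant of avgVecs that accepts row or column vectors.
% Assumes at least one input, each a numeric vector.
rows = cellfun(@(c) reshape(c, 1, []), varargin, 'UniformOutput', false); % force 1-by-n
maxNumCol = max(cellfun(@numel, rows));   % length of the longest run
aMat = NaN(numel(rows), maxNumCol);       % preallocate; NaN acts as the padding
for k = 1:numel(rows)
    aMat(k, 1:numel(rows{k})) = rows{k};  % copy run k into row k
end
colMeans = mean(aMat, 1, 'omitnan');      % average each time step, ignoring padding
end
```

For example, avgVecsAnyOrientation([1 2 5], [5; 1; 3; 5]) returns [3 1.5 4 5].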
In general, I wouldn't recommend going at it this way in creating the multiple named variables; it would be better to use a cell array initially and avoid the need to make the conversion entirely. In that case, you would simply pass in the cell array itself; varargin does the dirty work of creating a cell array out of multiple inputs when used in a function argument as shown. There's no equivalent neat syntax I'm aware of that does this directly at the command line or inside a script or function without the call to the lower-level function. You could, of course, simply have the oneliner function of
function c=vecs2cell(varargin)
c=varargin; % single output holds the whole cell array (with varargout, a one-output call would return only the first input)
end
The output would be the 1x4 cell array; of course at this point they wouldn't be yet padded to common length, but that's what Adam's code expects as input.
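If the runs are already stored in a cell array, one possible way to reuse a varargin-style function unchanged is comma-separated list expansion with runs{:}. A minimal sketch, assuming the data from the demo; the variable name runs is illustrative, and avgVecs is repeated as a local function only so that the script runs on its own (local functions in scripts need R2016b or later).

```matlab
% Runs collected in a cell array from the start (values from the demo above)
runs = {[1 2 5], [5 1 3 5], [9 0 2 1 8], [4 2]};

% runs{:} expands to a comma-separated list, so this call is equivalent to
% avgVecs(runs{1}, runs{2}, runs{3}, runs{4})
colMeans = avgVecs(runs{:});

function colMeans = avgVecs(varargin)
% Copy of the avgVecs function above, repeated so the script is standalone
a = varargin;
maxNumCol = max(cellfun(@(c) size(c,2), a));                        % longest run
aMat = cell2mat(cellfun(@(c){[c nan(1,maxNumCol-numel(c))]}, a)');  % pad with NaN
colMeans = mean(aMat,1,'omitnan');                                  % per-step mean
end
```

Going the other way, curly braces such as {X1, X2} also build a cell array from separately named variables.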


More Answers (2)

dpb on 1 Dec 2020
Use a cell array to store the results of each trial instead of individual named variables; then
means=cellfun(@mean,x);
Comments: 1
Adam Danz on 1 Dec 2020
I think she's averaging between vectors, not within, based on mean1 = (X1(1)+X2(1))/2;
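To make the distinction concrete, here is a small sketch using X1 and X2 from the question. The between-run average is computed with accumarray, which is a different technique from the NaN-padding approach in the accepted answer; the variable names are illustrative.

```matlab
X = {[0 1 4 6 8], [0 4 8]};              % X1 and X2 from the question

% Within each run: one number per run
withinRun = cellfun(@mean, X);           % [3.8 4]

% Between runs: tag every sample with its time-step index, then group by index
stepIdx = cellfun(@(c) 1:numel(c), X, 'UniformOutput', false);
stepIdx = [stepIdx{:}];                  % [1 2 3 4 5 1 2 3]
vals    = [X{:}];                        % [0 1 4 6 8 0 4 8]
meanX   = accumarray(stepIdx(:), vals(:), [], @mean).';  % [0 2.5 6 6 8]
```

The second result has one entry per time step and matches the meanX the question asks for, whereas cellfun(@mean, X) has one entry per run.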



David Hill on 1 Dec 2020
I would use a cell array.
x=cell(1,100); % preallocate the cell array
for k=1:100
x{k}=randi(100,1,randi(1000)); % simulate your outputs: random length, random values
end
Mean=zeros(1,100);
for k=1:100
Mean(k)=mean(x{k});%calculate the mean and whatever else you want
end

Release

R2020b
