I have a matrix A of size 225 x 2. One column is a variable that always ranges from 1 to 5 (like grades), and the second column is also numeric. I now want to calculate the mean, median, and first and third quartiles of the second column for each grade.
The result needs to be interpretable like: mean(age) of A students is better than mean(age) of B students
Grades 1 2 3 [etc]
Mean
Median
1st Qntl
3rd Qntl
I did it all manually, which is quite a lot of work, because I have 8 hypotheses for which the calculations are almost the same (the matrix A is in reality 225 x 11, but I only need 2-3 columns per hypothesis). Now I wonder if there is a faster and more efficient way to do it, namely in a for loop,
where I can write something like:
for i = 1:5
    % ERM is the grade column; select the rows with grade i
    mean_Hyp_1(i) = nanmean(A(ERM==i, 2));
    median_Hyp_1(i) = nanmedian(A(ERM==i, 2));
    % etc.
end
Thanks in advance

Accepted Answer

Vishwas, 19 Sep 2017

0 votes
You had the right idea. The "find" function can be used inside a loop to find all the rows where ERM == 1, 2, ..., and the statistics can then be calculated on those rows.
Let me show this with an example:
a = [1;3;2;4;5;1;2;4;3;5;3;2;1]
b = [10;15;24;54;36;57;87;98;65;78;05;48;65]
input = [a b]
mean = []
median = []
for i = 1:5
mean(i) = nanmean(input(find(input(:,1)==i), 2))
median(i) = nanmedian(input(find(input(:,1)==i), 2))
end
In the case above, we use the "find" function on the first column of input to extract the indices of all rows where input(:,1) == i, and then take the mean of the corresponding values from the second column.

Comments: 9

Vishwas,
I have a question about your answer. When I put your code in an m-file in my R2017a, the find parts have a red underline, telling me that
If 'input' is an indexed variable, performance can be increased using logical indexing instead of FIND.
If I click fix, the word find is removed (matching the answer I have given above).
Would one be better than the other in this example (and in general)?
It is better to avoid using "input" as a variable name, due to conflict with the frequently-used input() function.
Omitting the find() is more efficient.
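To make the two comments above concrete, here is a minimal sketch comparing the find-based form with plain logical indexing on the example data from the answer. The names grades, values, data, meanFind and meanMask are made up for this illustration so that input, mean and median are not shadowed. It also assumes a MATLAB release where mean accepts the 'omitnan' flag, used here in place of nanmean.

```matlab
% Example data from the answer, stored under non-shadowing (hypothetical) names
grades = [1;3;2;4;5;1;2;4;3;5;3;2;1];
values = [10;15;24;54;36;57;87;98;65;78;5;48;65];
data   = [grades values];

nGrades  = 5;
meanFind = zeros(1, nGrades);
meanMask = zeros(1, nGrades);
for g = 1:nGrades
    rowIdx = find(data(:,1) == g);   % explicit row numbers
    mask   = data(:,1) == g;         % logical mask, no find needed
    meanFind(g) = mean(data(rowIdx, 2), 'omitnan');
    meanMask(g) = mean(data(mask, 2), 'omitnan');
end

isequal(meanFind, meanMask)   % both forms select the same rows
```

Both forms pick out the same rows, so the choice is about readability and speed rather than results.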
Hello Tim and Vishwas,
thank you both for answering.
I have now tried both of your codes and I get an error because the vector B for vector A=1,2..etc doesn't always have the same size.
Subscripted assignment dimension mismatch.
Error in HypotheseEins (line 54)
mean_A(i) = nanmean(A(A(:,1)==i,2:end));
AND
Subscripted assignment dimension mismatch.
Error in HypotheseEins (line 49)
mean(i) = nanmean(A(find(A(:,1)==i), 2:end))
Please don't mind the slight differences. As I said above, my matrix is actually bigger than the one I used in the example in my question. Do I first have to somehow "fill" the smaller vectors with zeros up to the length of the biggest vector?
Thank you very much.
Tim Berk, 19 Sep 2017 (edited 19 Sep 2017)
mean_A(i) = nanmean(A(A(:,1)==i,2:end));
This takes the mean over multiple columns (2:end), as you seem to want. But it therefore returns multiple means, one for each column.
But you are still trying to put those multiple values into mean_A(i), which is a single location in the array mean_A.
Try
mean_A(i,:) = nanmean(A(A(:,1)==i,2:end));
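As a rough sketch of the fix above with made-up data: the matrix A below, its size, and the preallocation with nan are assumptions for illustration only. The sketch also passes the dimension explicitly and uses mean with 'omitnan' instead of nanmean, assuming a release that supports that flag.

```matlab
% Hypothetical data: column 1 = grade (1-3), columns 2-3 = numeric variables
A = [1 10 100;
     2 20 200;
     1 30 300;
     3 40 400;
     2 60 600];

nGrades = 3;
nVars   = size(A, 2) - 1;
mean_A  = nan(nGrades, nVars);   % one row per grade, one column per variable
for i = 1:nGrades
    rows = A(:,1) == i;
    % dim = 1 forces column-wise means even when only one row matches
    mean_A(i,:) = mean(A(rows, 2:end), 1, 'omitnan');
end
```

Giving the dimension matters for a grade with a single matching row (grade 3 here): without it, the mean of a 1-by-n row would collapse to one scalar and the assignment would fail again.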
Thank you very much! It works!
I changed it to display the grades as columns.
mean_A(:,i) = nanmean(A(A(:,1)==i,2:end));
My code looks more tidied up now and I can even put hypothesis together in one script.
I used display(mean_A) for the results to show in a "table" form. Do you by any chance know how I can name the rows and columns of the result?
Tim Berk, 20 Sep 2017
Have a look at the function table ( https://www.mathworks.com/help/matlab/ref/table.html )
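One possible way to label the result, sketched with array2table. The numbers in stats and the row and column names are placeholders for illustration, not results from the thread.

```matlab
% Hypothetical result matrix: rows = statistics, columns = grades 1-3
stats = [44 53 28;
         57 48 15];

T = array2table(stats, ...
    'VariableNames', {'Grade1', 'Grade2', 'Grade3'}, ...
    'RowNames',      {'Mean', 'Median'});
disp(T)

T{'Mean', 'Grade2'}   % look up a single entry by row and column name
```

Variable names must be valid MATLAB identifiers in releases from that period, which is why the columns are called Grade1, Grade2, ... rather than 1, 2, ...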
Stephen23, 20 Sep 2017
@Vishwas Vijaya Kumar: is there a good reason for shadowing the inbuilt input function?
José-Luis, 20 Sep 2017 (edited 20 Sep 2017)
And mean().
And median().
And that's some pretty tortured indexing.


More Answers (1)

Tim Berk, 19 Sep 2017

1 vote

You can use the condition A(:,1) == i as indexing for which values in A(:,2) to consider, i.e.
A = [1 2 3 1 1 2 3; 4 5 6 7 8 9 0]'
for i = 1:3
mean_A(i) = nanmean(A(A(:,1)==i,2));
% etc..
end
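The "% etc.." in the loop could be filled in along these lines to get all four statistics from the question. This is only a sketch: the stats layout is an assumption, quantile is assumed to be available (it ships with the Statistics and Machine Learning Toolbox, like nanmean), and mean and median with 'omitnan' stand in for nanmean and nanmedian.

```matlab
A = [1 2 3 1 1 2 3; 4 5 6 7 8 9 0]';   % same example matrix as above

nGrades = 3;
stats = nan(4, nGrades);   % rows: mean, median, 1st quartile, 3rd quartile
for i = 1:nGrades
    x = A(A(:,1) == i, 2);           % second-column values for grade i
    q = quantile(x, [0.25 0.75]);    % first and third quartiles
    stats(1,i) = mean(x, 'omitnan');
    stats(2,i) = median(x, 'omitnan');
    stats(3,i) = q(1);
    stats(4,i) = q(2);
end
```

Each column of stats then corresponds to one grade, matching the table layout sketched in the question.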

Comments: 1

Hi Tim,
Thanks to you, my code for all my hypotheses now runs so much faster. Now I am on my last hypothesis, which uses the same method as before with one constraint. Before I open a new question, I just wanted to see if you can help.
Matrix A has 5 columns: the first column holds grades (1-5) and the second column holds years ranging from 2008-2013. The rest of the columns are again numeric.
First: "Cluster" the years 2008-2010, 2011-2013, 2014-2016
Second: Search Grades between the years 2008-2010, 2011-2013, 2014-2016
Third: Calculate the means of every column according to grade and clustered year.
The main problem I have encountered is that Matlab doesn't let me write the expression
for i = 2008:2010 ...etc
I did it manually again (mean of each year for all variables), but I cannot work the years into a loop like the one your previous code showed me:
for i= 1:5
...(A(:,i)==i)..etc
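One way this might be approached is to loop over a cluster counter instead of over the years themselves, and to combine the grade condition and the year range into a single logical mask. The data, the clusters matrix and the 3-D means array below are all hypothetical, and mean with 'omitnan' is used in place of nanmean.

```matlab
% Hypothetical data: column 1 = grade, column 2 = year, columns 3-4 = numeric variables
A = [1 2008 10  1;
     1 2009 20  3;
     2 2010 30  5;
     1 2012 40  7;
     2 2013 50  9;
     2 2011 70 11];

clusters  = [2008 2010; 2011 2013; 2014 2016];   % one row per cluster: [firstYear lastYear]
nGrades   = 2;
nClusters = size(clusters, 1);
nVars     = size(A, 2) - 2;

% means(g, c, :) holds the column means for grade g within year cluster c
means = nan(nGrades, nClusters, nVars);
for g = 1:nGrades
    for c = 1:nClusters
        inCluster = A(:,2) >= clusters(c,1) & A(:,2) <= clusters(c,2);
        rows = (A(:,1) == g) & inCluster;
        means(g, c, :) = mean(A(rows, 3:end), 1, 'omitnan');
    end
end

squeeze(means(1, 1, :))'   % grade 1, years 2008-2010
```

Combinations with no matching rows (here the whole 2014-2016 cluster) come out as NaN, which makes missing groups easy to spot.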


Asked: 13 Sep 2017

Commented: 21 Sep 2017
