Repeated Measure ANOVA 2x2

Sara Romanella on 27 Nov 2021
Commented: William Rose on 2 Dec 2021
Hi! I have 8 subjects whose scores were collected at two time points (pre vs. post) under several conditions (for now I am only including cond1 and cond2). The data are arranged in a table like the one below. I would like to write code for a 2x2 repeated measures ANOVA to test time, condition, and time*condition, but I am confused about how to do so. Could you please help me?
Thank you so much!
  2 Comments
William Rose on 27 Nov 2021
@Sara Romanella, are preCond1 and preCond2 repeated measures of the same thing, in each subject, before the intervention? Or are Cond1 and Cond2 different things?
William Rose on 27 Nov 2021
@Sara Romanella, I think I misunderstood. All the measurements are of the same thing (for example, reaction time). Each subject is measured in two or more conditions (for example, three conditions could be off meds, on med.1, on med.2), and each condition includes a before and an after measurement (for example, before and after sleep).


Answers (5)

Ive J on 27 Nov 2021
Edited: Ive J on 27 Nov 2021
You can follow this example:
% response was measured under two different conditions: c1 and c2 measured in two different time points
% t0 and t1.
data = table([1:8].', randn(8, 1), randn(8, 1), randn(8, 1), randn(8, 1), 'VariableNames', {'id', 'c1_t0', 'c1_t1', 'c2_t0', 'c2_t1'})
data = 8×5 table
    id     c1_t0       c1_t1        c2_t0        c2_t1
    __    ________    ________    _________    ________
    1     -0.62332    -0.62717      0.44249     0.58444
    2      0.88508     -1.0672       1.0175    -0.87148
    3      0.38759     0.35989    -0.041598     -0.9803
    4     -0.67442      0.2027     -0.74994      -1.127
    5     -0.10083    0.087882      0.98915     0.73907
    6     -0.95775     0.76095     -0.26919    0.058469
    7     -0.54824    -0.04698      0.68703     -1.0912
    8       0.1604      1.3812      -1.3043     0.19196
% since the response was measured under both conditions for all individuals,
% condition is part of the within-subjects design (same as 'time')
w = table(categorical([1 1 2 2].'), categorical([1 2 1 2].'), 'VariableNames', {'cond', 'time'}); % within-subjects design
disp(data)
    id     c1_t0       c1_t1        c2_t0        c2_t1
    __    ________    ________    _________    ________
    1     -0.62332    -0.62717      0.44249     0.58444
    2      0.88508     -1.0672       1.0175    -0.87148
    3      0.38759     0.35989    -0.041598     -0.9803
    4     -0.67442      0.2027     -0.74994      -1.127
    5     -0.10083    0.087882      0.98915     0.73907
    6     -0.95775     0.76095     -0.26919    0.058469
    7     -0.54824    -0.04698      0.68703     -1.0912
    8       0.1604      1.3812      -1.3043     0.19196
disp(w)
    cond    time
    ____    ____
     1       1
     1       2
     2       1
     2       2
% list the responses in table order so that they line up with the rows of w
rm = fitrm(data, 'c1_t0-c2_t1 ~ 1', 'WithinDesign', w);
ranova(rm, 'withinmodel', 'cond*time')
ans = 8×8 table
                              SumSq      DF     MeanSq        F         pValue     pValueGG    pValueHF    pValueLB
                             ________    __    ________    ________    ________    ________    ________    ________
    (Intercept)               0.14379    1      0.14379      0.4302     0.53285     0.53285     0.53285     0.53285
    Error                      2.3397    7      0.33425
    (Intercept):cond         0.053187    1     0.053187    0.073368     0.79431     0.79431     0.79431     0.79431
    Error(cond)                5.0746    7      0.72495
    (Intercept):time         0.017321    1     0.017321     0.01667      0.9009      0.9009      0.9009      0.9009
    Error(time)                7.2733    7        1.039
    (Intercept):cond:time      1.0476    1       1.0476      5.2035    0.056543    0.056543    0.056543    0.056543
    Error(cond:time)           1.4093    7      0.20133
I assume you've already checked the assumptions for repeated measures ANOVA, so I won't go into details here :)
Also remember that mixed models are way more precise than repeated measures ANOVA (e.g. the sphericity assumption may not hold in many real-life cases). But it's up to you to decide how to conduct your statistical analyses.
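For anyone curious about the mixed-model route mentioned above, here is a minimal sketch of one possible specification using fitlme. It is not part of the original answer. The data are random placeholders in the same wide layout, the long-format variable names (y, cond, time, id) are made up for illustration, and a random intercept per subject is only one of several reasonable random-effects structures.

```matlab
% Sketch only: hypothetical random data in the same wide layout as above.
rng(1);                                    % fixed seed so the sketch is repeatable
data = table((1:8).', randn(8,1), randn(8,1), randn(8,1), randn(8,1), ...
    'VariableNames', {'id', 'c1_t0', 'c1_t1', 'c2_t0', 'c2_t1'});

% Reshape to long format: one row per measurement (4 blocks of 8 subjects).
n     = height(data);
yLong = [data.c1_t0; data.c1_t1; data.c2_t0; data.c2_t1];
idF   = categorical(repmat(data.id, 4, 1));
condF = categorical([ones(2*n,1); 2*ones(2*n,1)]);          % blocks 1-2: cond 1, blocks 3-4: cond 2
timeF = categorical(repmat([zeros(n,1); ones(n,1)], 2, 1)); % t0, t1, t0, t1
long  = table(idF, condF, timeF, yLong, ...
    'VariableNames', {'id', 'cond', 'time', 'y'});

% Fixed effects: cond, time and their interaction.
% Random effects: an intercept for each subject (an assumption of this sketch).
lme = fitlme(long, 'y ~ cond*time + (1|id)');
anova(lme)                                 % F-tests for the fixed effects
```

Whether random slopes for cond or time should be added is a modelling decision that depends on the data, so treat the formula above as a starting point only.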

the cyclist on 27 Nov 2021
I have experience building mixed effects models in MATLAB, but I have not used ANOVA (repeated measures or otherwise). Therefore, I cannot really give you any specific advice on how to build your model, but I can point you to this page in the documentation as a good starting point for what you need to do.
  1 Comment
Sara Romanella on 27 Nov 2021
Thank you for answering! I did find that page, but I am having a hard time putting it all together and using the right Wilkinson notation for this specific analysis, so I was hoping someone had an idea of how to do it! Still, thank you!



William Rose on 28 Nov 2021
@Sara Romanella, I don't have the MATLAB solution yet, but here it is in Excel.
The ANOVA section at the bottom right of the image shows the results: "Sample" (Excel's term for the Pre/Post factor) is not significant, p=0.19. "Columns" (factor cond.1 vs. cond.2) is not significant, p=0.65. The interaction is not significant, p=0.79. I did this with Data > Data Analysis > Two Factor ANOVA with Replication, and the choices I made within that tool are shown below:
You could add columns or rows if you have more conditions and more subjects, and redo the analysis. You have to install the Analysis ToolPak in Excel for the Data Analysis option to appear on the Data tab.
  1 Comment
William Rose on 28 Nov 2021
Edited: William Rose on 28 Nov 2021
The analysis in Excel doesn't take into account that a specific set of measurements came from one person. It could have been 32 measurements in 32 different people, 8 of whom possessed each combination of possible factors (2x2 factors). You can prove this by shuffling the numbers vertically within each column, independently for each column. The p-values in Excel are not affected by such shuffling. I don't like the insensitivity to shuffling, because it means each person is not serving as their own control.
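The shuffling argument can be tried out in MATLAB with a rough sketch like the one below. It uses made-up random data (not the data from the question) and anovan without any subject term, which is my stand-in for Excel's Two Factor ANOVA with Replication. The variable names are hypothetical.

```matlab
% Hypothetical data: 8 subjects (rows) x 4 cells
% (columns: cond1-pre, cond1-post, cond2-pre, cond2-post),
% with a per-subject offset so that subjects genuinely differ.
rng(2);
Y = randn(8,4) + repmat(2*randn(8,1), 1, 4);

% Shuffle each column independently; this breaks the link between rows and subjects.
Ys = Y;
for k = 1:4
    Ys(:,k) = Y(randperm(8), k);
end

% Factor labels for the 32 measurements, taken column by column (as in Y(:)).
condLbl = [ones(16,1); 2*ones(16,1)];
timeLbl = repmat([ones(8,1); 2*ones(8,1)], 2, 1);

% Two-way ANOVA that treats all 32 values as independent (no subject term).
p  = anovan(Y(:),  {condLbl, timeLbl}, 'model', 'interaction', 'display', 'off');
ps = anovan(Ys(:), {condLbl, timeLbl}, 'model', 'interaction', 'display', 'off');
disp([p ps])   % rows: cond, time, cond*time; the two columns agree up to rounding

% A paired comparison, by contrast, depends on which values share a row.
[~, pPaired]  = ttest(Y(:,2),  Y(:,1));
[~, pPairedS] = ttest(Ys(:,2), Ys(:,1));
disp([pPaired pPairedS])   % these will generally differ
```

The first pair of columns matches because shuffling within a column leaves every cell mean and every within-cell variance unchanged, and those are the only quantities this kind of ANOVA uses.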



William Rose on 28 Nov 2021
Edited: William Rose on 28 Nov 2021
Here is a MATLAB solution. I put your data into a text file, data.txt (attached). The script reads the text file and rearranges it into a table with the structure MATLAB likes: 32 rows by 3 columns. Table column 1 = factor 1 label (Cond1 or Cond2). Table column 2 = factor 2 label (Pre or Post). Table column 3 = numeric measurement. The ordering of the columns is not important, as long as each column has a name. What's important is that there is one row for each measurement, a column for the factor 1 value associated with each measurement, and a column for the factor 2 value associated with each measurement. Here is the final bit of the code. Read the rest of it in the attached script.
If you add rows, the script should still work fine, without modification.
If you add measurements under more conditions (i.e. two more columns, Pre and Post, for each condition), the script should still work fine, as long as you add a factor label for each condition to the list of factor labels in factor1label{}.
%combine the transposed rows plus column of measurements into a table
T=table(factor1',factor2',meas,'VariableNames',{'f1','f2','m'});
%specify the statistical model
%Wilkinson notation: 'f1*f2' is equivalent to 'f1 + f2 + f1:f2'
rm=fitrm(T,'m ~ f1*f2');
%do the analysis of variance
anova(rm)
The p-values it produces match the p-values from Excel.
I tried shuffling the data, by randomly rearranging the values within each column of 8, independently for each column. This shuffling does not alter the mean or variance within each column of 8 numbers. See the attached text file with the shuffled data. When you alter the script so that it reads the shuffled data file, the p-values are unchanged. Excel's Two Factor ANOVA with Replication does the same thing. This tells us that this two-factor analysis does not take into account the fact that the data on each row are all from the same person. It does not allow each person to serve as their own control.
If I want each subject to be their own control, maybe I need to do a three-factor ANOVA, without repeated measures. The third factor would be the subject ID.
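For readers who cannot open the attached script, the rearrangement into 32 rows by 3 columns described above could look roughly like the sketch below. This is not the attached script. The matrix W is a random placeholder for the contents of data.txt, and its column order (Cond1-Pre, Cond1-Post, Cond2-Pre, Cond2-Post) is an assumption. The names factor1, factor2, meas and factor1label follow the snippet above; factor2label is made up by analogy.

```matlab
% Placeholder for the 8 x 4 block of measurements (assumed column order:
% Cond1-Pre, Cond1-Post, Cond2-Pre, Cond2-Post).
rng(3);
W = randn(8, 4);

factor1label = {'Cond1', 'Cond2'};   % add a label here for each extra condition
factor2label = {'Pre', 'Post'};
nSubj = size(W, 1);

% One label per measurement, in the same column-by-column order as W(:).
factor1 = cell(1, numel(W));
factor2 = cell(1, numel(W));
k = 0;
for c = 1:numel(factor1label)
    for t = 1:numel(factor2label)
        for s = 1:nSubj
            k = k + 1;
            factor1{k} = factor1label{c};
            factor2{k} = factor2label{t};
        end
    end
end
meas = W(:);   % 32 x 1 column of measurements

% Same table construction as in the snippet above.
T = table(factor1', factor2', meas, 'VariableNames', {'f1', 'f2', 'm'});
disp(size(T))
```

Note that nothing in T records which subject a measurement came from, which is consistent with the shuffling result described above.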

William Rose on 29 Nov 2021
The attached script does a two-way ANOVA with repeated measures, as before, and it does a 3-way ANOVA (without repeated measures) of the same data. The reason to do 3-way is that by including subject ID as a potential factor, you allow each subject to be their own control. More technically, the 3-way ANOVA includes a test of hypothesis H0: the means of all the subjects are the same, versus H1: the means of the subjects are not all the same. If the means really are different for different subjects, and you don't account for it, then the inter-subject variability makes it harder to detect possibly significant effects of factor 1 or factor 2. When you do two-way with repeated measures, you are not allowing for possible inter-subject variability. But with 3-way (and no repeated measures), you do allow for possible inter-subject variability, which can help you detect other factor effects.
The data you have provided is a good example of this. With 2-way repeated measures analysis, neither Condition nor Time (Pre/Post) nor the interaction is significant. Not even close: p=0.65, 0.19, 0.79 respectively. But with three-way, the Time, subject ID, and their interaction are significant: p=0.01, 0.0007, 0.04 respectively. See screenshots below.
We can demonstrate the benefit of 3-way versus 2-way with repeated measures, and we can test our claims about how they work, by shuffling the data among subjects. I described this in my previous post. The ANOVA results for 2-way with repeated measures are not affected at all by shuffling the results among subjects, within each combination of factors 1 and 2. If there really is a significant effect of subject, as the 3-way analysis leads us to believe, then shuffling should destroy that effect in the 3-way results. And it does: three-way ANOVA on the shuffled data indicates that neither Cond nor Time nor ID nor their pairwise interactions are significant.
Attached:
  • script that does 2-way with repeated measures ANOVA and 3-way ANOVA.
  • text data file which the script reads
  • shuffled text data file for comparison
Screenshots:
  • Console output created by the script
  • Figure created by the script
  2 Comments
Jeff Miller on 30 Nov 2021
The three-way ANOVA in this answer breaks down the mean squares correctly, but the F's and p's are not correct. For example, the correct F for time is 18.3921 / 6.8475, not 11.21.
This three-way ANOVA uses the Time*Cond*ID interaction as the error term for the computation of all F's, which is not right for this repeated measures design. The correct error term for each source is the source*ID interaction term; for example, Cond*ID is the error term when computing the F for Cond, and there is no F for ID.
The correct analysis for this problem is the one given by Ive J.
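To make the error-term point concrete, here is a small sketch that computes the F for time by hand as MS(time) / MS(time*ID) and checks it against ranova. It uses random placeholder data, not the data from the question, and the variable names are hypothetical. The same ratio structure is what gives 18.3921 / 6.8475 in the comment above.

```matlab
% Hypothetical data: 8 subjects x 4 cells, columns c1_t0, c1_t1, c2_t0, c2_t1.
rng(4);
Y = randn(8,4) + repmat(randn(8,1), 1, 4);
n = size(Y, 1);
nCond = 2;

% Subject-by-time means, averaged over condition (n x 2).
A  = [mean(Y(:,[1 3]), 2), mean(Y(:,[2 4]), 2)];
g  = mean(Y(:));        % grand mean
mt = mean(A, 1);        % time means (1 x 2)
ms = mean(A, 2);        % subject means (n x 1)

% Sums of squares for time and for the time*ID interaction.
% (A - ms - mt + g relies on implicit expansion, MATLAB R2016b or later.)
SStime   = n * nCond * sum((mt - g).^2);
SStimeID = nCond * sum(sum((A - ms - mt + g).^2));

% Each source is tested against its own source*ID term.
MStime   = SStime / 1;              % DF = 2 - 1
MStimeID = SStimeID / (n - 1);      % DF = (2 - 1)*(n - 1)
Ftime    = MStime / MStimeID;

% Same design analysed with fitrm/ranova for comparison.
data = array2table(Y, 'VariableNames', {'c1_t0', 'c1_t1', 'c2_t0', 'c2_t1'});
w = table(categorical([1 1 2 2].'), categorical([1 2 1 2].'), ...
    'VariableNames', {'cond', 'time'});
rm  = fitrm(data, 'c1_t0-c2_t1 ~ 1', 'WithinDesign', w);
tbl = ranova(rm, 'WithinModel', 'cond*time');
Ftime_ranova = tbl{'(Intercept):time', 'F'};

disp([Ftime Ftime_ranova])   % the two values should agree up to rounding
```

The F for cond follows the same pattern with the roles of the columns swapped (average over time instead of over condition), and its error term is cond*ID.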
William Rose on 2 Dec 2021
Thank you Jeff Miller.
