Which statistic should be used to analyze differences between groups in a repeated-measures design?

Linear and Non-Linear Regression Methods in Epidemiology and Biostatistics

Eric Vittinghoff, ... Stephen C. Shiboski, in Essential Statistical Methods for Medical Statistics, 2011

2.5.4 Repeated measures ANOVA

Correlated data analyses can sometimes be handled by repeated measures analysis of variance (ANOVA). When the data are balanced and appropriate for ANOVA, test statistics with exact null hypothesis distributions (as opposed to asymptotic, likelihood-based ones) are available. However, the variance–covariance structure is typically estimated by the method of moments, which may be less efficient than maximum likelihood. For unbalanced data, tests are approximate and, even though corrections have been developed (e.g., the Geisser–Greenhouse correction; Greenhouse and Geisser, 1959), may not achieve nominal significance levels. Also, in specifying approximate F-statistics, it is not always straightforward to choose a denominator mean square (i.e., what is the “right” error term).

Maximum likelihood estimation generates test statistics relatively automatically and gives better predictions of the random effects. Maximum likelihood methods also generalize naturally to non-normally distributed outcomes (see, e.g., McCulloch and Searle, 2000), unlike repeated measures ANOVA. See McCulloch (2005) for further discussion.
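
For a concrete comparison, here is a minimal Python sketch (not from the chapter; the simulated data and variable names are illustrative) that analyzes the same balanced layout both ways: an exact-F repeated measures ANOVA and a maximum likelihood mixed model.

```python
# Repeated measures ANOVA (exact F under sphericity, balanced data only)
# versus a maximum likelihood mixed model. Illustrative simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(1)
n_subj, n_time = 20, 4
subj = np.repeat(np.arange(n_subj), n_time)
time = np.tile(np.arange(n_time), n_subj)
# random subject intercepts induce within-subject correlation
y = time + rng.normal(0, 2, n_subj)[subj] + rng.normal(0, 1, n_subj * n_time)
df = pd.DataFrame({"subj": subj, "time": time, "y": y})

# repeated measures ANOVA: moment-based, requires balanced data
print(AnovaRM(df, "y", "subj", within=["time"]).fit())

# mixed model fit by maximum likelihood: handles unbalanced data and
# extends to non-normal outcomes via generalized linear mixed models
print(smf.mixedlm("y ~ C(time)", df, groups=df["subj"]).fit(reml=False).summary())
```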

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780444537379500062

Hierarchical models for EEG and MEG

S. Kiebel, ... K. Friston, in Statistical Parametric Mapping, 2007

Evoked responses

A typical analysis of evoked responses (ERRs) is the one-way (repeated measures) analysis of variance. For each of the n = 1, …, N subjects there are K measurements (i.e., trial-types). The second-level summary statistics are contrasts over peristimulus time, for each subject and trial-type. The design matrix in this case is X(2) = [IK ⊗ 1N, 1K ⊗ IN] (see also Chapter 13). In terms of the model's covariance components, one could assume that the between-subject errors ε(2) are uncorrelated and have unequal variances for each trial-type. This results in K covariance components. After parameter estimation, one tests for main effects or interactions among the trial-types at the between-subject level, with the appropriate contrast. Note that this model uses a ‘pooled’ estimate of non-sphericity over voxels (see Chapter 13). An alternative is to compute the relevant contrasts (e.g., a main effect or interaction) at the first level and use a series of one-sample t-tests at the second level.
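
To make the design matrix concrete, a short numpy sketch (K and N are arbitrary illustrative values) builds X(2) from Kronecker products:

```python
# Build X2 = [I_K kron 1_N, 1_K kron I_N] for K trial-types and N subjects.
import numpy as np

K, N = 3, 4                                             # illustrative sizes
X2 = np.hstack([np.kron(np.eye(K), np.ones((N, 1))),    # trial-type columns
                np.kron(np.ones((K, 1)), np.eye(N))])   # subject columns
print(X2.shape)                                         # (K*N, K+N) -> (12, 7)
```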

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780123725608500164

Multi-Factor ANOVA and ANCOVA

R.H. Riffenburgh, in Statistics in Medicine (Third Edition), 2012

Data and Means

The data are shown in Table 13.13 (means symbols will be used later):

Table 13.13. Cardiac Experiment Data

Patient Number   Replication Number   Time 1   Time 2   Time 3   Means (m_i.k)
1                1                    80       84       78       80.6667
1                2                    77       74       78       76.3333
1                3                    71       81       75       75.6667
2                1                    50       57       60       55.6667
2                2                    50       55       73       59.3333
2                3                    48       55       56       53.0000

We will need an additional set of means that does not appear in a two-way means table: the means across the repeated measure (time) for each replication, which appear as the last column in Table 13.13. These “among-times” means will allow a within-times sum of squares to be calculated. A means table of the sort seen for two-factor ANOVA is shown in Table 13.14.

Table 13.14. Means Table for Cardiac Experiment

Patient Number    Time 1            Time 2            Time 3            Across Times
1                 76.0000           79.6667           77.0000           77.5556 = m_1..
2                 49.3333           55.6667           63.0000           56.0000 = m_2..
Across patients   62.6667 = m_.1.   67.6667 = m_.2.   70.0000 = m_.3.   66.7778 = m_...

The calculations appear after some needed formulas are presented.

Method for Repeated Measures ANOVA

Goal

As with other multifactor ANOVAs, the goal is to obtain mean squares and then F values for the main and interaction effects. However, one set of effects has variability caused by the treatment factor plus a case-to-case (e.g., patient-to-patient) factor, while the other set has variability without case-to-case differences, because it is repeated within each case. For example, suppose we are comparing mean pain levels reported by patients randomized to two types of anesthesia immediately post-op and 4 hours post-op. Pain readings between anesthetics contain patient-to-patient differences, but pain readings between 0 and 4 hours post-op do not. Thus, to calculate F, we need one mean square for error that includes the random variability of measures that are not repeated, and another that includes the random measure-to-measure variability of the repeated factor.

Two SSE Terms

The sums of squares for the main effects and interaction are the same as for a two-factor ANOVA, seen in Table 13.5. The difference, so far as calculation is concerned, is finding two sums-of-squares-due-to-error (SSE) terms. One is a within-repeated-measures sum of squares due to error, or SSE(W), which estimates error variability for causal terms involving the repeated factor; in the preceding example, this SSE(W) contains the influence of repeated measures on the same patient. The other is a between-repeated-measures sum of squares due to error, or SSE(B), which estimates error variability for causal terms not involving the repeated factor, that is, it contains the influence of the independent, or non-repeated, factor.

Row and Column Designations

The references to row and column in Section 13.2 must be clarified. To be consistent with many software packages, the means table will present the repeated measures across the columns and the independent measures down the rows, as in Table 13.13. Thus, sum of squares for rows (SSR) will denote the SS for the independent measure, and sum of squares for columns (SSC) will denote the SS for the repeated measure.

Similar to the preceding section on two-factor ANOVA, c denotes the number of repeated columns (repeated measures), r the number of independent measures, and w the number of replications. Subscript designations appear in Table 13.15.

Table 13.15. Subscript Designations

i = 1, 2, …, r
j = 1, 2, …, c
k = 1, 2, …, w
m_ij. = Σk x_ijk / w

Sum of Squares for Error Calculations

The first step is to find an interim “sum of squares across repeated measures,” or SSAcross. This SS is computed from the means of each replication across the repeated measures, m_i.k, shown as the rightmost column of the data table; the computing formula appears in Table 13.16. As before, A is the squared sum of all observations in the experiment divided by the number of observations rcw, or equivalently rcw × m_...². Table 13.5 gave formulas for calculating SST, SSR, SSC, and SSI. Table 13.16 supplements Table 13.5 with the additional formulas required for repeated measures ANOVA.

Table 13.16. Formulas Supplemental to Those in Table 13.5 for Components in a Repeated Measures ANOVA

SSAcross = c Σi Σk m_i.k² − A
SSE(B) = SSAcross − SSR
SSE(W) = SST − SSAcross − SSC − SSI

SSAcross, sum of squares across repeated measures; SSC, sum of squares for columns; SSE(B), between-repeated-measures sum of squares for error; SSE(W), within-repeated-measures sum of squares for error; SSI, sum of squares for interaction (row-by-column); SSR, sum of squares for rows; SST, sum of squares for the total.

Analysis of Variance Table

Table 13.17 provides the repeated-measures (two-factor) ANOVA table.

Table 13.17. Repeated Measures (Two-factor) Analysis of Variance Table

Source               Sums of Squares        df                Mean Squares         F            p
Independent groups   SSR (Table 13.4)       r − 1             MSR = SSR/df         MSR/MSE(B)
Error between        SSE(B) (Table 13.16)   r(w − 1)          MSE(B) = SSE(B)/df
Repeated measures    SSC (Table 13.4)       c − 1             MSC = SSC/df         MSC/MSE(W)
Interaction          SSI (Table 13.4)       (r − 1)(c − 1)    MSI = SSI/df         MSI/MSE(W)
Error within         SSE(W) (Table 13.16)   r(c − 1)(w − 1)   MSE(W) = SSE(W)/df
Total                SST (Table 13.4)       rcw − 1

df, degrees of freedom; MSC, mean square for columns; MSE(B), between-repeated-measures mean square of error; MSE(W), within-repeated-measures mean square of error; MSI, mean square for interaction (row-by-column); MSR, mean square for rows; SSC, sum of squares for columns; SSE(B), between-repeated-measures sum of squares for error; SSE(W), within-repeated-measures sum of squares for error; SSI, sum of squares for interaction (row-by-column); SSR, sum of squares for rows; SST, sum of squares for the total.
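
As a numerical check, a minimal Python sketch (not from the book) applies the formulas of Tables 13.16 and 13.17 to the Table 13.13 data:

```python
# Two-factor repeated measures ANOVA sums of squares for the cardiac data;
# r = 2 patients (independent), c = 3 times (repeated), w = 3 replications.
import numpy as np

x = np.array([[[80, 84, 78], [77, 74, 78], [71, 81, 75]],    # patient 1
              [[50, 57, 60], [50, 55, 73], [48, 55, 56]]],   # patient 2
             dtype=float)                                    # shape (r, w, c)
r, w, c = x.shape

A = x.sum() ** 2 / (r * c * w)           # correction term, rcw * (grand mean)^2
SST = (x ** 2).sum() - A
SSR = (x.sum(axis=(1, 2)) ** 2).sum() / (c * w) - A    # patients (rows)
SSC = (x.sum(axis=(0, 1)) ** 2).sum() / (r * w) - A    # times (columns)
SSI = (x.sum(axis=1) ** 2).sum() / w - A - SSR - SSC   # patient-by-time cells
SSAcross = (x.sum(axis=2) ** 2).sum() / c - A  # replication means across times
SSEB = SSAcross - SSR                          # between error, df = r(w - 1)
SSEW = SST - SSAcross - SSC - SSI              # within error, df = r(c-1)(w-1)

F_groups = (SSR / (r - 1)) / (SSEB / (r * (w - 1)))
F_times = (SSC / (c - 1)) / (SSEW / (r * (c - 1) * (w - 1)))
print(round(SSEB, 3), round(SSEW, 3), round(F_groups, 2), round(F_times, 2))
```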

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780123848642000135

Traditional methods of longitudinal data analysis

Xian Liu, in Methods and Applications of Longitudinal Data Analysis, 2016

2.3.4 Empirical illustration: a two-factor repeated measures MANOVA on the effectiveness of acupuncture treatment on two psychiatric disorders

In Section 2.2.4, an empirical example was provided on how to apply repeated measures ANOVA for analyzing the effect of acupuncture treatment on the PCL score and its changing pattern over time. In the present illustration, I consider an additional response variable – the Beck Depression Inventory-II score, or BDI-II – with a doubly multivariate repeated measures design. The BDI-II is a psychometrically sound 21-item self-report measure, with values ranging from 0 to 63; a higher BDI-II score indicates greater severity of depression. This second dependent variable is named BDI_SUM in the analysis. As the longitudinal data now involve multivariate repeated measures on two dependent variables, PCL_SUM and BDI_SUM, a two-factor repeated measures MANOVA model is created, with the independent factors still being TIME and TREAT (TIME: 0 = baseline survey, 1 = 4-week follow-up, 2 = 8-week follow-up, 3 = 12-week follow-up; TREAT: 1 = receiving acupuncture treatment, 0 = else). The MANOVA analysis is intended to test the null hypotheses that neither PCL_SUM nor BDI_SUM changes over time and that neither differs between the two treatment groups. It is also assumed that there is no interactive effect of TIME and TREAT on the repeated measurements of the two dependent variables, and that there is no subject effect given the specification of the two covariates.

As BDI_SUM is an additional dependent variable, its repeated measurements at the four time points need to be included when creating the temporary dataset TP2. Below is the SAS program for this step.

SAS Program 2.5a:

[program listing not reproduced]

Given the recreation of TP2, the MANOVA analysis is then conducted with two dependent variables measured at four time points for each subject. The SAS PROC GLM procedure with the REPEATED statement is used again. The following program displays the detailed statements.

SAS Program 2.5b:

[program listing not reproduced]

In SAS Program 2.5b, the MODEL statement specifies two sets of multivariate repeated measures as dependent variables, PCL_SUM0 − PCL_SUM3 and BDI_SUM0 − BDI_SUM3. On the right of the equals sign, only TREAT is explicitly given as a covariate. As specified in repeated measures MANOVA, the effect of TIME is reflected in the main effects of the repeated measurements, and likewise, the interaction between TIME and TREAT is summarized by variations of TIME’s effect between the two treatment groups. In the REPEATED statement, the option RESPONSE 2 tells SAS that there are two response variables. The IDENTITY option generates an identity transformation for the associated factor. Similarly, the option TIME 4 specifies the number of repeated measurements for each dependent variable. The SAS PROC GLM procedure can also test statistical significance for specified contrasts in repeated measures MANOVA; such a testing step will be described later, when linear mixed models are introduced.

SAS Program 2.5b derives multivariate tests for the main effects of TIME and TREAT and their interactions across responses. The overall information of the doubly multivariate repeated measures design is displayed first, as shown below.

SAS Program Output 2.2a:

[output table not reproduced]

SAS Program Output 2.2a shows that the factor TREAT has two levels, with values 0 and 1. More than half of the observations are not included in the analysis because MANOVA removes all subjects with missing observations (the problems arising from this removal will be discussed in Section 2.4).

Next, the repeated measures level information is displayed.

SAS Program Output 2.2b:

[output table not reproduced]

The above table shows that the response factor takes the value 1 for the PCL_SUM repeated measurements and 2 for the BDI_SUM scores. As indicated earlier, there are four levels for TIME. The multivariate tests for the overall effect of acupuncture treatment across the two responses are presented next, referred to as the Response*Treat effect in the MANOVA table.

SAS Program Output 2.2c:

[output table not reproduced]

From SAS Program Output 2.2c, the main effect of acupuncture treatment is marginally significant across the two responses. Because an interaction between TREAT and TIME is specified, the statistical significance of the main effect of acupuncture treatment will be further assessed after checking the significance of that interaction. If an underlying factor has only two levels, as TREAT does, the four multivariate test statistics take exactly the same F-value and therefore lead to the identical conclusion about the test.

Below I display the results of the multivariate tests for the overall time effect across two responses, referred to as Response*Time Effect in the MANOVA table.

SAS Program Output 2.2d:

[output table not reproduced]

As shown above, the time effect is statistically significant across the two responses with a p-value that is lower than 0.0001. Again, all four multivariate test statistics are identical given two levels of the treatment factor.

The multivariate test results for the TREAT-by-TIME interaction across two responses, referred to in the MANOVA table as Response*Treat*Time test, are presented as follows.

SAS Program Output 2.2e:

[output table not reproduced]

As illustrated, the overall effect of the TREAT-by-TIME interaction is statistically significant across the two responses, with p-value below 0.01. Given the statistical significance of the interaction term, the main effect of acupuncture treatment should therefore be regarded as statistically significant, although its p-value is greater than 0.05.

The multivariate test results for within-subjects effect, referred to as the Response effect in the MANOVA table, are reported as follows.

SAS Program Output 2.2f:

[output table not reproduced]

Thus, within-subject random errors are also statistically significant across the two responses. The p-value associated with the residuals is below 0.0001.

Finally, SAS Program 2.5b generates an ANOVA-type table that reports the results of hypothesis test of the between-subjects effects, given below.

SAS Program Output 2.2g:

[output table not reproduced]

The above table indicates that, after including the factors TREAT and TIME, there is only a fairly minor subject effect across the two responses, as the associated p-value is slightly above 0.05.

SAS Program 2.5b does not generate a transformation matrix P. If the detailed values of this matrix are needed, the researcher should use a MANOVA statement and specify the SUMMARY option.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780128013427000022

Analysis of Variance

W. Penny, R. Henson, in Statistical Parametric Mapping, 2007

Pooled versus partitioned errors

In the above model, e is sometimes called a pooled error, since it does not distinguish between different sources of error for each experimental effect. This is in contrast to an alternative model in which the original residual error e is split into three terms eAnq, eBnr and eABnqr, each specific to a main effect or interaction. This is a different form of variance partitioning. Each error term is a random variable and is equivalent to the interaction between that effect and the subject variable.

The F-test for, say, the main effect of factor A is then:

$$F = \frac{SS_k / DF_k}{SS_{nk} / DF_{nk}},$$

where SSk is the sum of squares for the effect, SSnk is the sum of squares for the interaction of that effect with subjects, DFk = K1 − 1 and DFnk = (N − 1)(K1 − 1).

Note that, if there are no more than two levels of every factor in an M-way repeated measures ANOVA (i.e., Km = 2 for all m = 1, …, M), then the covariance of the errors Σe for each effect is a 2-by-2 matrix that necessarily has compound symmetry, and so there is no need for a non-sphericity correction. A heuristic for this is that there is only one difference (q = 1) between two levels (Km = 2). This is not necessarily the case if a pooled error is used, as in Eqn. 13.15.
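
A toy numpy sketch (simulated data, not from the chapter) of this partitioned-error F-test for a single within-subject factor:

```python
# Partitioned error: the error term for factor A is the A-by-subject
# interaction, with DF_A = K - 1 and DF_AxS = (N - 1)(K - 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
N, K = 10, 3                              # subjects, levels of factor A
y = rng.normal(size=(N, K)) + np.array([0.0, 0.5, 1.0])   # y[n, k]

grand = y.mean()
subj = y.mean(axis=1, keepdims=True)      # subject means
lev = y.mean(axis=0, keepdims=True)       # level means

SS_A = N * ((lev - grand) ** 2).sum()               # effect sum of squares
SS_AxS = ((y - subj - lev + grand) ** 2).sum()      # effect-by-subject error
F = (SS_A / (K - 1)) / (SS_AxS / ((N - 1) * (K - 1)))
print(F, stats.f.sf(F, K - 1, (N - 1) * (K - 1)))   # F and its p-value
```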

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780123725608500139

Introduction

Xian Liu, in Methods and Applications of Longitudinal Data Analysis, 2016

1.3.1 Multivariate data structure

Classical repeated measures data are predominantly used in ANOVA in experimental studies. Traditionally, the data structure for repeated measures ANOVA follows a multivariate format: each subject has a single row of data, with repeated measurements recorded horizontally. That is, a column is assigned to the measurement at each time point in the data matrix. To illustrate the multivariate data structure, I provide an example using the repeated measures data of the Randomized Controlled Clinical Trial on the Effectiveness of Acupuncture Treatments on PTSD, which will be described extensively in Section 1.7 (PTSD is the abbreviation for posttraumatic stress disorder). The PTSD Checklist (PCL) score, a 17-item summary scale measured at four time points, is the response variable used to gauge severity of PTSD symptoms; it ranges from 17 to 85. In the multivariate data format, the repeated measurements for each subject are specified as four outcome variables aligned in the same row, with time points indicated as suffixes attached to the variable name. Additionally, two covariates are included in the dataset: Age and Female (male = 0, female = 1). To identify the subject for further analysis, each individual’s ID number is also incorporated. Below is the data matrix for the first five subjects in the multivariate data format.

In Table 1.1, each subject has one row of data with four outcome variables, PCL1–PCL4, the ID number, and the two covariates, Age and Female. Among the five subjects, one person is aged below 30 years, one above 50, and the rest are between 38 and 44 years of age. There are four men and one woman. As all observations on the outcome variable are aligned horizontally in the same row, the multivariate data structure of repeated measurements contains additional columns and is therefore also referred to as the wide table format. Clearly, the cross-sectional data format is a special case of the multivariate structure, with the outcome variable observed at only one time. The most distinctive advantage of the multivariate data structure is that each subject’s empirical growth record can be visually examined (Singer and Willett, 2003). In Table 1.1, for example, it is easy to summarize each subject’s trajectory by comparing values of the repeated measurements horizontally. Further examination of the pattern of change over time in the response variable can be performed visually. Perhaps due to this convenience, various latent growth models, which constitute an integral part of the literature on longitudinal data analysis, are designed from such a wide table perspective.

Table 1.1. Multivariate Data of Repeated Measurements

ID   PCL1   PCL2   PCL3   PCL4   Age   Female
1    66     31     58     39     27    0
2    48     56     43     43     44    1
3    37     50     53     47     38    0
4    41     23     21     21     53    0
5    51     57     39     46     44    0

There are, however, distinct disadvantages to the multivariate data structure for longitudinal data analysis. First, time is the primary covariate in analyzing the pattern of change over time in the response variable. In the wide table format, the time factor is reflected only indirectly, by the suffix attached to each time point; time is not explicitly specified as an independent factor, which makes analysis of the time effect inconvenient. Sometimes, intervals between two successive waves are unequally spaced by design or vary across subjects, and the multivariate data structure obviously cannot reflect such variations in spacing. Second, in longitudinal data analysis the values of some covariates may vary over time, and failure to address this time-varying nature of the predictor variables can result in biased analytic results and erroneous predictions of longitudinal processes. There are some complex, cumbersome ways to specify time-varying covariates within the multivariate data framework; these approaches, however, are not user-friendly and are inconvenient to apply. A wide-to-long conversion, sketched below, avoids both problems.
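
For illustration, a short pandas sketch (column names follow Table 1.1; the wave and time columns are helpers introduced here, not part of the original data) converts the wide layout into the long format, in which time is an explicit covariate:

```python
# Wide (multivariate) to long (univariate) conversion of Table 1.1 data.
import pandas as pd

wide = pd.DataFrame({"ID": [1, 2], "PCL1": [66, 48], "PCL2": [31, 56],
                     "PCL3": [58, 43], "PCL4": [39, 43],
                     "Age": [27, 44], "Female": [0, 1]})

long = wide.melt(id_vars=["ID", "Age", "Female"],
                 value_vars=["PCL1", "PCL2", "PCL3", "PCL4"],
                 var_name="wave", value_name="PCL")
# make time an explicit numeric covariate instead of a variable-name suffix
long["time"] = long["wave"].str.extract(r"(\d+)", expand=False).astype(int)
print(long.sort_values(["ID", "time"]))
```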

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780128013427000010

Multifactor Tests on Means of Continuous Data

ROBERT H. RIFFENBURGH, in Statistics in Medicine (Second Edition), 2006

Analysis of Variance Table

Substitution in the formulas of Table 18.5 yields the df, mean squares, and F-values appearing in the ANOVA table.

Repeated-Measures Analysis of Variance for Rattlesnake Venom Experiment

Effect                               Sums of squares       df   Mean squares   F       Critical F   p
Antivenin treatment                  400.1868 (SSC)        1    400.1868       1.142   4.96         0.310
Between-pig error                    3502.6493 [SSE(B)]    10   350.2650
Fasciotomy                           485.9746 (SSR)        1    485.9746       2.294   4.96         0.161
Treatment × fasciotomy interaction   0.6607 (SSI)          1    0.6607         0.003   4.96         0.956
Within-pig error                     2118.3647 [SSE(W)]    10   211.8349
Total                                6507.8361 (SST)       23

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780120887705500587

Statistical Modeling in Biomedical Research: Longitudinal Data Analysis

Chengjie Xiong, ... J. Philip Miller, in Essential Statistical Methods for Medical Statistics, 2011

3 Design issues of a longitudinal study

In this section we focus on response variables of continuous type, although the case in which the longitudinally measured response variable is binary or ordinal can be worked out in a similar fashion.

As stated earlier, the major objective of a longitudinal study is to study the rate of change over time in response variables. Different designs can be used when planning a longitudinal study. The determination of sample sizes and the corresponding statistical powers are among the most important issues when designing a longitudinal study. The answers to these questions depend on several factors: the primary hypotheses/objectives of the study, the statistical models used for analyzing the longitudinal data, the significance level of the primary statistical test or the confidence level of the confidence interval estimate of the rate of change over time, and the statistical power desired for a statistical test or the degree of accuracy desired in the confidence interval estimate of the rate of change. Most of the time, analysis of response profiles, repeated measures analysis of variance, and general linear mixed models are the major statistical models used for determining the sample sizes of longitudinal studies when the primary outcome variable is continuous.

When no parametric forms are assumed for the mean response profiles, which are estimated and compared based on the analysis of response profiles or repeated measures analysis of variance, sample size determination can be based on those standard analyses. In a longitudinal study comparing multiple treatment groups over time, if repeated measures analysis of variance is used under the assumption that the covariance matrices of the measurement errors over the time intervals and of the error terms of the subjects assigned to a given study condition satisfy the H–F condition (Huynh and Feldt, 1970), sample size determination can be further based on the F-tests or t-tests from a standard two-way analysis of variance (Chow and Liu, 2003), applied to appropriate statistical tests of the primary hypothesis of the study. We consider here several types of longitudinal studies that are analyzed by general linear mixed effects models in which a linear growth curve over time is assumed: one estimates the rate of change over time, and the others compare two subject groups on the rate of change over time.

Case 1

Estimating a single rate of change over time.

The simplest longitudinal study design is an observational study in which study subjects are followed for a certain period of time. This type of longitudinal study can be used to estimate the rate of change in the outcome variable over a certain time period. In many of these observational studies, the most important objective is to achieve an accurate estimate of the rate of change over time on some important measures for a population of subjects. Suppose that a sample of size n will be used in the study, in which each subject is planned to take k repeated measures of the response variable at time points t1, t2, …, tk. Let Yj = (yj1, yj2, …, yjk)t be the vector of longitudinal measurements of the jth subject. For simplicity, we assume that changes in the mean response can be modeled by a linear trend over time, so that the slope over time can be used to describe the rate of change. The major objective here is to obtain an accurate confidence interval estimate of the mean slope over time for the population of subjects under study. Recall that the two-stage random effects model assumes an individual growth curve for each subject at Stage 1:

$$Y_{ji} = \beta_{0j} + \beta_{1j} t_i + e_{ji},$$

where the eji's are assumed to be independent and identically distributed as a normal distribution with mean 0 and variance σe². At Stage 2, the subject-specific rates of change β1j's are assumed to follow another normal distribution with mean β1 and variance σb² and are independent of the eji's (the distribution of the β0j need not be used here). The major interest is in the estimation of the mean rate of change β1 in the population. The simple least squares estimate of the subject-specific rate of change for the jth subject is

$$\hat{\beta}_{1j} = \frac{\sum_{i=1}^{k}(t_i - \bar{t})\,Y_{ji}}{\sum_{i=1}^{k}(t_i - \bar{t})^2},$$

where $\bar{t} = \sum_{i=1}^{k} t_i / k$. Notice that $\hat{\beta}_{1j}$ follows a normal distribution with mean β1 and variance σ², where

$$\sigma^2 = \sigma_e^2 \left\{ \sum_{i=1}^{k} (t_i - \bar{t})^2 \right\}^{-1} + \sigma_b^2.$$

Therefore a 100(1−α)% (0 < α < 1) confidence interval for β1 based on a sample of size n is $\bar{\beta}_1 \pm z_{\alpha/2}\,(\sigma/\sqrt{n})$, where

$$\bar{\beta}_1 = \frac{\sum_{j=1}^{n} \hat{\beta}_{1j}}{n}.$$

This gives the sample size required for achieving a confidence interval estimate of β1 with a margin of error ± δ as

$$n = \frac{(z_{\alpha/2}\,\sigma)^2}{\delta^2}.$$

If the longitudinal study is unbalanced or incomplete, in which case different study subjects may have different design vectors of times or even different numbers of time points, similar sample size formulas can be derived under certain convergence assumptions on the design vectors of times.
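
A minimal Python sketch of this Case 1 formula, with illustrative (assumed) visit times, variance components, significance level, and margin of error:

```python
# n = (z_{alpha/2} * sigma / delta)^2, where
# sigma^2 = sigma_e^2 / sum_i (t_i - tbar)^2 + sigma_b^2.
import math
import numpy as np
from scipy.stats import norm

t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])   # assumed visit times (years)
sigma_e2, sigma_b2 = 4.0, 1.0             # assumed variance components
alpha, delta = 0.05, 0.25                 # 95% CI, margin of error

f = ((t - t.mean()) ** 2).sum()           # design quantity f(t_1, ..., t_k)
sigma2 = sigma_e2 / f + sigma_b2
n = (norm.ppf(1 - alpha / 2) ** 2) * sigma2 / delta ** 2
print(math.ceil(n))
```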

Case 2

Estimating the difference of two rates of change over time.

A comparative longitudinal study compares the longitudinal courses of one or more response variables over two or more techniques, treatments, or levels of a covariate. In many clinical trials that evaluate the efficacy of one or more therapeutic treatments for a disease such as AD, a comparative longitudinal design is likely used to compare the treatments with placebo on the rate of change over time for a primary endpoint. Here we consider estimating the difference in the rates of change for the primary endpoint between the treated group and the placebo group. The random coefficients model in this case assumes that the subject-specific slope β1j follows a normal distribution with mean βt and variance σbt² when the subject belongs to the treated group, and another normal distribution with mean βc and variance σbc² when the subject belongs to the control group. Similar to Case 1, when the subject belongs to the treated group, β̂1j follows a normal distribution with mean βt and variance σt², where

$$\sigma_t^2 = \sigma_e^2 \left\{ \sum_{i=1}^{k}(t_i - \bar{t})^2 \right\}^{-1} + \sigma_{bt}^2.$$

When the subject belongs to the control group, β̂1j follows another normal distribution with mean βc and variance σc², where

$$\sigma_c^2 = \sigma_e^2 \left\{ \sum_{i=1}^{k}(t_i - \bar{t})^2 \right\}^{-1} + \sigma_{bc}^2.$$

Therefore a 100(1−α)% (0 < α < 1) confidence interval for the difference βt − βc in the mean rates of change over time between the treated group and the control group is $\bar{\beta}_t - \bar{\beta}_c \pm z_{\alpha/2}\sqrt{\sigma_t^2/n_t + \sigma_c^2/n_c}$, where

$$\bar{\beta}_i = \frac{\sum_{j=1}^{n_i} \hat{\beta}_{1j}}{n_i}$$

for i = t, c, where nt and nc are the sample sizes for the treated group and the control group, respectively. Let λ = nt/nc be the sample size ratio between the two subject groups. This confidence interval also yields the sample sizes for the two study groups required to achieve a confidence interval estimate of βt − βc with a margin of error ±δ as

$$n_c = \left(\frac{\sigma_t^2}{\lambda} + \sigma_c^2\right)\left(\frac{z_{\alpha/2}}{\delta}\right)^2,$$

and nt = λnc.
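
The corresponding sketch for the Case 2 formula (inputs again assumed for illustration):

```python
# n_c = (sigma_t^2 / lambda + sigma_c^2) * (z_{alpha/2} / delta)^2, n_t = lambda * n_c.
import math
import numpy as np
from scipy.stats import norm

t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])          # assumed visit times
f = ((t - t.mean()) ** 2).sum()
sigma_e2, sigma_bt2, sigma_bc2 = 4.0, 1.0, 1.0   # assumed variances
lam, alpha, delta = 1.0, 0.05, 0.35              # 1:1 allocation

sigma_t2 = sigma_e2 / f + sigma_bt2
sigma_c2 = sigma_e2 / f + sigma_bc2
n_c = (sigma_t2 / lam + sigma_c2) * (norm.ppf(1 - alpha / 2) / delta) ** 2
print(math.ceil(n_c), math.ceil(lam * n_c))      # n_c and n_t
```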

Case 3

Testing a hypothesis on the difference of two rates of change over time.

Following arguments similar to those in Case 2, the test statistic for testing H0 : βt = βc against Ha : βt − βc = Δ ≠ 0 is

$$z = \frac{\bar{\beta}_t - \bar{\beta}_c}{\sqrt{\sigma_t^2/n_t + \sigma_c^2/n_c}}.$$

The test statistic follows a standard normal distribution when the null hypothesis is true. The test therefore rejects the null hypothesis when |z| > zα/2 at significance level α (0 < α < 1). The power of the test, as a function of Δ, is given by

$$P(\Delta) = 1 - \Phi\!\left( z_{\alpha/2} - \frac{\Delta}{\sqrt{\sigma_t^2/n_t + \sigma_c^2/n_c}} \right) + \Phi\!\left( -z_{\alpha/2} - \frac{\Delta}{\sqrt{\sigma_t^2/n_t + \sigma_c^2/n_c}} \right).$$

Therefore, the sample sizes required to achieve a statistical power of 1 − γ (0 < γ < 1) are the solutions in nt and nc of

P(Δ)=1−γ.
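
A sketch of the Case 3 power function, used here to search for the smallest equal-allocation sample size reaching 90% power (all inputs are assumed, illustrative values):

```python
# P(Delta) = 1 - Phi(z - Delta/se) + Phi(-z - Delta/se),
# where se^2 = sigma_t^2/n_t + sigma_c^2/n_c.
import numpy as np
from scipy.stats import norm

def power(delta, n_t, n_c, sigma_t2, sigma_c2, alpha=0.05):
    se = np.sqrt(sigma_t2 / n_t + sigma_c2 / n_c)
    z = norm.ppf(1 - alpha / 2)
    return 1 - norm.cdf(z - delta / se) + norm.cdf(-z - delta / se)

sigma_t2 = sigma_c2 = 5.0                # assumed slope-estimate variances
for n in range(2, 5000):                 # equal allocation: n_t = n_c = n
    if power(0.5, n, n, sigma_t2, sigma_c2) >= 0.90:
        print(n)
        break
```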

Notice that in all these sample size formulas, the length of the study, the number of repeated measures on the response variable, and the time spacing of the repeated measures all impact the statistical power through the quantity

$$f(t_1, t_2, \ldots, t_k) = \sum_{i=1}^{k} (t_i - \bar{t})^2.$$

Because this quantity is inversely related to the variance of the estimated subject-specific rate of change over time, the larger it is, the smaller the variance of the estimated subject-specific slope, the more accurate the confidence interval estimates of the mean slopes, and the more powerful the statistical test comparing the two mean rates of change between the treated group and the control group. Therefore, an optimal design should in theory maximize the quantity f(t1, t2, …, tk) over the choice of k, t1, t2, …, tk. Notice that tk − t1 is the entire duration of the study. Although theoretically it should be chosen to maximize f(t1, t2, …, tk), many economic, logistic, and subject-matter factors constrain the choice of tk − t1. In addition, the validity of the assumed statistical model also constrains the choice of tk − t1, in the sense that a linear growth over time might not be a reasonable assumption for a very long study duration, which is especially the case in the study of cognitive decline in Alzheimer's patients. Similarly, the number of repeated measures in a longitudinal study might also be constrained by many practical factors and cannot be freely chosen by the designers of the study. As a result, many longitudinal studies are restricted to a relatively short duration with a predetermined number of repeated measures that is not chosen statistically based on an optimal design. Given that tk − t1 and k are typically chosen for non-statistical reasons, the optimal design then relies on the choice of time spacing to maximize f(t1, t2, …, tk). It can be proved mathematically that for even k, f(t1, t2, …, tk) is maximized when k/2 observations are taken at baseline t1 and the other k/2 at the final time point tk for each study subject. This mathematically optimal design, however, is not only impractical in many longitudinal studies but also completely erases the ability to verify the validity of the linear growth curve from the collected data. Therefore optimal longitudinal designs are sometimes based on further assumptions about the spacing of the design vector of times. For example, if the researchers want to design an equally spaced longitudinal study, then

$$f(t_1, t_2, \ldots, t_k) = (t_k - t_1)^2\,\frac{k(k+1)}{12(k-1)}.$$

This function indicates the relative influence of tk − t1 and k on the sample size computations. In general, if the linear growth curve is a valid statistical model and logistic and practical factors allow, an increase in either the study duration or the frequency of repeated measures will decrease the within-subject variability and improve the precision of parameter estimates or the statistical power of the test on the rate of change over time. A numerical check of the equal-spacing formula appears below.
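
A quick numerical check (k and study duration assumed) that this closed form agrees with the direct sum:

```python
# Equal spacing: f = (t_k - t_1)^2 * k * (k + 1) / (12 * (k - 1)).
import numpy as np

k, duration = 5, 2.0
t = np.linspace(0.0, duration, k)        # equally spaced time points
direct = ((t - t.mean()) ** 2).sum()
closed = duration ** 2 * k * (k + 1) / (12 * (k - 1))
print(direct, closed)                    # equal up to floating point
```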

Missing data almost always occur in longitudinal studies. In general, the impact of missing data on sample size determination is difficult to quantify precisely because of the complexity of the patterns of missingness. The simplest, conservative approach to accounting for missing data in sample size determination is first to compute the sample sizes required assuming all subjects have complete data, and then to adjust the sample sizes based on an estimated rate of attrition.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780444537379500116

Design of Experiments

Donna L. Mohr, ... Rudolf J. Freund, in Statistical Methods (Fourth Edition), 2022

10.5.3 Assumptions of the Repeated Measures Model

Repeated measures are particularly prone to a failure of the independence assumption for errors within the same subject. It is extremely difficult to diagnose this, because the residuals are already heavily correlated due to the estimation procedure. To understand this, remember that we do not observe the error terms, but rather the residuals, y−fitted value. The residuals are highly correlated in blocked designs, because of the constraints that they sum to 0 in a number of ways. For example, the residuals within any subject will sum to 0 because we have fit a parameter for a subject effect. In an extreme case, if there are only two treatments per subject, knowing one of the residuals determines what the other residual will be. This correlation in the residuals is a product of the estimation procedure, and it will often mask any correlation in the unobservable error terms.

Fortunately, the repeated measures analysis can withstand some types of dependence in the errors. What is actually required is that the differences between any two measurements within the same subject all have the same variance. The precise statement of this requirement, sometimes called a circularity or sphericity assumption, is given in Huynh and Feldt (1970) and discussed in detail in Winer et al. (1991). The sphericity assumption plays the same role in repeated measures analysis that the assumption of constant variance and independence plays in ordinary ANOVA.

There is a formal test of the sphericity assumption based on an examination of the covariance matrix of the observations rather than the residuals, Mauchly’s test (see Winer et al., 1991), but it is not very powerful in small samples. If the sphericity assumption fails, there are several methods for adjusting the numerator and denominator degrees of freedom. The most popular are the Greenhouse-Geisser and Huynh-Feldt adjustments (Winer et al., 1991). These are analogous to Satterthwaite’s adjustment to the degrees of freedom of the independent-samples t test when the variances are unequal (Section 5.2). Most statistical software will automatically give information on these adjustments whenever a repeated measures analysis is specified, as in Table 10.15. Some authors recommend that the p values for F tests be adjusted whenever these epsilon values, a measure of the departure from sphericity, drop below the neighborhood of 0.75. Other authors recommend simply always using one of the adjustments. The adjustments consist of multiplying the numerator and denominator degrees of freedom by epsilon; the Huynh-Feldt epsilon sometimes exceeds 1.0, in which case no adjustment is made. Adjusted p values are computed using these new degrees of freedom. They are produced automatically by the SAS System, as we can see in Table 10.15.
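
A minimal numpy sketch (simulated data; the F value plugged in at the end is an assumed placeholder from a repeated measures ANOVA) of the Greenhouse-Geisser epsilon and the resulting degrees-of-freedom adjustment:

```python
# Box/Greenhouse-Geisser epsilon from the covariance of the k measures,
# then epsilon-adjusted degrees of freedom for the F test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 15, 4
y = rng.normal(size=(n, k)) + rng.normal(0, 1, (n, 1))   # correlated measures

S = np.cov(y, rowvar=False)              # k x k covariance of the measures
P = np.eye(k) - np.ones((k, k)) / k      # centering projector
Sc = P @ S @ P
eps = np.trace(Sc) ** 2 / ((k - 1) * np.trace(Sc @ Sc))  # 1/(k-1) <= eps <= 1

F = 2.8                                  # placeholder F from the RM ANOVA
df1, df2 = (k - 1) * eps, (n - 1) * (k - 1) * eps
print(eps, stats.f.sf(F, df1, df2))      # epsilon and adjusted p value
```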

Note that sphericity and lack of independence do not play a role in the between-subjects tests. Those are effectively carried out on the average per subject. Dependencies in errors within subjects are irrelevant.

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780128230435000102

Covariance Components

D. Glaser, K. Friston, in Statistical Parametric Mapping, 2007

Correcting degrees of freedom: the Satterthwaite approximation

Box's motivation for using this measure of the departure from sphericity was to harness an approximation due to Satterthwaite. This deals with the fact that the actual distribution of the variance estimator is not χ² if the errors are not spherical, and thus the F-statistic used for hypothesis testing is inaccurate. The solution adopted is to approximate the true distribution with a moment-matched scaled χ² distribution, matching the first and second moments. Under this approximation, in the context of repeated measures ANOVA with k measures and n subjects, the F-statistic is distributed as F[(k − 1)ε, (n − 1)(k − 1)ε]. To understand the elegance of this approach, note that, as shown above, when the sphericity assumptions underlying the model are met, ε = 1 and the F-distribution is then just F[(k − 1), (n − 1)(k − 1)], the standard degrees of freedom. In short, the correction ‘vanishes’ when not needed.

Finally, we note that this approximation has been adopted for neuroimaging data in SPM. Consider the expression for the effective degrees of freedom from Worsley and Friston (1995):

$$\nu = \frac{\operatorname{tr}(RV)^2}{\operatorname{tr}(RVRV)}$$

Compare this with Eqn. 10.7 above, and see Chapter 8 for a derivation. Here R is the model's residual forming matrix and V are the serial correlations in the errors. In the present context, Σx = RVR. If we remember that the conventional degrees of freedom for the t-statistic are k − 1 and consider ε as a correction for the degrees of freedom, then:

$$\nu = (k-1)\,\varepsilon = (k-1)\,\frac{\bigl(\sum_i \lambda_i\bigr)^2}{(k-1)\sum_i \lambda_i^2} = \frac{\bigl(\sum_i \lambda_i\bigr)^2}{\sum_i \lambda_i^2} = \frac{\operatorname{tr}(RV)^2}{\operatorname{tr}(RVRV)} \qquad (10.13)$$

Thus, SPM applies the Satterthwaite approximation to correct the F-statistic, implicitly using a measure of sphericity violation. Next, we will see that this approach corresponds to that employed in conventional statistical packages.
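
A small numpy sketch (with an assumed AR(1) form for the serial correlations V) of this effective degrees-of-freedom computation:

```python
# v = tr(RV)^2 / tr(RVRV), as in Eqn. 10.13.
import numpy as np

k = 8
X = np.ones((k, 1))                      # design: mean only
R = np.eye(k) - X @ np.linalg.pinv(X)    # residual forming matrix
rho = 0.4                                # assumed AR(1) correlation
V = rho ** np.abs(np.subtract.outer(np.arange(k), np.arange(k)))

RV = R @ V
v = np.trace(RV) ** 2 / np.trace(RV @ RV)
print(v)                                 # v <= k - 1, equality iff spherical
```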

Read full chapter

URL: https://www.sciencedirect.com/science/article/pii/B9780123725608500103

What statistical test do you use for repeated measures?

Repeated measures ANOVA is used when the same participants are measured on the same variable at more than two time points. With only two time points a paired t-test is sufficient, but for more time points a repeated measures ANOVA is required.

What is repeated measures design in statistics?

Repeated measures design is a research design that involves multiple measures of the same variable taken on the same or matched subjects either under different conditions or over two or more time periods. For instance, repeated measurements are collected in a longitudinal study in which change over time is assessed.

Which test is used in a repeated measures or within

A repeated measures ANOVA is also referred to as a within-subjects ANOVA or ANOVA for correlated samples. All these names imply the nature of the repeated measures ANOVA, that of a test to detect any overall differences between related means.

What does repeated measures ANOVA tell you?

A repeated measures ANOVA is used to determine whether or not there is a statistically significant difference between the means of three or more groups in which the same subjects show up in each group.