Chapter 8 Random effects

8.1 Fixed vs. random effects: the big picture

In statistics, we regard each data point as a single observation from an underlying distribution, or population, of possible values. A statistical model characterizes the structure of the underlying distributions from which the observed data are drawn. Loosely, we think of a statistical model as combining two distinct components. One component of the statistical model describes the mean²⁶ of the underlying distributions, and how the mean depends on (for example) predictors in a regression model, or on the particular levels of an experimental factor in an ANOVA. The other component of the model describes how the underlying distributions vary around their respective mean, both with respect to the underlying distribution for a single data point, and the mutual relationship among the underlying distributions for several data points (i.e., their correlations). In colloquial terms, this distinction is often referred to as the “signal” and the “noise” in a statistical model. To this point, we have spent nearly all of our effort thinking about how the mean, or “signal”, in a statistical model depends on predictors, treatment groups, etc. We have not yet given much thought to the noise component of the model, beyond asserting that the residual errors were iid draws from a normal distribution with a constant variance. The concept of “random effects” will give us our first tool for building richer models for the statistical noise.

So far, although we have not used the term, all of the treatment effects in the ANOVA models that we have considered have been fixed effects. We use fixed effects when we are interested in specific levels of the corresponding treatment factor. For example, in the hot dog data, were are interested in the three different types of hot dogs included in the experiment. As a second example, in the reading data, we were interested in drawing inferences about the three different teaching methods.

In contrast to fixed effects, we use random effects when the levels of a factor included in an experiment can be regarded as a representative (random) sample from a population of levels, and we are interested in drawing inferences about that population, not just the specific levels included in the experiment. The residual errors that we have included in all of our models so far is an example of a random effect. For example, in the reading data, we are not interested in drawing inferences about the particular 66 students included in the study, but instead we regard these students as a random sample from a larger population, and we want to learn about that population. Similarly, in the pine-resin data, we are not interested in limiting our inferences to the particular 24 trees included in the experiment, but we regard those particular trees as a representative sample from a population of trees to which we wish to extend our inferences.

One (imperfect) way to determine whether a factor should be treated with fixed or random effects is to ask: if the experiment were repeated, would the same levels of the factor be included in the experiment, or would the experiment include a potentially different set of levels? If the experiment would include the same levels, then this suggests that the experimenter is interested in drawing inferences about these particular levels, and therefore fixed effects should be used. On the other hand, if the experiment could potentially include different levels, then this suggests that the levels have been drawn from some larger population of levels, and thus random effects should be used.

Whether we treat an effect as a fixed effect or a random effect determines the parameters that we estimate to characterize that effect. For fixed effects, we estimate parameters that quantify the differences among the specific levels included in the experiment. For example, with the reading data, we use parameters that allow us to draw inferences about the differences among the three reading methods in the experiment. For random effects, we estimate a variance parameter that quantifies the spread in the distribution of the population from which the levels included in the experiment were drawn. Thus, for the reading data, we also estimated an error variance that quantifies the variation in responses among students who were taught with the same method.

As a bit of terminology, statistical models can be classified by whether they contain just fixed effects, just random effects, or both fixed and random effects (not counting either the intercept or the residual error term, which all models have). A model with only fixed effects is called (sensibly enough) a fixed effects model, and a model with only random effects is called a random effects model. Models with both fixed and random effects are called mixed-effects models, or sometimes just ``mixed models’’ for short.

To sum up, we draw inferences about the levels of the fixed effects included in the experiment, and the scope of those inferences extends to the population(s) from which the random effects were selected.

8.2 Random-effects models

Models with no fixed effects (except for the intercept) are called random-effects models. Pure random-effects models are not frequently encountered in the life sciences, but they are occasionally useful for characterizing hierarchical variability.

This example is taken from the on-line SAS documentation. The description given there reads

In the following example from Snedecor and Cochran (1967), an experiment is conducted to study the variability of calcium concentration in turnip greens. Four plants are selected at random; then three leaves are randomly selected from each plant. Two 100-mg samples are taken from each leaf. The amount of calcium is determined by microchemical methods.

A partial data listing is shown here:

plant   leaf    y
p1    l1      3.28
p1    l1      3.09
p1    l2      3.52
p1    l2      3.48
...
p2    l1      2.46
p2    l1      2.44
...
p4    l3      3.31
p4    l3      3.31

With these data, we may want to ask: how does sample-to-sample variation within a leaf compare to lea\(F\)-to-leaf variation within a plant, and how do each of these sources of variation compare to plant-to-plant variation?

We can answer these questions with a random-effects model. Our goal here is to quantify the sources of variation that contribute to variation among the samples. We are not interested in quantifying the differences among these particular four plants, or among these particular leaves on these particular plants, or among these particular subsamples on these particular leaves. Instead, each factor in our experiment — plant, leaf and sample — contains levels that are regarded as a random sample from a population of levels. If we were to repeat the experiment, we might choose different plants, or different leaves, or different samples.

Let \(y_{ijk}\) be the response for sample \(k\) from leaf \(j\) of pant \(i\). Let’s try the model: \[ y_{ijk} = \mu + P_i + L_{ij} + \varepsilon_{ijk} \] where \(\mu\) is the reference level, \(P_i\) is a random effect associated with plant \(i\), \(L_{ij}\) is a random effect associated with leaf \(j\) of plant \(i\), and \(\varepsilon_{ijk}\) is the residual error . Because the plants and leaves have been randomly selected from populations of plants and leaves, we assume \[\begin{eqnarray*} P_i & \sim & \mathcal{N}\left(0, \sigma^2_P \right) \\ L_{ij} & \sim & \mathcal{N}\left(0, \sigma^2_L \right) \\ \varepsilon_{ijk} & \sim & \mathcal{N}\left(0, \sigma^2_{\varepsilon} \right) \end{eqnarray*}\]

In words, each of the random effects are (independently) drawn from normal distributions with distinct variances. Note that we can think of the residual \(\varepsilon_{ijk}\) as the random effect associated with sample \(k\) from leaf \(j\) of plant \(i\). There’s something else going on with this experiment. Namely, the “leaf” factor and the “plant” factor are not crossed. That is, each leaf is associated with one and only one plant. Thus, the “leaf” factor is nested within the “plant” factor. Similarly, each subsample is associated with one and only one leaf, so the subsample is nested within the leaf.²⁷

In a random effects model, our goals are not to estimate the individual \(P_i\)’s or \(L_{ij}\)’s themselves. Instead, our goal is to estimate the variance parameters (or variance components) \(\sigma^2_P\), \(\sigma^2_L\) and \(\sigma^2_{\varepsilon}\) . These parameters quantify the plant-to-plant variability, the lea\(F\)-to-leaf variability within each plant, and the sample-to-sample variability within each leaf, respectively. Here is an analysis within PROC MIXED:

proc mixed; 
  class Plant Leaf; 
  model Calcium = ;
  random Plant Leaf(Plant); 
run;

Covariance Parameter
Estimates

Cov Parm        Estimate
Plant             0.3652
Leaf(Plant)       0.1611
Residual        0.006654

In PROC MIXED, we only list fixed effects on the right-hand side of the equals sign in the MODEL statement. In this model, there are no fixed effects, so nothing is placed on the right-hand side of the equals sign. (The reference level is a fixed effect, but it is included in every model, so it is never specified.) In the RANDOM statement, we list the random effects. Here, we include random effects for the individual plants and for the leaves nested within plants. Note that SAS uses the parenthetical notation to denote nesting as well, so that we write LEAF(PLANT) to tell SAS that the leaf factor is nested within the plant factor. We do not have to specify a random effect for the sample error, because the sample error is equivalent to the residual error, and this is always included in the model as well.

Our parameter estimates are \(\hat{\sigma}^2_P = 0.3652\), \(\hat{\sigma}^2_L = 0.1611\), and \(\hat{\sigma}^2_{\varepsilon} = 0.0067\). Thus, we conclude that plant-to-plant variation is roughly twice as large as lea\(F\)-to-leaf variation within plants, and that both of these are much larger than sample-to-sample variation within leaves.

It is possible to generate an ANOVA table for the random-effects model. We could define sums-of-squares for each of the model terms in the following way: \[\begin{eqnarray*} SS(Plant) & = & \sum_{i=1}^4 \sum_{j=1}^3 \sum_{k=1}^2 \left( \bar{y}_{i++} - \bar{y}_{+++} \right)^2 \\ SS(Leaf(Plant)) & = & \sum_{i=1}^4 \sum_{j=1}^3 \sum_{k=1}^2 \left( \bar{y}_{ij+} - \bar{y}_{i++} \right)^2 \\ SS(Error) & = & \sum_{i=1}^4 \sum_{j=1}^3 \sum_{k=1}^2 \left( \bar{y}_{ijk} - \bar{y}_{ij+} \right)^2 \\ \end{eqnarray*}\] Each sum-of-squares is associated with a certain number of df:

source	df
Plant	3 (= 4 - 1)
Leaf(Plant)	8 (= 12 - 1- 3)
Error	12 (= 24 - 1- 3 - 8)
Total	23

Something different has happened here. Because leaf is nested within plant, we have to deduct the df for the SS(Plant)from the df for SS(Leaf(Plant)). This is the general rule for nested factors. If factor ‘B’ is nested in factor ‘A’, then the df for SS(A) are subtracted from the df for SS(B(A)). (In fact, this explains why we’ve been deducting df for factorial effects from the df for error all along. The replicates are always nested within the factorial treatments, even though we haven’t thought of it in this way until now.)

In models in which the residual error is the only random effect, we have learned that the estimate of that error, \(\hat{\sigma}^2_{\varepsilon}\), is just equal to the MSE. Does this mean that we can equate MS(Plant) and MS(Leaf(Plant)) with \(\hat{\sigma}^2_P\) and \(\hat{\sigma}^2_L\), respectively? Unfortunately, in this case life isn’t that simple. As it turns out, there is a relationship between each of the mean squares and each of the variance components, but it’s complicated. One way to generate an ANOVA and learn about this relationship in SAS is to use the METHOD = TYPE3 option in PROC MIXED:

proc mixed data = turnip method = type3;
  class Plant Leaf;
  model y = ;
  random Plant Leaf(Plant);
run;

Type 3 Analysis of Variance

                           Sum of
Source           DF       Squares   Mean Square  Expected Mean Square

Plant             3      7.560346      2.520115  Var(Residual) + 2 Var(Leaf(Plant))
                                                 + 6 Var(Plant)
Leaf(Plant)       8      2.630200      0.328775  Var(Residual) + 2 Var(Leaf(Plant))
Residual         12      0.079850      0.006654  Var(Residual)

Covariance Parameter
Estimates

Cov Parm        Estimate
Plant             0.3652
Leaf(Plant)       0.1611
Residual        0.006654

Here, we see the ANOVA table with SS, MS and df for each variance component. We also see a column labeled “Expected Mean Square”. Briefly, the expected mean square is a formula that equates the “expectation” of the mean square for each variance component to each of the model parameters. (Remember that an “expectation” is the average value that you would observe over many runs of the same experiment.) The derivation of this formula is beyond the scope of ST512. With the formula in hand, though, it suggests one method of estimating the variance components. We can estimate the variance components by equating each MS with its expected mean square, and solving for the variance-component parameters.

Sometimes, this “method of moments” approach to parameter estimation works well. Other times, however, it can generate negative estimates of variance parameters. (To see this, suppose that in the example above we had a data set where MS(Leaf) < MS(Residual). Then we would have to have \(\hat{\sigma}^2_L < 0\), which makes no sense.) Consequently, several other approaches have been developed to estimate variance-component parameters. The default in PROC MIXED is to use REML, which is an acronym for REstricted Maximum Likelihood. Although we will not discuss REML, do be aware that REML is a iterative numerical method. In some cases, the method can fail. The output to PROC MIXED presents an ‘Iteration History’ that gives you some information about the success or failure of REML. It is good practice to review the Iteration History and the Log file to make sure the procedure converged successfully. (We also saw a numerical method for estimating parameters when we considered non-linear regression.)

For some nice data sets (such as the one above), the method-of-moments estimates of the variance components and the REML estimates are the same. In other cases, they may differ. General practice is to use the REML estimates unless a good reason exists to use another method of estimation.

8.3 Subsampling

A second technique for reducing experimental error is subsampling. Subsampling refers to measuring the same experimental unit multiple times. This is the first occasion we have encountered in ST512 where the experimental unit (EU) and the measurement unit (MU) differ. Remember that the EU is the unit to which the experimental factor is applied, and the MU is the unit that is measured.

Example: A former ST512 student studied the effect of 6 different vase solutions on the shelf lives of roses. She had 30 different vases at her disposal. The 6 different solutions were randomly assigned to the 30 vases in a balanced CRD with a one-way treatment structure. Three roses (or stems) were placed in each vase, and the shelf life of each rose was recorded. Here are some of the data for the cultivar “Freedom”:

cultivar trt     vase    stem    lifetime
Freedom  1       1       1       13     
Freedom  1       1       2       13     
Freedom  1       1       3       11     
Freedom  1       2       1       14     
Freedom  1       2       2       11     
Freedom  1       2       3       12     
Freedom  1       3       1       14     
...
Freedom  1       5       3       13     
Freedom  2       1       1       16     
Freedom  2       1       2       16     
Freedom  2       1       3       12     
Freedom  2       2       1       13    
...

Here, the EU is the vase, and the MU is the stem. The stem is not the EU! The different stems are subsamples, in the sense that they are repeated samples of the same EU. Subsampling does not increase the number of EUs (and hence does not increase the number of df available to estimate the experimental error), but it does reduce the experimental error.

8.3.1 Equal subsamples per EU

If the number of subsamples (stems) is the same for each EU (vase), we can average the subsample data (stem lifetimes) in each EU to generate 1 data point per EU, and then analyze the EU-averages with the usual one-way ANOVA.

To calculate an average response for each EU, we can use whatever software package is most convenient (e.g., Excel). To calculate EU-level averages in SAS, we can use the following procedure. Suppose we had loaded the data from a file with the following DATA step:

data freedom;
   infile ".../rose.txt" firstobs=2;
   input cultivar$ trt vase stem lifetime;
run;

We can then calculate average lifetimes per vase with PROC MEANS:

/* This procedure averages the stems in each vase 
   and produced a new data set named 'rosemeans' 
   that contains the newly calculated average lifetimes */

proc means data=freedom noprint;
  by cultivar trt vase;
  var lifetime;
  output out=rosemeans mean=avglife;
run;

We can use PROC PRINT to examine the newly created data set:

proc print data=rosemeans;
run;

Obs  cultivar    trt    vase    _TYPE_    _FREQ_    avglife
1    Freedom      1       1        0         3      12.3333
2    Freedom      1       2        0         3      12.3333
3    Freedom      1       3        0         3      13.3333
4    Freedom      1       4        0         3      12.6667
5    Freedom      1       5        0         3      12.6667
6    Freedom      2       1        0         3      14.6667
7    Freedom      2       2        0         3      13.6667
...
30   Freedom      6       5        0         3      12.0000

Note that there is now one observation per vase. Now we analyze the per-vase averages using a standard one-way ANOVA analysis in PROC GLM:

proc glm data = rosemeans;
  class trt;
  model avglife = trt;
  means trt / tukey;
run;

Dependent Variable: avglife

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                        5     122.9185185      24.5837037       9.88    <.0001
Error                       24      59.6888889       2.4870370
Corrected Total             29     182.6074074

Source                      DF     Type III SS     Mean Square    F Value    Pr > F
trt                          5     122.9185185      24.5837037       9.88    <.0001

Tukey's Studentized Range (HSD) Test for avglife
Means with the same letter are not significantly different.

Tukey 
Grouping   Mean      N    trt
A       13.8667      5    2
A       13.6000      5    3
A       13.0667      5    6
A       12.6667      5    1
A       11.8667      5    5
B        7.8667      5    4

For comparison, suppose that instead only one rose stem had been included in each vase. The data set ‘rose1’ was formed by extracting the first rose stem listed for each vase from the original data set:

cultivar trt     vase    stem    lifetime
Freedom  1       1       1       13     
Freedom  1       2       1       14     
Freedom  1       3       1       14     
...
Freedom  2       1       1       16     
Freedom  2       2       1       13    
...

Here is the same one-factor ANOVA analysis for this smaller data set:

proc glm data = freedom1;
  class trt;
  model lifetime = trt;
run;

                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                        5      84.5666667      16.9133333       2.30    0.0765
Error                       24     176.4000000       7.3500000
Corrected Total             29     260.9666667

Source                      DF     Type III SS     Mean Square    F Value    Pr > F
trt                          5     84.56666667     16.91333333       2.30    0.0765

Now the differences among the vase treatments are no longer significant! Note that the df’s have not changed: this is because both the full and reduced data sets contain the same number of EUs. What has changed is that the MSE for the reduced data set is much larger than the MSE for the full data set. This is because the `unexplained variability’ in the reduced data set includes both variability among vases assigned to the same treatment, and variability among individual rose stems. In the full data set, however, the unexplained variability in each data point consists of variability among vases assigned to the same treatment, and variability in the average lifetime of three individual rose stems. Because averaging reduces variability, the unexplained variability in the full data set is smaller than the unexplained variability in the reduced data set.

Thus, subsampling reduces experimental error by averaging out the variability among individual measurements from the same EU.

The wrong way to analyze these data is to treat the stem as the EU:

/* This is the WRONG analysis */

proc glm data = freedom;
  class trt;
  model lifetime = trt;
run;
                                        Sum of
Source                      DF         Squares     Mean Square    F Value    Pr > F
Model                        5     368.7555556      73.7511111      10.38    <.0001
Error                       84     597.0666667       7.1079365
Corrected Total             89     965.8222222

Source                      DF     Type III SS     Mean Square    F Value    Pr > F
trt                          5     368.7555556      73.7511111      10.38    <.0001

This analysis is wrong because it assumes that vase solutions were randomized to stems and not vases. This is pseudoreplication, because the analysis incorrectly inflates the amount of replication in the experiment. Pseudoreplication is a common and insidious mistake.

8.3.1.1 A second look: Analysis with a mixed effects model

The method of averaging subsamples works well when each EU has the same number of subsamples. However, averaging subsamples does not work well when the number of subsamples per EU differs. This is because EUs with more subsamples will have less variability than EUs with fewer subsamples, and thus the equal variance assumption of ANOVA will be violated with averaged data. (As always, the seriousness of this violation is a matter of degree. If, for example, 29 vases had 3 stems and one vase had 2 stems, the unequal variance caused by averaging might be sufficiently mild that one could still justify averaging.) The best way to analyze data with unequal numbers of subsamples is to use a model that explicitly accounts for both the variation among EUs and the variation among MUs.

To keep things from getting too complicated, we won’t look here at an example with an unequal number of subsamples per EU, but will instead consider an alternative analysis of the rose data above. This analysis can be easily modified to accommodate unequal numbers of subsamples per EU.

Here is a model that we can use that models the data for the individual stems explicitly. This model will work regardless of whether we have the same number of stems in each vase. Let be the lifetime of the kth stem from the jth vase assigned to the ith treatment. Consider the model \[ y_{ijk} = \mu_i + V_{ij} + \varepsilon_{ijk} \] where

\(i = 1, 2, \ldots, 6\) is an index for the treatment
\(j = 1, 2, \ldots, 5\) is an index for the vase in each treatment
\(k = 1, 2, 3\) is an index for the stem in each vase
\(\mu_i\) is the average response for treatment \(i\)
\(V_{ij}\) is a random effect for vase \(j\) from treatment \(i\), \(V_{ij} \sim \mathcal{N}\left(0, \sigma^2_V \right)\)
\(\varepsilon_{ijk}\) is subsample (residual) error for stem \(k\) from vase \(j\) of treatment \(i\), \(\varepsilon_{ijk} \sim \mathcal{N}\left(0, \sigma^2_{\varepsilon} \right)\)

The key feature of the model above is that it includes two random effects: one for the vase-to-vase error within treatments, and a second for the stem-to-stem error within vases. We assume that the random effects for the vases are drawn from a normal distribution with mean 0 and variance \(\sigma^2_V\), and that the random effects for the stems are drawn from a normal distribution with mean 0 and variance \(\sigma^2_{\varepsilon}\).

We can still think about building an ANOVA table for this analysis through a SS decomposition. The df for each SS will be:

source	df
Treatment	5
Vase (EU error)	24 (= 30 - 1- 5)
Stems (subsample error)	60 (= 90 - 1 - 5 - 24)
Total	89

The key to this analysis is to realize that the treatment effects should be tested using the mean-squared error for the vases, not the stems. This is because the vases are the EUs (treatments were randomized to vases). Thus, any inferences that we draw about the treatments (i.e., \(F\)-tests, linear contrasts, or multiple comparisons) should be based on the MSE for the vases, not the MSE for the stems. In particular, we anticipate that the \(F\)-test for differences among the treatments will be based on an \(F\)-distribution with 5 ndf and 24 ddf.

Here is PROC MIXED code and output for the Freedom rose data, using the default REML estimation:

proc mixed data = freedom;
  class trt vase;
  model lifetime = trt;
  random vase(trt);
run;

Covariance Parameter
Estimates

Cov Parm      Estimate
vase(trt)       0.1648
Residual        6.9667

Type 3 Tests of Fixed Effects

              Num     Den
Effect         DF      DF    F Value    Pr > F
trt             5      24       9.88    <.0001

Remarks:

The MODEL statement in PROC MIXED only contains the fixed effects. In this model, the only fixed effects are the different treatments (vase solutions).
The RANDOM statement specifies the random effects to be included in the model. The residual (= the error term for the subsample) is always included by default, so a separate term for “stem” did not need to be specified.
We use the parenthetical syntax VASE(TRT) to tell SAS that VASE is nested within TRT.
The portion of the output labeled “Covariance Parameter Estimates” provides the estimates for \(\sigma^2_V\) and \(\sigma^2_{\varepsilon}\). In this case, they are \(\sigma^2_V = 0.165\) and \(\sigma^2_{\varepsilon} = 6.97\). Although these estimates are not the primary focus of our analysis, they are still informative. In this case, they tell us that the stem-to-stem variability among stems in the same vase is much greater than the vase-to-vase variability among vases receiving the same treatment.
The portion of the output labeled “Type 3 Tests of Fixed Effects” provides \(F\)-tests of the fixed effects. With REML estimation, PROC MIXED does not present the full ANOVA table. Note that the \(F\)-test for the treatment effect is identical to the \(F\)-test that we obtained by averaging subsamples within EUs. These two tests are identical because in this case we have the same number of stems per vase.
Note that the \(F\)-test for the treatment effects has the correct ndf and ddf. Unless we specify otherwise, PROC MIXED uses the “containment” method for determining the appropriate error terms. In this case, the fixed-effects for TRT are contained within the random effect for VASE(TRT) (this is another way of saying that the vases are nested within the treatement), and so PROC MIXED uses the MSE for VASE(TRT) to test for the TRT fixed effects. The df for the \(F\)-tests can be used as a check to make sure that you’ve coded the model correctly.

Here is an example of using linear contrasts and multiple comparisons to explore differences among the treatments. Note that throughout the df used for inference are the df that correspond to the vase error, not to the stem error.

Linear contrasts:

proc mixed data = freedom;
  class trt vase;
  model lifetime = trt;
  random vase(trt);
  estimate 'Difference b/w solutions 1 and 2' trt 1 -1 0 0 0 0;
run;
                                                Standard
Label                               Estimate       Error      DF    t Value    Pr > |t|
Difference b/w solutions 1 and 2     -1.2000      0.9974      24      -1.20      0.2407

Multiple comparisons (there is no MEANS statement available in PROC MIXED, so we use LSMEANS instead):

proc mixed data = freedom;
  class trt vase;
  model lifetime = trt;
  random vase(trt);
  lsmeans trt / pdiff adj=tukey;
run;

Least Squares Means

                             Standard
Effect    trt    Estimate       Error      DF    t Value    Pr > |t|
trt       1       12.6667      0.7053      24      17.96      <.0001
trt       2       13.8667      0.7053      24      19.66      <.0001
trt       3       13.6000      0.7053      24      19.28      <.0001
trt       4        7.8667      0.7053      24      11.15      <.0001
trt       5       11.8667      0.7053      24      16.83      <.0001
trt       6       13.0667      0.7053      24      18.53      <.0001

Differences of Least Squares Means

                                 Standard
Effect   trt   _trt   Estimate      Error     DF   t Value   Pr > |t|   Adjustment      Adj P

trt      1     2       -1.2000     0.9974     24     -1.20     0.2407   Tukey          0.8310
trt      1     3       -0.9333     0.9974     24     -0.94     0.3587   Tukey          0.9331
trt      1     4        4.8000     0.9974     24      4.81     <.0001   Tukey          0.0008
trt      1     5        0.8000     0.9974     24      0.80     0.4304   Tukey          0.9644
trt      1     6       -0.4000     0.9974     24     -0.40     0.6919   Tukey          0.9985
trt      2     3        0.2667     0.9974     24      0.27     0.7915   Tukey          0.9998
trt      2     4        6.0000     0.9974     24      6.02     <.0001   Tukey          <.0001
trt      2     5        2.0000     0.9974     24      2.01     0.0563   Tukey          0.3685
trt      2     6        0.8000     0.9974     24      0.80     0.4304   Tukey          0.9644
trt      3     4        5.7333     0.9974     24      5.75     <.0001   Tukey          <.0001
trt      3     5        1.7333     0.9974     24      1.74     0.0951   Tukey          0.5217
trt      3     6        0.5333     0.9974     24      0.53     0.5978   Tukey          0.9941
trt      4     5       -4.0000     0.9974     24     -4.01     0.0005   Tukey          0.0060
trt      4     6       -5.2000     0.9974     24     -5.21     <.0001   Tukey          0.0003
trt      5     6       -1.2000     0.9974     24     -1.20     0.2407   Tukey          0.8310

If we want to generate an ANOVA table, we can use the METHOD=TYPE3 option for method-of-moments estimation:

proc mixed data = freedom method = type3;
  class trt vase;
  model lifetime = trt;
  random vase(trt);
run;

Type 3 Analysis of Variance

                         Sum of
Source         DF       Squares   Mean Square  Expected Mean Square                 Error Term

trt             5    368.755556     73.751111  Var(Residual) + 3 Var(vase(trt))     MS(vase(trt))
                                               + Q(trt)
vase(trt)      24    179.066667      7.461111  Var(Residual) + 3 Var(vase(trt))     MS(Residual)
Residual       60    418.000000      6.966667  Var(Residual)                        .


Type 3 Analysis of Variance

            Error
Source         DF    F Value    Pr > F
trt            24       9.88    <.0001
vase(trt)      60       1.07    0.4015

Covariance Parameter
Estimates

Cov Parm      Estimate
vase(trt)       0.1648
Residual        6.9667

Again, things work out nicely here because the data are balanced, but it is not guaranteed that the method-of-moments estimation and REML estimation will match up exactly.

A final comment on subsampling: Often it is more expensive to have more EUs and is easier to have more subsamples. The extent to which increasing the number of subsamples will increase statistical power depends on the relative magnitudes of the variability among EUs and the variability among subsamples. If the variability among EUs is large relative to the variability among subsamples, then adding subsamples does little to increase statistical precision. But if the variability among subsamples is of the same or greater magnitude than the variability among EUs (as in the rose example), then adding subsamples can increase precision and statistical power.

8.4 \(^\star\)Mathematical foundations

In this section, we introduce formal mathematical arguments for how random effects induce correlations among grouped responses. This section makes heavy uses of the mathematics of probability. Before proceeding, we will need to review several key probability concepts

8.4.1 Probability refresher

Let \(X\) denote a random variable. Recall that the expectation, or expected value, of \(X\) is one measure of the center of the probability distribution of \(X\). The expectation can also be thought of as the mean or average of \(X\). We denote the expectation of \(X\) as \(\mathrm{E}\left[X\right]\), or sometimes \(\mu_X\). (The subscript on \(\mu\) is only used when we need it to clarify to which random variable the expectation refers.)

The variance of \(X\), often denoted \(\mbox{Var}\left(X\right)\) or \(\sigma^2_X\), is defined as the expectation of \((X - \mu_X)^2\). In other words, \[ \mbox{Var}\left(X\right) = \mathrm{E}\left[(X - \mu_X)^2\right]. \] The variance of a random variable is a measure of its dispersion, or spread.

Consider two random variables, \(X\) and \(Y\). The covariance of \(X\) and \(Y\), written as \(\mbox{Cov}\left(X, Y\right)\), is defined as the expectation of the product \((X - \mu_X) (Y - \mu_Y)\). In other words, \[ \mbox{Cov}\left(X, Y\right) = \mathrm{E}\left[(X - \mu_X)(Y - \mu_Y)\right]. \] Note that the covariance of a random variable with itself is just its variance, that is, \(\mbox{Cov}\left(X, X\right) = \mbox{Var}\left(X\right)\). The covariance of two random variables is a measure of their association. If \(X\) and \(Y\) are positively associated, then values of \(X\) larger (smaller) than \(\mu_X\) will tend to co-occur with values of \(Y\) larger (smaller) than \(\mu_Y\), thus the product \((X - \mu_X) (Y - \mu_Y)\) will usually be positive, and so the expectation of that product — the covariance — will be positive as well. On the other hand, if the reverse is true, and values of \(X\) larger (smaller) than \(\mu_X\) tend to co-occur with values of \(Y\) smaller (larger) than \(\mu_Y\), then the product \((X - \mu_X) (Y - \mu_Y)\) will usually be negative, and so the covariance will be negative.

When \(X\) and \(Y\) are independent, then their covariance is zero, i.e., \(\mbox{Cov}\left(X, Y\right) = 0\). In general, the converse is not true: just because two random variables have a covariance of 0 does not necessarily imply that they are independent. However, if \(X\) and \(Y\) are two normally distributed random variables, then \(\mbox{Cov}\left(X, Y\right) = 0\) does imply that \(X\) and \(Y\) are independent.

Covariance is not just a measure of the association between two random variables. A covariance is also a measure of the dispersion or spread of those variables. Sometimes, we wish to factor out the dispersion component of covariance, and thus isolate the portion of covariance that measures the strength of association. We can do so by considering the correlation between \(X\) and \(Y\), written \(\mbox{Cor}\left(X, Y\right)\) or \(\rho_{X,Y}\), which scales the covariance by the dispersion of both \(X\) and \(Y\): \[ \mbox{Cor}\left(X, Y\right) = \dfrac{\mbox{Cov}\left(X, Y\right)}{\sqrt{\mbox{Var}\left(X\right)\mbox{Var}\left(Y\right)}}. \] Or, if we recognize that \(\sqrt{\mbox{Var}\left(X\right)} = \mbox{SD}\left(X\right)\) (the standard deviation of \(X\)), then we can write \[ \mbox{Cor}\left(X, Y\right) = \dfrac{\mbox{Cov}\left(X, Y\right)}{\mbox{SD}\left(X\right)\mbox{SD}\left(Y\right)}. \]

Now here are some useful rules about the expectation and variance of a sum of random variables. First, the expectation of \(X+Y\) is simply \[ \mathrm{E}\left[X+Y\right] = \mathrm{E}\left[X\right] + \mathrm{E}\left[Y\right]. \] The variance of \(X+Y\) is more complicated: \[ \mbox{Var}\left(X+Y\right) = \mbox{Var}\left(X\right) + \mbox{Var}\left(Y\right) + 2 \mbox{Cov}\left(X, Y\right). \] Thus, if \(X\) and \(Y\) are independent, then \(\mbox{Var}\left(X+Y\right) = \mbox{Var}\left(X\right) + \mbox{Var}\left(Y\right)\). But, if \(X\) and \(Y\) are not independent, then we have to factor in the covariance.

8.4.2 Application to models with random effects

Here is how this applies to models with random effects. Consider our model for the turnip-green data, but consider a simplified data set where there is only one sample from each leaf. Let’s write the model for these data as \[ y_{ij} = \mu + P_i + \varepsilon_{ij} \] where \(P_i \sim \mathcal{N}\left(0, \sigma^2_P \right)\) represents the plant random effects, and \(\varepsilon_{ijk} \sim \mathcal{N}\left(0, \sigma^2_{\varepsilon} \right)\) represents the leaf random effects (the residual error in this case). Let’s first find an expression for the variance of \(y_{ij}\): \[\begin{eqnarray*} \mbox{Var}\left(y_{ij}\right) & = & \mbox{Var}\left(\mu + P_i + \varepsilon_{ij}\right) \\ & = & \mbox{Var}\left(P_i\right) + \mbox{Var}\left(\varepsilon_{ij}\right) + 2 \mbox{Cov}\left(P_i, \varepsilon_{ij}\right)\\ & = & \sigma^2_P + \sigma^2_{\varepsilon}. \end{eqnarray*}\] Note that \(\mu\) is a constant, and not a random variable, so its variance is 0. Note also that because the plant random effects and the leaf random effects are independent, then \(\mbox{Cov}\left(P_i, \varepsilon_{ij}\right) = 0\).

Now, let’s try to find an expression for the correlation between two data points. First, we’ll find the covariance between two data points. To do so, we’ll need to use a rule for the covariance between two sums of random variables. That rule is \[ \mbox{Cov}\left(X+Y, W+Z\right) = \mbox{Cov}\left(X, W\right) + \mbox{Cov}\left(X, Z\right) + \mbox{Cov}\left(Y, W\right) + \mbox{Cov}\left(Y, Z\right). \] (Note that we can apply this rule to obtain \(\mbox{Var}\left(X+Y\right) = \mbox{Cov}\left(X+Y, X+Y\right) = \mbox{Var}\left(X\right) + \mbox{Var}\left(Y\right) + 2 \mbox{Cov}\left(X, Y\right)\).)

Now, write two different data points as \(y_{ij}\) and \(y_{kl}\). Then, \[\begin{eqnarray*} \mbox{Cov}\left(y_{ij}, y_{kl}\right) & = & \mbox{Cov}\left(\mu + P_i + \varepsilon_{ij}, \mu + P_k + \varepsilon_{kl}\right) \\ & = & \mbox{Cov}\left(P_i, P_k\right) + \mbox{Cov}\left(P_i, \varepsilon_{kl}\right) + \mbox{Cov}\left(\varepsilon_{ij}, P_k\right) + \mbox{Cov}\left(\varepsilon_{ij}, \varepsilon_{kl}\right). \end{eqnarray*}\] Again, the \(\mu\)’s drop out because they are constants and have no variance. Next, because the plant random effects are independent of the leaf random effects, \(\mbox{Cov}\left(P_i, \varepsilon_{kl}\right) = \mbox{Cov}\left(\varepsilon_{ij}, P_k\right) = 0\). Next, assuming that are two data points are from different leaves, then \(\mbox{Cov}\left(\varepsilon_{ij}, \varepsilon_{kl}\right) = 0\) (because the lea\(F\)-level random effects are independent of one another).

What about \(\mbox{Cov}\left(P_i, P_k\right)\)? If the leaves come from the same plant, that is, if \(i=k\), then we have \(\mbox{Cov}\left(P_i, P_k\right) = \mbox{Cov}\left(P_i, P_i\right) = \mbox{Var}\left(P_i\right) = \sigma^2_P\). However, if the leaves come from different plants (\(i \neq k\)), then the two plant random effects are independent, and thus \(\mbox{Cov}\left(P_i, P_k\right) = 0\). Putting it all together, we have \[ \mbox{Cov}\left(y_{ij}, y_{kl}\right) = \begin{cases} \sigma^2_P & i = k \\ 0 & i \neq k \end{cases}. \]

Finally, to convert a covariance to a correlation, we just have to divide the covariance by \(\sqrt{\mbox{Var}\left(y_{ij}\right) \mbox{Var}\left(y_{kl}\right)} = \sigma^2_P + \sigma^2_{\varepsilon}\), giving \[ \mbox{Cor}\left(y_{ij}, y_{kl}\right) = \begin{cases} \dfrac{\sigma^2_P}{\sigma^2_P + \sigma^2_{\varepsilon}} & i = k \\ 0 & i \neq k \end{cases}. \]

Thus, we see that including the plant-level random effect induces a positive correlation between leaves from the same plant. (Note that correlation must be positive because \(\sigma^2_P\) and \(\sigma^2_{\varepsilon}\) must both be positive.)

Bibliography

Snedecor, G. W., and W. G. Cochran. 1967. Statistical Methods. 6th ed. Ames: Iowa State University Press.

The mean of a distribution is such an important and central concept in statistics that we have many different synonyms for it. Although the following terms are not perfect synonyms (their technical definitions are all slightly different), all of the following tend to be used as loose synonyms for a mean: average, center of mass, expected value, and expectation.↩︎
In fact, it is always true that the random effect associated within the individual data point is nested within another term in the model. Because this is always true, we usually don’t call attention to it.↩︎