Multi-Level Modeling
Overview
Most of our statistical models rely on the assumption that each observation is independent. However, individual health can be “clustered” due to the influence of shared contexts or “contagious” due to the transmission of ideas or pathogens, violating assumptions of independence. This non-independence may be of direct interest, or merely a nuisance causing our standard errors to be incorrect.
The non-independence among real people can often be ignored in sparse samples. A study of 100 randomly sampled US adults is unlikely to include more than one person from any family, neighborhood, workplace, or clinic. The sample includes such a small fraction of the total population that any individual’s close contacts are unlikely to be sampled. Whatever generalized linear model we apply to this sample, the residuals are likely to be approximately independent, though we may still want to adjust for confounders at the individual or group level.
On the other hand, dense samples, and samples that incorporate groups into the recruitment strategy, warrant special attention. A dense sample that includes a large segment of the population (e.g., a sample of 100 epidemiology students at Columbia) is much more likely to include people who share health-relevant contexts or who influence each other’s behaviors and exposures. Identifying groups within the population (e.g., cohort based on year of program entry, degree program, affiliated departmental cluster) may help to address this non-independence. A complex sample (e.g., a sample of 100 epidemiology students in each of the top 10 schools of public health) may build group identification directly into the recruitment strategy. Even after individual- and group-level predictors are entered into a model of individual health, the residuals may be correlated among individuals in the same group. Mixed models (also called random effects models or multilevel models) are an attractive option for working with clustered data, and should be considered alongside alternatives such as generalized estimating equations (GEE).
Description
Before we consider how to implement such approaches, let’s introduce a few terms that help describe multilevel data structures. We traditionally label the level at which the outcome is assessed as “Level 1”. Level 1 typically corresponds either to the individual person or to a single measurement on that person (e.g., one study visit, one tooth). In a 2-level model, these Level 1 observations are grouped into mutually exclusive and collectively exhaustive categories at “Level 2”. Correct specification of the Level 2 groups (e.g., neighborhoods, schools, hospitals) is critical, because we typically assume that they are stable, meaningful groups; that the groups are independent of each other (e.g., no residual correlation between adjacent groups); and that there is no unspecified covariance structure within the Level 2 groups (e.g., no intermediate subgroups between Level 1 and Level 2 that are relevant to predicting the health outcome once measured predictors are accounted for). If each Level 2 group has the same number of Level 1 observations, the study sample is said to be “balanced”. The term “grand mean” distinguishes the mean for the entire sample from the mean within group j (the group mean).
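To make the grand mean versus group mean distinction and the notion of balance concrete, here is a minimal sketch using only the Python standard library. The function name `describe_groups` and the toy data are hypothetical. In an unbalanced sample the grand mean (4.0 below) need not equal the average of the group means (6.0), because larger groups contribute more observations to the grand mean.

```python
def describe_groups(data):
    """Summarize a two-level data set.

    `data` is a hypothetical mapping from a Level 2 group label to the list
    of Level 1 outcome values observed in that group.
    """
    all_values = [y for values in data.values() for y in values]
    # Grand mean: average over every Level 1 observation in the sample.
    grand_mean = sum(all_values) / len(all_values)
    # Group means: average within each Level 2 group j.
    group_means = {j: sum(values) / len(values) for j, values in data.items()}
    # Balanced: every group contributes the same number of observations.
    balanced = len({len(values) for values in data.values()}) == 1
    return grand_mean, group_means, balanced


# Hypothetical unbalanced example: group "B" has a single observation.
grand, by_group, balanced = describe_groups({"A": [1, 2, 3], "B": [10]})
print(grand)                                   # 4.0
print(by_group)                                # {'A': 2.0, 'B': 10.0}
print(sum(by_group.values()) / len(by_group))  # 6.0
print(balanced)                                # False
```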
Multilevel Modeling with Random Intercepts
Let’s build up to multilevel models. The simplest generalized linear model has a linear outcome and no predictors. The expected value of the outcome is simply the intercept. The observed outcomes are modeled as the intercept plus a normally distributed error term.
E(Y) = intercept
Y = intercept + IndividualError (where IndividualError has a mean of 0 and an estimated standard deviation)
Analogously, the simplest random intercept model is an “empty” model with no predictors. Again, we’ll assume a linear outcome and a normally distributed error term. We introduce the subscript i to index Level 1 and j to index Level 2. When a parameter carries the subscript j (but not i), it is allowed to vary across clusters but is constant for all observations within a cluster.
E(Yij) = interceptj
Yij = interceptj + IndividualError
interceptj = GrandMean + InterceptError
So, the random intercept model introduces a group-specific intercept, which is modeled with its own error term. In fact, it is this error term (labeled here as “InterceptError”) that makes this a random intercept model. Both error terms are (usually) assumed to be normally distributed, each with a mean of zero and its own estimated standard deviation. Substituting the equation for interceptj into the equation for Yij gives a single equation with two error terms:
Yij = GrandMean + InterceptError + IndividualError
This empty random intercept model is particularly useful when exploring the data, since it allows us to estimate how much of the outcome variation occurs between versus within groups. If groups are very different from each other, many group intercepts will be far from the GrandMean, giving InterceptError a large standard deviation. If observations are very similar within groups, the observed outcomes will be close to their group mean, and the standard deviation of IndividualError will be small. A convenient summary is the intraclass correlation coefficient (ICC), calculated from the variances (recall that a variance is the square of a standard deviation) of InterceptError and IndividualError: ICC = Var(InterceptError) / [Var(InterceptError) + Var(IndividualError)]. The ICC has a theoretical range of 0 to 1, and can be interpreted as the proportion of outcome variance that can potentially be explained by measured and unmeasured characteristics of Level 2 groups.
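The ICC can be computed directly from the two variance components, ICC = Var(InterceptError) / [Var(InterceptError) + Var(IndividualError)]. The sketch below shows that calculation and, as an illustration only, estimates the variance components with the classical one-way ANOVA (method-of-moments) estimator. This is a swapped-in technique: mixed-model software typically uses (restricted) maximum likelihood instead, and this simple estimator assumes a balanced sample. Function names and data are hypothetical.

```python
def icc_from_sds(sd_intercept_error, sd_individual_error):
    """ICC from the two standard deviations of the empty random intercept model."""
    between = sd_intercept_error ** 2   # Var(InterceptError)
    within = sd_individual_error ** 2   # Var(IndividualError)
    return between / (between + within)


def anova_icc(groups):
    """One-way ANOVA estimate of the ICC for a balanced sample.

    `groups` is a list of equally sized lists of Level 1 outcomes, one list
    per Level 2 group. Assumes at least 2 groups with at least 2 observations each.
    """
    k = len(groups)       # number of Level 2 groups
    n = len(groups[0])    # observations per group
    if any(len(g) != n for g in groups):
        raise ValueError("this estimator assumes a balanced sample")
    group_means = [sum(g) / n for g in groups]
    grand_mean = sum(group_means) / k  # equals the overall mean when balanced
    # Between-group and within-group mean squares.
    msb = n * sum((m - grand_mean) ** 2 for m in group_means) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g) / (k * (n - 1))
    var_intercept = max((msb - msw) / n, 0.0)  # truncate negative estimates at zero
    var_individual = msw
    return var_intercept / (var_intercept + var_individual)


print(icc_from_sds(1.0, 1.0))                                   # 0.5
print(round(anova_icc([[1, 2, 3], [4, 5, 6], [7, 8, 9]]), 4))   # 0.8966
```

In the toy data the groups barely overlap, so most of the variance lies between groups and the ICC is close to 1.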
However, the empty model is generally not our primary focus. We would like to add individual and group-level predictors, and get appropriate estimates of association. To begin, let’s add the Level 1 predictor Age.
Yij = interceptj + β×Agei + IndividualError
interceptj = ExpectedOutcomeAtAgeZero + InterceptError
As above, these can be combined into a single equation with two error terms:
Yij = ExpectedOutcomeAtAgeZero + β×Agei + InterceptError + IndividualError
However, we need to be cautious about interpretation. The intercept in a generalized linear model describes the expected outcome (on the scale of the link function) when all predictors are equal to zero. We often pay little attention to the intercept in single-level models, but a random intercept model gives it greater emphasis. For a variable like age, the intercept may be meaningless in samples that do not include neonates, since estimating the average outcome at age zero requires extrapolating beyond the age range of the sample. Consequently, it is good practice in random intercept models to grand mean center continuous variables for which zero is not a plausible value. Grand mean centering creates a new variable that is a linear shift of the original variable with a mean of zero (e.g., CenteredAge = Age - MeanAge). Group mean centering is an alternative in which the new variable has a mean of zero within each group (e.g., GroupCenteredAge = Age - MeanAgeInGroupj); it is particularly useful when exposure relative to the group mean (rather than on the original scale), or having an exposure atypical for one’s group, is of particular interest.
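Both centering strategies are simple subtractions. A minimal standard-library sketch, assuming ages are stored in a list alongside a parallel list of group labels; the function names and the toy school data are hypothetical.

```python
def grand_mean_center(values):
    """CenteredAge = Age - MeanAge: the new variable has a sample mean of zero."""
    grand_mean = sum(values) / len(values)
    return [x - grand_mean for x in values]


def group_mean_center(values, groups):
    """GroupCenteredAge = Age - MeanAgeInGroupj: mean of zero within each group."""
    totals, counts = {}, {}
    for x, j in zip(values, groups):
        totals[j] = totals.get(j, 0.0) + x
        counts[j] = counts.get(j, 0) + 1
    group_means = {j: totals[j] / counts[j] for j in totals}
    return [x - group_means[j] for x, j in zip(values, groups)]


# Hypothetical ages for five adolescents in two schools.
age = [14, 15, 16, 16, 18]
school = ["s1", "s1", "s1", "s2", "s2"]
print([round(x, 1) for x in grand_mean_center(age)])  # [-1.8, -0.8, 0.2, 0.2, 2.2]
print(group_mean_center(age, school))                 # [-1.0, 0.0, 1.0, -1.0, 1.0]
```

Note that the group-mean-centered variable contains no between-school differences in age: both 16-year-olds get different values (1.0 in school s1, -1.0 in school s2) because they sit at different positions relative to their own school's mean.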
One of the key motivations for working with multilevel data may be to examine one or more Level 2 characteristics as predictors. For simplicity, let us define a dichotomous Level 2 variable called Exposedj that is 1 for exposed groups and 0 for unexposed groups.
Yij = interceptj + β×Agei + IndividualError
interceptj = ExpectedOutcomeAtAgeZeroForUnexposedGroup + γ×Exposedj + InterceptError
These can be combined into a single equation with two slopes and two error terms:
Yij = ExpectedOutcomeAtAgeZeroForUnexposedGroup + γ×Exposedj + β×Agei + InterceptError + IndividualError
The notation above is clunky, so in most cases we switch to β and γ symbols with two subscripts: the first indicates the coefficient’s order within its equation, and the second indicates whether it is group-specific (j) or constant for the entire sample (0 or .).
Yij = γ00 + γ10×Exposedj + β10×Agei + InterceptError + IndividualError
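One way to see what this combined equation assumes is to simulate data from it. In the minimal sketch below, every member of group j shares a single draw of InterceptError, which is exactly what makes residuals correlated within groups. All parameter values, the alternating assignment of exposure, and the 12 to 18 age range are hypothetical choices for illustration.

```python
import random


def simulate_random_intercept(n_groups, n_per_group, gamma00, gamma10, beta10,
                              sd_intercept_error, sd_individual_error, seed=1):
    """Draw (group, Exposed, Age, Y) rows from the random intercept model
    Yij = gamma00 + gamma10*Exposedj + beta10*Agei + InterceptError + IndividualError.
    All inputs are hypothetical; the age distribution is an assumption."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        exposed = j % 2  # assumption: alternate groups are exposed
        # One InterceptError per group, shared by all of its members.
        intercept_error = rng.gauss(0.0, sd_intercept_error)
        for _ in range(n_per_group):
            age = rng.uniform(12.0, 18.0)  # Level 1 predictor
            individual_error = rng.gauss(0.0, sd_individual_error)
            y = (gamma00 + gamma10 * exposed + beta10 * age
                 + intercept_error + individual_error)
            rows.append((j, exposed, age, y))
    return rows


data = simulate_random_intercept(n_groups=20, n_per_group=30, gamma00=50.0,
                                 gamma10=5.0, beta10=-1.5,
                                 sd_intercept_error=3.0, sd_individual_error=6.0)
print(len(data))  # 600
```

Simulated data like these can be used to check that a fitted mixed model (for example, R's `lme4::lmer` with the formula `y ~ exposed + age + (1 | group)`) recovers the chosen parameters.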
Multilevel Modeling with Random Slopes
We will only briefly address random slopes, but it is worth a moment to consider what these models are particularly well suited for: cross-level interactions. Cross-level interactions represent modification of Level 1 associations by Level 2 clusters or characteristics.
Building on the notation above, we would usually hold the slope for age constant: a single parameter estimated for the entire population, labeled β10. However, the effect of a Level 1 characteristic such as age can instead be allowed to vary across groups, modeled as β1j. We can then investigate whether the association of age with our health outcome is steeper in some groups than in others. For example, physical activity declines with age among adolescents, but perhaps school-level physical activity resources and programs can attenuate the decline. We could use such school-level characteristics to predict the strength of the association between age and physical activity. This would appear as yet another Level 2 equation, with β1j as its outcome, its own estimated parameters, and an error term (SlopeError) that, like the other error terms, is assumed to be normally distributed with a mean of zero.
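A minimal sketch of the physical activity example, assuming a hypothetical 0/1 school-level indicator called `resources`. The first function draws a group-specific age slope β1j from its own Level 2 equation (mean slope, plus a shift for resourced schools, plus SlopeError). The second evaluates the fixed part of the combined model, where the `age_by_resources * resources_j * age` term is the cross-level interaction. All names and numbers are illustrative, not estimates.

```python
import random


def school_slopes(resources, age_slope, age_by_resources, sd_slope_error, seed=2):
    """Draw beta_1j = age_slope + age_by_resources * Resources_j + SlopeError_j
    for each school. `resources` is a hypothetical list of 0/1 indicators."""
    rng = random.Random(seed)
    return [age_slope + age_by_resources * r + rng.gauss(0.0, sd_slope_error)
            for r in resources]


def expected_activity(age, resources_j, intercept, resources_shift,
                      age_slope, age_by_resources):
    """Fixed part of the combined model, with all error terms at their mean of zero."""
    return (intercept + resources_shift * resources_j
            + age_slope * age
            + age_by_resources * resources_j * age)  # cross-level interaction


# Hypothetical values: activity falls 2 units per year of age in schools without
# resources, but only 0.5 units per year in schools with resources.
print(school_slopes([0, 1], age_slope=-2.0, age_by_resources=1.5, sd_slope_error=0.0))
# [-2.0, -0.5]
print(expected_activity(14, 0, 60.0, 0.0, -2.0, 1.5))  # 32.0
print(expected_activity(14, 1, 60.0, 0.0, -2.0, 1.5))  # 53.0
```

Setting `sd_slope_error` to zero shows the slopes implied by the school-level predictor alone; a positive value adds unexplained between-school variation in the age slope.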
Tips for Multilevel Modeling
- Multilevel models may not be worth the extra effort if you have very few observations per cluster or a very low ICC.
- When in doubt, grand mean center continuous predictors to make zero a meaningful value if you are working with a random intercept model.
- In a random intercept model, the cluster-specific intercepts are modeled as having a mean and a variance, which in turn can be used to generate the Best Linear Unbiased Prediction (BLUP, or empirical BLUP/eBLUP when estimated variances are plugged in) for each cluster if you are interested in the expected outcome for each group. BLUP-based predicted outcomes are sometimes called “shrinkage” estimates because they are “shrunk” toward the mean intercept, most strongly for clusters with few observations.
- In a random intercept model, the coefficients for Level 1 predictors can be interpreted as conditional on cluster (e.g., g(y) increases by β for each 1-unit increase in X, net of cluster and any adjustments).
- In a random intercept model, the coefficients for Level 2 predictors can be interpreted as predictors of the cluster-specific intercept (e.g., the expected outcome is γ units higher for each 1-unit increase in Xj when all Level 1 predictors are held at 0).
- A random slope model usually should also include a random intercept.
- A random slope model can be used to test whether the coefficient for a Level 1 predictor varies across groups, and whether Level 2 measures predict the strength of that slope.
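The shrinkage described in the tips can be written out for the empty random intercept model. A minimal sketch, assuming the grand mean and both variances are known (an eBLUP would plug in their estimates instead): the predicted group mean is GrandMean + w × (observed group mean - GrandMean), with weight w = Var(InterceptError) / [Var(InterceptError) + Var(IndividualError) / nj]. The function name and numbers are hypothetical.

```python
def shrunken_group_mean(group_mean, n_j, grand_mean, var_intercept, var_individual):
    """BLUP of group j's expected outcome in the empty random intercept model,
    treating the grand mean and both variance components as known."""
    # Weight on the group's own data: near 1 for large groups, smaller for small ones.
    weight = var_intercept / (var_intercept + var_individual / n_j)
    return grand_mean + weight * (group_mean - grand_mean)


# Hypothetical values: GrandMean 6, Var(InterceptError) 1, Var(IndividualError) 4.
# Both groups observe a mean of 10, but the small group is pulled further toward 6.
print(shrunken_group_mean(10.0, 4, 6.0, 1.0, 4.0))              # 8.0
print(round(shrunken_group_mean(10.0, 100, 6.0, 1.0, 4.0), 3))  # 9.846
```

When Var(InterceptError) is zero (an ICC of zero), every group is shrunk all the way to the grand mean.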
Readings
Textbooks & Chapters
Gelman A, Hill J. Multilevel structures. In: Data Analysis Using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press, 2007: chapter 11.
Raudenbush, S. W. and Bryk, A.S. (2001). Hierarchical Linear Models: Applications and Data Analysis Methods. Newbury Park: SAGE Publications. See also Bryk, A. S. and Raudenbush, S. W. (1992). Hierarchical Linear Models. Newbury Park: Sage Publications.
Diggle, P.J., Heagerty, P., Liang K-Y. and Zeger, S.L. (2002). Analysis of Longitudinal Data (second edition). Oxford: Oxford University Press.
(Diggle also has geostatistical and longitudinal lectures and materials at http://www.lancs.ac.uk/~diggle/)
Singer JD, Willett JB. Applied Longitudinal Data Analysis. Oxford: Oxford University Press, 2003.
Free online:
http://www.soziologie.uni-halle.de/langer/multilevel/books/goldstein.pdf
http://joophox.net/publist/amaboek.pdf (author’s note: material is dated)
Methodological Articles
A brief conceptual tutorial on multilevel analysis in social epidemiology
Author(s): J Merlo, B Chaix, M Yang, J Lynch and L Rastam
Journal: Journal of Epidemiology and Community Health
Year published: 2005
Multilevel analysis in public health research
Author(s): AV Diez-Roux
Journal: Annual Review of Public Health
Year published: 2000
When can group level clustering be ignored? Multilevel models versus single-level models
Author(s): P Clarke
Journal: Journal of Epidemiology and Community Health
Year published: 2008
The (mis)estimation of neighborhood effects: causal inference for a practicable social epidemiology
Author(s): JM Oakes
Journal: Social Science and Medicine
Year published: 2004
Author(s): MS Ridout, CG Demetrio, D Firth
Journal: Biometrics
Year published: 1995
Author(s): AV Diez-Roux
Journal: Journal of Epidemiology and Community Health
Year published: 2002
Comparing GEE and Robust Standard Errors for Conditionally Dependent Data
Author(s): C Zorn
Journal: Political Research Quarterly
Year published: 2006
To GEE or not to GEE: comparing population average and mixed models
Author(s): AE Hubbard, J Ahern, NL Fleischer, et al.
Journal: Epidemiology
Year published: 2010
Modeling neighborhood effects: the futility of comparing mixed and marginal approaches
Author(s): S Subramanian, AJ O’Malley
Journal: Epidemiology
Year published: 2011
Software/Programming Articles
Using SAS PROC MIXED to Fit Multilevel Models, Hierarchical Models, and Individual Growth Models
Author(s): JD Singer
Journal: Journal of Educational and Behavioral Statistics
Year published: 1998
Some applications of generalized linear latent and mixed models in epidemiology
Author(s): A Skrondal, S Rabe-Hesketh
Journal: Norwegian Journal of Epidemiology
Year published: 2003
Growth Modeling Using Random Coefficient Models: Model Building, Testing, and Illustrations
Author(s): PD Bliese, RE Ployhart
Journal: Organizational Research Methods
Year published: 2002
Application Articles
Neighborhood influences on the association between maternal age and birthweight
Author(s): M Cerda, SL Buka, JW Rich-Edwards
Journal: Social Science and Medicine
Year published: 2008
Effects of neighbourhood SES and convenience store concentration on individual level smoking
Author(s): Y Chuang, C Cubbin, D Ahn, M Winkleby
Journal: Journal of Epidemiology and Community Health
Year published: 2005
Long-term antipsychotic treatment and brain volumes: a longitudinal study of schizophrenia
Author(s): BC Ho, NC Andreasen, S Ziebell, R Pierson, V Magnotta
Journal: Archives of General Psychiatry
Year published: 2011
The changing distribution and determinants of obesity in the neighborhoods of New York City
Author(s): JL Black, J Macinko
Journal: American Journal of Epidemiology
Year published: 2010
Software
Stata
Description: The most recent version of Stata is 13. Stata comes in different versions depending on the size of the data it can handle and on processing speed. All versions of Stata have the same features and can be used for multi-level modeling.
Price: Price varies.
SPSS
Description: SPSS comes in many different versions. SPSS advanced statistics modules are necessary for implementing multi-level modeling in SPSS.
Price: Price varies.
R
Description: The most recent version of R is version 3.0.2. R is a free software environment for statistical computing and graphics.
Price: Free
HLM
Description: The most recent version of HLM is version 7. This software was created specifically for multi-level modeling and can be run from within Stata.
Price: Price varies.
Price: Free if you are a UK academic. Otherwise, price varies.
Statistical Informatics for Cancer Research
Description: This website is hosted by Harvard University’s Program Project in Statistical Informatics for Cancer Research and contains software packages and code relevant to multi-level modeling. This site has mostly R packages and code but some SAS macros are also included.
Price: Free
Websites
The Centre for Multilevel Modelling
Website overview: The Centre for Multilevel Modelling is based at the University of Bristol. This website contains a gallery of multilevel modeling research, videos and presentations related to multi-level modeling, as well as a free on-line course.
Statistical Modeling, Causal Inference, and Social Science
Website overview: This is a blog run by Andrew Gelman, a professor of statistics at Columbia University and co-author (with Jennifer Hill) of the book “Data Analysis Using Regression and Multilevel/Hierarchical Models”. His blog frequently features posts and discussions about multilevel models.
Courses
Multi-Level Modeling
Host/program: The Epidemiology and Population Health Summer Institute at Columbia University (EPIC)
Next offering: June 6-10, 2016, 1:30pm-5:30pm
Course format: In person
Software used: SAS, R, Stata
Multilevel Modeling of Hierarchical and Longitudinal Data Using SAS
Host/program: SAS Institute Inc.
Next offering: none currently scheduled but you can request future course dates
Course format: Both
Software used: SAS
Host/program: Centre for Multilevel Modelling, University of Bristol
Next offering: All course materials available on-line
Course format: Online
Software used: MLwiN, R, Stata
Host/program: The Institute for Statistics Education
Course format: Online
Software used: SAS, R, Stata, SPSS