Multi-Level Modeling

Overview

Most of our statistical models rely on the assumption that each observation is independent. However, individual health can be “clustered” due to the influence of shared contexts or “contagious” due to the transmission of ideas or pathogens, violating assumptions of independence. This non-independence may be of direct interest, or merely a nuisance causing our standard errors to be incorrect.

The non-independence among real people can often be ignored in sparse samples. A study of 100 randomly sampled US adults is unlikely to include more than one person from any family, neighborhood, workplace, or clinic. This sample includes such a small fraction of the total population that any individual’s close contacts are unlikely to be sampled. Whatever generalized linear model we apply to this sample, the residuals are likely to be approximately independent, though we may still want to adjust for confounders at the individual or group level.

On the other hand, dense samples and those that incorporate groups into the recruitment strategy warrant special attention. A dense sample including a large segment of the population (e.g., a sample of 100 epidemiology students at Columbia) is much more likely to include people who share health-relevant contexts or who influence each other’s behaviors and exposures. Identifying groups within the population (e.g., cohort based on year of program entry, degree program, affiliated departmental cluster) may help to address this non-independence. A complex sample (e.g., a sample of 100 epidemiology students in each of the top 10 schools of public health) may incorporate group identification directly into the recruitment strategy. Even after the individual- and group-level predictors are entered into a model to predict individual health, the residuals may be correlated among individuals in the same group. Mixed models (also known as random effects models or multilevel models) are an attractive option for working with clustered data, and should be considered alongside alternatives such as generalized estimating equations.

Description

Before we move on to consider how to implement such approaches, let’s introduce a few terms to help describe multilevel data structures. We traditionally label the level at which the outcome is assessed as “Level 1”. Level 1 typically corresponds to either the individual person or a single measurement on that person (e.g., one study visit, one tooth). In a 2-level model, these Level 1 observations are grouped into mutually exclusive and collectively exhaustive categories at “Level 2”. Correct specification of Level 2 groups (e.g., neighborhoods, schools, hospitals) is critical, since we typically assume that these are stable, meaningful groups; that the groups are independent of each other (e.g., no residual correlation between adjacent groups); and that there is no unspecified covariance structure within the Level 2 groups (e.g., there are no intermediate subgroups between Level 1 and Level 2 relevant to predicting the health outcome, once you account for measured predictors). If each Level 2 group has the same number of Level 1 observations, the study sample is said to be “balanced”. The term “grand mean” distinguishes the mean for the entire sample from the mean within group j (the group mean).

Multilevel Modeling with Random Intercepts

Let’s build up to multilevel models. The simplest generalized linear model has a linear outcome and no predictors. The expected value of the outcome is simply the intercept. The observed outcomes are modeled as the intercept plus a normally distributed error term.

E(Y) = intercept
Y = intercept + IndividualError (where IndividualError has a mean of 0 and an estimated standard deviation)

Equivalently, the simplest random intercept model is an “empty” model with no predictors. Again, we’ll assume a linear outcome and normally distributed error term. We introduce the subscripts i to index Level 1 and j to index Level 2. When a subscript j (but not i) is used for a parameter, this indicates that the parameter is allowed to vary across clusters, but is constant for all observations within a cluster.

E(Yij) = interceptj
Yij = interceptj + IndividualError
interceptj = GrandMean + InterceptError

So, the random intercept model introduces the group-specific intercept, which is modeled with its own error term. In fact, it is because of this error term (labeled here as “InterceptError”) that we call this a random intercept model. Both error terms are (usually) assumed to be normally distributed, each with a mean of zero and its own estimated standard deviation. The two equations can be combined:

Yij = GrandMean + InterceptError + IndividualError
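
The combined equation can be read as a recipe for generating clustered data. The following is a minimal simulation sketch of that idea, not a fitting procedure; the function name, parameter names, and example values are illustrative assumptions rather than anything defined in this text.

```python
import random

def simulate_empty_model(n_groups, n_per_group, grand_mean,
                         intercept_sd, individual_sd, seed=0):
    """Return (group_id, outcome) pairs drawn from the empty random intercept model."""
    rng = random.Random(seed)
    rows = []
    for j in range(n_groups):
        # One InterceptError per group, shared by every observation in it
        intercept_j = grand_mean + rng.gauss(0.0, intercept_sd)
        for _ in range(n_per_group):
            # One IndividualError per Level 1 observation
            y_ij = intercept_j + rng.gauss(0.0, individual_sd)
            rows.append((j, y_ij))
    return rows

# Hypothetical balanced sample: 10 groups with 5 observations each
rows = simulate_empty_model(n_groups=10, n_per_group=5, grand_mean=50.0,
                            intercept_sd=4.0, individual_sd=2.0)
print(len(rows))  # 50
```

Setting individual_sd to zero makes every observation in a group identical, while setting intercept_sd to zero removes all clustering.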

This empty random intercept model is particularly useful when exploring the data since it allows us to estimate how much of the outcome variation is happening between versus within groups. If groups are very different from each other, many groups will be far from the GrandMean, making the InterceptError have a large standard deviation. If observations are very similar within groups, the observed outcomes will be close to the group mean, and the standard deviation of IndividualError will be small. A convenient metric is the intraclass correlation coefficient (ICC), calculated from the variances (recall that variance is the square of the standard deviation) of InterceptError and IndividualError: the InterceptError variance divided by the sum of the two variances. The ICC has a theoretical range of 0 to 1, and can be interpreted as the proportion of outcome variance that can potentially be explained by measured and unmeasured characteristics of Level 2 groups.
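
As a small worked sketch, the ICC can be computed directly from the two estimated standard deviations of an empty random intercept model. The function name and the example values below are hypothetical.

```python
def icc(intercept_sd: float, individual_sd: float) -> float:
    """Intraclass correlation from the two estimated standard deviations."""
    between_var = intercept_sd ** 2   # variance of InterceptError
    within_var = individual_sd ** 2   # variance of IndividualError
    return between_var / (between_var + within_var)

# Hypothetical estimates: groups differ a little, individuals differ a lot
print(icc(intercept_sd=1.0, individual_sd=3.0))  # 1 / (1 + 9) = 0.1
```

Here 10% of the outcome variance lies between groups, which would count as a fairly low ICC in many applications.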

However, the empty model is generally not our primary focus. We would like to add individual and group-level predictors, and get appropriate estimates of association. To begin, let’s add the Level 1 predictor Age.

Yij = interceptj + β×Agei + IndividualError
interceptj = ExpectedOutcomeAtAgeZero + InterceptError

As above, these can be combined into a single equation with two error terms:

Yij = ExpectedOutcomeAtAgeZero + β×Agei + InterceptError + IndividualError

However, we need to be cautious about our interpretation. The intercept in a generalized linear model tells us about the expected outcome when the predictors are equal to zero. We often do not emphasize the interpretation of the intercept, but in a random intercept model it receives greater attention. The intercept may be meaningless for a variable like age in samples that do not include neonates, since estimating the average outcome at age zero requires extrapolating beyond the age range of our sample. Consequently, it is good practice in random intercept models to grand mean center continuous variables for which zero is not a plausible value. Grand mean centering involves creating a new variable that is linearly related to the original variable, but has a mean of zero (e.g., CenteredAge = Age – MeanAge). Group mean centering is an alternative in which the new variable has a mean of zero within each group (e.g., GroupCenteredAge = Age – MeanAgeInGroupj), and is particularly useful in contexts where exposure relative to the group mean (rather than on the original scale), or having exposures atypical for one’s group, is of particular interest.
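
The two centering approaches can be sketched in a few lines. This is an illustrative example using only the Python standard library; the function names and the toy ages are assumptions.

```python
from collections import defaultdict
from statistics import mean

def grand_mean_center(values):
    """Subtract the overall sample mean from every value."""
    m = mean(values)
    return [v - m for v in values]

def group_mean_center(values, groups):
    """Subtract each observation's own group mean from its value."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    group_means = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_means[g] for v, g in zip(values, groups)]

# Toy data: group "a" is younger on average than group "b"
ages = [30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
groups = ["a", "a", "a", "b", "b", "b"]
print(grand_mean_center(ages))          # [-25.0, -15.0, -5.0, 5.0, 15.0, 25.0]
print(group_mean_center(ages, groups))  # [-10.0, 0.0, 10.0, -10.0, 0.0, 10.0]
```

After group mean centering, the difference in average age between the two groups is no longer carried by the centered variable, so it has to be modeled separately if it is of interest.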

One of the key motivations for working with multilevel data may be to examine one or more Level 2 characteristics as predictors. For simplicity, let us define a dichotomous level 2 variable called Exposedj that is 1 for exposed groups and 0 for unexposed groups.

Yij = interceptj + β×Agei + IndividualError
interceptj = ExpectedOutcomeAtAgeZeroForUnexposedGroup + γ×Exposedj + InterceptError

These can be combined into a single equation with two slopes and two error terms:

Yij = ExpectedOutcomeAtAgeZeroForUnexposedGroup + γ×Exposedj + β×Agei + InterceptError + IndividualError

The above notation is somewhat clunky, so in most cases we convert to using multiple β and γ symbols with subscripts to indicate first their order within the equation, then whether they are group specific (indicated by j) or for the entire sample (indicated by 0 or .).

Yij = γ00 + γ10×Exposedj + β10×Agei + InterceptError + IndividualError
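
For readers who prefer typeset notation, the same model can be written with its distributional assumptions made explicit. The symbols u_j, e_ij, τ² and σ² are introduced here only for illustration (they stand for InterceptError, IndividualError, and their respective variances) and do not appear elsewhere in this text.

```latex
\begin{aligned}
Y_{ij} &= \gamma_{00} + \gamma_{10}\,\mathrm{Exposed}_j + \beta_{10}\,\mathrm{Age}_i + u_j + e_{ij} \\
u_j &\sim N(0, \tau^2) \qquad \text{(InterceptError)} \\
e_{ij} &\sim N(0, \sigma^2) \qquad \text{(IndividualError)}
\end{aligned}
```

In this notation the ICC of the corresponding empty model would be τ² / (τ² + σ²).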

Multilevel Modeling with Random Slopes

We will address random slopes only briefly, but it is worth spending a moment on what these models are particularly well suited to examine: cross-level interactions. Cross-level interactions represent modification of Level 1 associations by Level 2 clusters or characteristics.

Building on the above notation, we would usually assume the slope for age to be held constant: a single parameter estimated for the entire population, labeled as β10. However, the effect of a Level 1 characteristic such as age can instead be allowed to vary across groups, modeled as β1j. We can then investigate whether the association of age with our health outcome is steeper in some groups than others. For example, physical activity declines with age among adolescents, but perhaps some physical activity resources and programs at the school level are able to attenuate the decline. We could use such school-level characteristics to predict the strength of the association between age and physical activity. This would show up as yet another equation with estimated parameters and an error term, and the SlopeError would be assumed to have a mean of zero and a normal distribution, as was the case for our other error terms.
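
The school example can be sketched as a simulation in which each school gets its own intercept and its own age slope, with the slope depending on a school-level characteristic. This is a hedged illustration only: the function name, the parameter names (such as resource_effect), and all numeric values are hypothetical, and the code generates data rather than fitting a model.

```python
import random

def simulate_random_slopes(resources, n_per_school, mean_intercept, mean_slope,
                           resource_effect, intercept_sd, slope_sd,
                           individual_sd, seed=0):
    """Simulate activity by centered age with school-specific intercepts and slopes.

    Returns (slopes, rows): the realized slope for each school, and a list of
    (school, centered_age, activity) observations.
    """
    rng = random.Random(seed)
    slopes, rows = [], []
    for j, resources_j in enumerate(resources):
        # Level 2 equation for the intercept: mean plus InterceptError
        intercept_j = mean_intercept + rng.gauss(0.0, intercept_sd)
        # Level 2 equation for the slope: cross-level interaction plus SlopeError
        slope_j = mean_slope + resource_effect * resources_j + rng.gauss(0.0, slope_sd)
        slopes.append(slope_j)
        for _ in range(n_per_school):
            centered_age = rng.uniform(-3.0, 3.0)
            # Level 1 equation: school-specific line plus IndividualError
            activity = (intercept_j + slope_j * centered_age
                        + rng.gauss(0.0, individual_sd))
            rows.append((j, centered_age, activity))
    return slopes, rows

# With SlopeError switched off, the slope depends only on the school characteristic:
# the decline with age is -2.0 without resources and is attenuated to -0.5 with them.
slopes, _ = simulate_random_slopes(resources=[0, 1], n_per_school=5,
                                   mean_intercept=10.0, mean_slope=-2.0,
                                   resource_effect=1.5, intercept_sd=1.0,
                                   slope_sd=0.0, individual_sd=1.0)
print(slopes)  # [-2.0, -0.5]
```

Giving slope_sd a positive value adds school-to-school variation in the slope that the measured school characteristic does not explain.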

Tips for Multilevel Modeling

Multilevel models may not be worth the extra effort if you have very few observations per cluster, or a very low ICC.
When in doubt, grand mean center continuous predictors to make zero a meaningful value if you are working with a random intercept model.
In a random intercept model, the cluster-specific intercepts are modeled as having a mean and variance, which in turn can be used to generate the Best Linear Unbiased Prediction (BLUP or eBLUP) for each cluster if you are interested in the expected outcome for each group. The BLUP-based predicted outcomes are sometimes called “shrinkage” estimates because they are “shrunk” toward the mean intercept for clusters with few observations.
In a random intercept model, the coefficients for Level 1 predictors can be interpreted as conditional on cluster (e.g., g(y) increases by β for each 1 unit increase in X, net of cluster and any adjustments).
In a random intercept model, the coefficients for Level 2 predictors can be interpreted as predictors of the cluster-specific intercept (e.g., the expected outcome is γ units higher for each 1 unit increase in Xj when all Level 1 predictors are held at 0).
A random slope model usually should also include a random intercept.
A random slope model can be used to test whether the coefficient for a Level 1 predictor varies across groups, and whether Level 2 measures predict the strength of that slope.
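
The “shrinkage” mentioned in the tips above can be illustrated for the simplest case. The sketch below assumes an empty random intercept model and treats the grand mean and both variances as known; under those assumptions the predicted group mean is a weighted average of the observed group mean and the grand mean. The function name and example values are hypothetical.

```python
def shrunken_group_mean(group_mean, n_j, grand_mean, intercept_var, individual_var):
    """Shrinkage prediction of a group's mean in an empty random intercept model.

    Assumes the grand mean and both variance components are known.
    """
    # Weight on the observed group mean: approaches 1 as n_j grows
    weight = intercept_var / (intercept_var + individual_var / n_j)
    return grand_mean + weight * (group_mean - grand_mean)

# Two clusters with the same observed mean (60) but different sizes
small = shrunken_group_mean(60.0, n_j=4, grand_mean=50.0,
                            intercept_var=4.0, individual_var=16.0)
large = shrunken_group_mean(60.0, n_j=400, grand_mean=50.0,
                            intercept_var=4.0, individual_var=16.0)
print(small)  # 55.0: pulled halfway back toward the grand mean
print(large)  # close to 60: little shrinkage with many observations
```

In practice the grand mean and variances are themselves estimated from the data, which is why software typically reports these as empirical BLUPs.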

Readings

Textbooks & Chapters

Gelman A, and Hill J. Multilevel structures. In: Data Analysis Using Regression and Multilevel/Hierarchical Models. New York City: Cambridge University Press, 2007:chapter 11.

Raudenbush, S. W. and Bryk, A.S. (2001). Hierarchical Linear Models: Applications and Data Analysis Methods. Newbury Park: SAGE Publications. See also Bryk, A. S. and Raudenbush, S. W. (1992). Hierarchical Linear Models. Newbury Park: Sage Publications.

Diggle, P.J., Heagerty, P., Liang K-Y. and Zeger, S.L. (2002). Analysis of Longitudinal Data (second edition). Oxford: Oxford University Press
(Diggle also has geostatistical and longitudinal lectures and materials at http://www.lancs.ac.uk/~diggle/)

Singer JD, Willett JB. Applied Longitudinal Data Analysis. Oxford U Press. 2003.

Free online:
http://www.soziologie.uni-halle.de/langer/multilevel/books/goldstein.pdf
http://joophox.net/publist/amaboek.pdf (authors note material is dated)

Methodological Articles

A brief conceptual tutorial on multilevel analysis in social epidemiology
Author(s): J Merlo, B Chaix, M Yang, J Lynch and L Rastam
Journal: Journal of Epidemiology and Community Health
Year published: 2005

Multilevel analysis in public health research
Author(s): AV Diez-Roux
Journal: Annual Review of Public Health
Year published: 2000

When can group level clustering be ignored? Multilevel models versus single-level models

Author(s): P Clarke
Journal: Journal of Epidemiology and Community Health
Year published: 2008

The (mis)estimation of neighborhood effects: causal inference for a practicable social epidemiology

Author(s): JM Oakes
Journal: Social Science and Medicine
Year published: 2004

Estimating intraclass correlation for binary data

Author(s): MS Ridout, CG Demetrio, D Firth
Journal: Biometrics
Year published: 1995

A glossary for multilevel analysis

Author(s): AV Diez-Roux
Journal: Journal of Epidemiology and Community Health
Year published: 2002

Comparing GEE and Robust Standard Errors for Conditionally Dependent Data

Author(s): C Zorn
Journal: Political Research Quarterly
Year published: 2006

To GEE or not to GEE: comparing population average and mixed models

Author(s): AE Hubbard, J Ahern, NL Fleischer, et al.
Journal: Epidemiology
Year published: 2010

Modeling neighborhood effects: the futility of comparing mixed and marginal approaches

Author(s): S Subramanian, AJ O’Malley
Journal: Epidemiology
Year published: 2011

Software/Programming Articles

Using SAS PROC MIXED to Fit Multilevel Models, Hierarchical Models, and Individual Growth Models

Author(s): JD Singer
Journal: Journal of Educational and Behavioral Statistics
Year published: 1998

Some applications of generalized linear latent and mixed models in epidemiology

Author(s): A Skrondal, S Rabe-Hesketh
Journal: Norwegian Journal of Epidemiology
Year published: 2003

Growth Modeling Using Random Coefficient Models: Model Building, Testing, and Illustrations

Author(s): PD Bliese, RE Ployhart
Journal: Organizational Research Methods
Year published: 2002

Application Articles

Neighborhood influences on the association between maternal age and birthweight
Author(s): M Cerda, SL Buka, JW Rich-Edwards
Journal: Social Science and Medicine
Year published: 2008

Effects of neighbourhood SES and convenience store concentration on individual level smoking

Author(s): Y Chuang, C Cubbin, D Ahn, M Winkleby
Journal: Journal of Epidemiology and Community Health
Year published: 2005

Long-term antipsychotic treatment and brain volumes: a longitudinal study of schizophrenia

Author(s): BC Ho, NC Andreasen, S Ziebell, R Pierson, V Magnotta
Journal: Archives of General Psychiatry
Year published: 2011

The changing distribution and determinants of obesity in the neighborhoods of New York City

Author(s): JL Black, J Macinko
Journal: American Journal of Epidemiology
Year published: 2010

Software

SAS
Description: The most recent version of SAS is 9.4. SAS must be purchased through your business or educational institution. They do not sell licenses to individuals. Starting with SAS version 9.3 all procedures necessary for basic multilevel models come with base SAS.
Price: Price varies.

Stata

Description: The most recent version of Stata is 13. Stata comes in different versions depending on the size of the data it can handle and processing speed. All versions of Stata have the same features and can be used for multi-level modeling.
Price: Price varies.

SPSS Statistics

Description: SPSS comes in many different versions. SPSS advanced statistics modules are necessary for implementing multi-level modeling in SPSS.
Price: Price varies.

R

Description: The most recent version of R is version 3.0.2. R is a free software environment for statistical computing and graphics.
Price: Free

HLM

Description: The most recent version of HLM is version 7. This software was created specifically for multi-level modeling and can be run from within Stata.
Price: Price varies.

MLwiN

Price: Free if you are a UK academic. Otherwise, price varies.

Statistical Informatics for Cancer Research
Description: This website is hosted by Harvard University’s Program Project in Statistical Informatics for Cancer Research and contains software packages and code relevant to multi-level modeling. This site has mostly R packages and code but some SAS macros are also included.

Price: Free

Websites

The Centre for Multilevel Modelling
Website overview: The Centre for Multilevel Modelling is based at the University of Bristol. This website contains a gallery of multilevel modeling research, videos and presentations related to multi-level modeling, as well as a free on-line course.

Statistical Modeling, Causal Inference, and Social Science

Website overview: This is a blog run by Andrew Gelman. He is a professor of statistics at Columbia University and wrote a book entitled, “Data Analysis Using Regression and Multilevel/Hierarchical Models”. His blog very often features posts and discussions around multilevel models.

Courses

Multi-Level Modeling
Host/program: The Epidemiology and Population Health Summer Institute at Columbia University (EPIC)
Next offering: June 6-10, 2016 1:30pm-5:30pm
Course format: In person
Software used: SAS, R, Stata

Multilevel Modeling of Hierarchical and Longitudinal Data Using SAS

Host/program: SAS Institute Inc.
Next offering: none currently scheduled but you can request future course dates
Course format: Both
Software used: SAS

Online Multilevel Modelling Course

Host/program: Centre for Multilevel Modelling | University of Bristol
Next offering: All course materials available on-line
Course format: Online
Software used: MLwiN, R, Stata

Mixed and Hierarchical Linear Models

Host/program: The Institute for Statistics Education
Course format: Online
Software used: SAS, R, Stata, SPSS
