Dyadic Data Analysis
I. Introduction
The effects of social networks on health behaviors and outcomes have been widely documented, and these findings may have implications for the design and delivery of health interventions. For example, among persons who inject drugs (PWID), peer-delivered interventions have been shown to reduce risk behaviors, an effect attributed to the social influence of injection partners. Interventions may also be more effective if targeted towards a network rather than an individual. For example, the Break the Cycle intervention1, which has been implemented globally, aims to prevent transition into injection by motivating current PWID not to initiate non-injection drug users into injection. Similarly, couple-based HIV prevention efforts among high-risk groups have aimed to reduce risk behaviors and to increase HIV testing and HIV medication adherence, and some interventions delivered at the couple level have been more effective than the same intervention delivered at the individual level.2
Dyads are a type of social network consisting of two linked individuals. The analysis of dyadic data originated in psychology, in the study of couples and romantic relationships, but its methods have recently been adopted in epidemiology. Because dyadic data often violate the assumption of independence, they have traditionally been analyzed either by using the dyad as the unit of analysis or through standard mixed or marginal models. Using the dyad as the unit of analysis is limited in that: 1) there is no parametric method to examine categorical outcomes, 2) there may not be enough dyads for sufficient power, 3) data are only examined from dyads in which both members are non-missing, and 4) it is limited in examining individual-level predictors. Standard mixed and marginal models are limited in that they typically do not account for the potential effects of a dyadic partner.
In response to these limitations, David Kenny developed the Actor Partner Interdependence Model (APIM)3, the most widely used analytical model of dyadic data in the epidemiological literature. APIM posits that both an individual and his/her dyadic partner can simultaneously affect the outcome of interest, and that these two effects explain the interdependence of outcome errors (Fig. 1). APIM was originally developed for continuous outcomes as an extension of mixed and marginal models that: 1) addresses the non-independence that often occurs in nested data, and 2) disentangles an individual-level effect (called the actor effect) from a partner effect. In brief, for an individual-level predictor, there is an actor effect (the effect of an individual’s predictor on that individual’s outcome) and a partner effect (the effect of the same predictor, measured in the dyadic partner, on the individual’s outcome). Since APIM uses the entire dyadic sample, each observation is treated as both an actor and a partner. Together, the actor and partner effects explain some of the non-independence of outcome errors within a dyad. In the substance use and HIV literature, actor and partner effects have been found for intimate partner violence4, HIV risk5, and sexual transmission of HCV.6
There are different methods to conduct APIM, and the choice among them depends on several factors, notably whether the dyads are distinguishable and whether the outcome is continuous or binary. In this review paper, I aim to review the different methods, describe their strengths and limitations, and provide an illustrative example using APIM for both continuous and binary outcomes.
II. Distinguishability
APIM cannot be conducted without an understanding of distinguishability. Distinguishable dyads have a characteristic that differentiates the members within a dyad, and this characteristic is the same across all dyads in the study sample. For example, among heterosexual couples, gender is a distinguishing variable. In contrast, indistinguishable dyads have no characteristic that differentiates the members of a dyad; examples include identical twins and same-sex couples. As an alternative to this ‘theoretical’ definition of distinguishability, others prefer to rely on a more empirical definition when conducting APIM.7 Empirically, complete distinguishability occurs when the following differ across levels of the distinguishing variable: actor effects, partner effects, means/proportions of X, variances of X, Y intercepts, and error variances. If dyads are distinguishable, APIM offers methods to account for the distinguishing variable (discussed below) that are not used for indistinguishable dyads.
There is ongoing debate about whether to test for distinguishability empirically. Kenny et al. believe that distinguishability should always be tested empirically, since the distinguishing variable may not be needed in the model and a more parsimonious model can then be used.7 Other pioneers in the dyadic data analysis field, Gonzalez and Griffin, argue that if dyad members can be distinguished on theoretical grounds, an empirical test should not be conducted and the dyads should be treated as distinguishable. Mixed models cannot test for complete distinguishability; they can only test for Y distinguishability, which occurs when all the parameters listed above differ except the X means/proportions and X variances. Y distinguishability is tested in mixed models by comparing the deviance of a model in which the dyads are assumed to be indistinguishable with that of a model in which the dyads are assumed to be distinguishable.
A significant difference between the models denotes distinguishability and the distinguishable model should be used.
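As a worked illustration of this deviance comparison, assume a single predictor. The indistinguishable model then has three fixed effects (intercept, actor effect, partner effect) and two residual covariance parameters (a common variance and a covariance), while the distinguishable model has separate intercepts, actor effects and partner effects for each level of the distinguishing variable (six fixed effects) plus two member-specific variances and one covariance. The test statistic is the difference in deviances (D = -2 log-likelihood), referred to a chi-square distribution whose degrees of freedom equal the difference in parameter counts. Because the two models have different fixed effects, the deviances should come from ML rather than REML estimation.
$$\chi^2 = D_{\text{indistinguishable}} - D_{\text{distinguishable}}, \qquad df = (6 + 3) - (3 + 2) = 4$$
The four degrees of freedom correspond to the four parameters that define Y distinguishability: the intercepts, actor effects, partner effects and error variances.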
III. APIM requirements
APIM has only a few requirements. First, data must be collected from each member of the dyad. Second, the outcome of interest must be measured at the individual level, and it enters the model only as the actor's own outcome. Lastly, the data must be in the form of a pairwise dataset. A pairwise dataset is one in which there is one observation for each individual, and each individual’s observation also contains his/her partner’s data. An example of a pairwise dataset can be seen in Table 1. There are two rows for each dyad, one for each member. A pairwise dataset is needed in order to include both individual-specific actor and partner variables in the same model.
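One way to build a pairwise dataset from an individual-level file is a self-join on the dyad identifier. The sketch below assumes a hypothetical input dataset named individual with one row per person and hypothetical variables dyadid, person (1 or 2 within each dyad), gender, y (the outcome) and x (the predictor); these names are illustrative and do not come from the original analysis.

```sas
/* Sketch only: dataset and variable names are hypothetical.          */
/* individual: one row per person with dyadid, person (1 or 2),       */
/* gender, y (outcome) and x (predictor).                             */
proc sql;
  create table pairwise as
  select a.dyadid,
         a.person,
         a.gender,
         a.y,                      /* actor's own outcome              */
         a.x as actor_x,           /* actor's own predictor value      */
         b.x as part_x             /* partner's value of the predictor */
  from individual as a
       inner join individual as b
       on  a.dyadid = b.dyadid
       and a.person ne b.person    /* pair each person with the other member */
  order by a.dyadid, a.person;
quit;
```

Each dyad contributes two rows, and the value that appears as actor_x in one row appears as part_x in the other. The inner join keeps only dyads in which both members have a record, consistent with the first requirement above.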
IV. Continuous outcomes
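APIM methods were originally developed for continuous outcomes. The standard linear mixed model for nested data with one individual-level predictor takes the form:
$$Y_{ij} = \beta_0 + \beta_1 X_{ij} + u_j + e_{ij}$$
where i indexes the ith observation in the jth group, u_j is the random intercept for group j, and e_ij is the individual-level residual. For APIM specifically, the equation takes the form:
$$Y_{ij} = \beta_0 + \beta_1\,\mathrm{actor\_x}_{ij} + \beta_2\,\mathrm{part\_x}_{ij} + u_j + e_{ij}$$
where actor_x is the individual's own value of variable x, part_x is the partner's value of x, and β1 and β2 are the actor and partner effects, respectively. There are multiple methods to analyze nested samples. For continuous outcomes, the mixed and non-mixed models used to conduct APIM differ for indistinguishable and distinguishable dyads; both cases are discussed below.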
Indistinguishable dyads
In APIM there are two options for analyzing indistinguishable dyads when the outcome is continuous.7 The first option is multilevel modeling (MLM), in which the non-independence of the outcome errors is specified as a variance. Generally in MLM, the intercept and/or slope can be specified to vary across dyads, and the variances of these random effects are estimated. One limitation of using MLM for APIM is that, since the number of individuals within the dyad is small (n=2), specifying a random slope is not recommended. To specify random slopes, there must be more individuals within each dyad than there are random effects; hence, in APIM there are not enough dyad members to let the slopes vary across dyads. Allowing the slopes to vary yields biased estimates and confidence intervals for the dyad-level random variance components.7,8 The second limitation of MLM is that non-independence is assumed to be positive. This is because under MLM the between-group variance is restricted to be positive; the measure of non-independence, the intraclass correlation (ICC), is the proportion of total variance that is due to between-group variance and hence will always be positive. In SAS, PROC MIXED can be used for MLM with a random intercept to conduct APIM:
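A minimal PROC MIXED sketch of this random-intercept APIM follows. It assumes a pairwise dataset named pairwise with hypothetical variables dyadid, y, actor_x and part_x; these names are illustrative rather than taken from the original analysis.

```sas
/* Sketch: random-intercept (MLM) APIM for indistinguishable dyads.   */
/* Variable names are hypothetical.                                   */
proc mixed data=pairwise covtest method=reml;
  class dyadid;
  model y = actor_x part_x / solution ddfm=satterthwaite;
  /* Only the intercept varies across dyads; no random slopes,        */
  /* because each dyad has just two members.                          */
  random intercept / subject=dyadid;
run;
```

COVTEST prints tests of the variance components, and the estimates for actor_x and part_x are the single overall actor and partner effects.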
In contrast to MLM, the recommended method for indistinguishable dyads is a mixed model with no random effects (also called a repeated measures model or a marginal model) that takes non-independence into account through the residual covariance. Under this method, non-independence is not assumed to be positive. In SAS, this is done by replacing the RANDOM statement with a REPEATED statement indicating that observations are repeated within a dyad. A covariance structure for the residuals within a dyad must also be specified. For indistinguishable dyads, the residual outcome variances are assumed to be the same for both members of a dyad, so a compound symmetry covariance structure is chosen:
repeated / type=cs subject=dyadid;
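The difference between the two specifications can be seen in the within-dyad covariance matrices they imply for the outcome errors of members 1 and 2 of dyad j. The notation here (σ_u², σ_e², σ², ρ) is introduced for this sketch. Under a random intercept, the covariance between members equals the intercept variance, which cannot be negative:
$$\operatorname{Cov}\begin{pmatrix} u_j + e_{1j} \\ u_j + e_{2j} \end{pmatrix} = \begin{pmatrix} \sigma_u^2 + \sigma_e^2 & \sigma_u^2 \\ \sigma_u^2 & \sigma_u^2 + \sigma_e^2 \end{pmatrix}, \qquad \mathrm{ICC} = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2} \ge 0$$
Under a compound symmetry residual structure with no random effects, the within-dyad correlation is a free parameter that may be negative:
$$\operatorname{Cov}\begin{pmatrix} e_{1j} \\ e_{2j} \end{pmatrix} = \sigma^2 \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}, \qquad -1 < \rho < 1$$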
Whether MLM or repeated measures is used, when conducting APIM for indistinguishable dyads the final results will give you one overall actor effect and one overall partner effect.
Distinguishable dyads
When analyzing distinguishable dyads, MLM is not recommended because it assumes homogeneity of variances, an assumption that is violated because the residual outcome variances of the two dyad members are allowed to differ. In MLM, if variances vary as a function of a level-1 predictor, the model may be misspecified, and incorrect parameter estimates and standard errors may result.9 One option is to conduct a repeated measures analysis that includes interactions between the distinguishing variable and the actor and partner variables and that specifies a heterogeneous compound symmetry structure (which allows the variances to differ across dyad members). This is called the interaction approach, which takes the general form:
$$Y_{ij} = \beta_0 + \beta_1\,\mathrm{actor\_x}_{ij} + \beta_2\,\mathrm{part\_x}_{ij} + \beta_3\,\mathrm{dist}_{ij} + \beta_4\,(\mathrm{actor\_x}_{ij} \times \mathrm{dist}_{ij}) + \beta_5\,(\mathrm{part\_x}_{ij} \times \mathrm{dist}_{ij}) + e_{ij}$$
where dist is the distinguishing variable. The corresponding SAS code is:
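A minimal PROC MIXED sketch of the interaction approach follows, assuming the pairwise dataset contains hypothetical variables dyadid, gender (the distinguishing variable, treated as a CLASS variable), y, actor_x and part_x. The REPEATED effect gender tells SAS which member each row represents, so that TYPE=CSH can give each member its own residual variance.

```sas
/* Sketch: interaction approach for distinguishable dyads.            */
/* Variable names are hypothetical.                                   */
proc mixed data=pairwise covtest;
  class dyadid gender;
  model y = actor_x part_x gender actor_x*gender part_x*gender
          / solution ddfm=satterthwaite;
  /* Heterogeneous compound symmetry: member-specific residual        */
  /* variances plus one within-dyad correlation.                      */
  repeated gender / type=csh subject=dyadid;
run;
```

In this sketch, the Type 3 tests for actor_x*gender and part_x*gender correspond to the tests of whether the actor and partner effects differ by gender.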
Since there is an interaction term between actor-partner effects and the distinguishing variable, the final results will give a test of whether the effects of the actor-partner predictor differ significantly across the distinguishing variable. However, usually the goal of the analysis is to obtain the actor-partner effects per level of the distinguishing variable.
To obtain the actor and partner effects for each level of the distinguishing variable, the two-intercept approach is used.7,10–12 The two-intercept model includes a separate intercept for each level of the distinguishing variable. Dummy variables for the distinguishing variable are coded, and both dummy variables are included in the model on their own and in interaction terms with the actor and partner variables. However, since the correlation of the two dummy variables is -1, the only way they can be included in the same model is by dropping the overall intercept.7 In brief, the two-intercept model combines the models for both levels of the distinguishing variable. For example, if the distinguishing variable is gender, then two dummy variables for gender are coded (female = 1 if female, 0 if male; male = 1 if male, 0 if female). The separate APIM models for females and males are:
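Among females:
$$Y_{ij} = \beta_{0f} + \beta_{1f}\,\mathrm{actor\_x}_{ij} + \beta_{2f}\,\mathrm{part\_x}_{ij} + e_{ij}$$
Among males:
$$Y_{ij} = \beta_{0m} + \beta_{1m}\,\mathrm{actor\_x}_{ij} + \beta_{2m}\,\mathrm{part\_x}_{ij} + e_{ij}$$
The “combined” two-intercept model is then:
$$Y_{ij} = \beta_{0m}\,\mathrm{male}_{ij} + \beta_{0f}\,\mathrm{female}_{ij} + \beta_{1m}\,\mathrm{male}_{ij}\,\mathrm{actor\_x}_{ij} + \beta_{1f}\,\mathrm{female}_{ij}\,\mathrm{actor\_x}_{ij} + \beta_{2m}\,\mathrm{male}_{ij}\,\mathrm{part\_x}_{ij} + \beta_{2f}\,\mathrm{female}_{ij}\,\mathrm{part\_x}_{ij} + e_{ij}$$
where β0m is the intercept for males and β0f is the intercept for females. Since male and female are dummy variables: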
β1m = actor effect among males
β1f = actor effect among females
β2m = partner effect among males
β2f = partner effect among females
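Substituting the dummy codes into the combined two-intercept model shows why the coefficients take these interpretations. For a female (female = 1, male = 0), every term multiplied by male drops out:
$$\begin{aligned} Y_{ij} &= \beta_{0m}(0) + \beta_{0f}(1) + \beta_{1m}(0)\,\mathrm{actor\_x}_{ij} + \beta_{1f}(1)\,\mathrm{actor\_x}_{ij} + \beta_{2m}(0)\,\mathrm{part\_x}_{ij} + \beta_{2f}(1)\,\mathrm{part\_x}_{ij} + e_{ij} \\ &= \beta_{0f} + \beta_{1f}\,\mathrm{actor\_x}_{ij} + \beta_{2f}\,\mathrm{part\_x}_{ij} + e_{ij} \end{aligned}$$
which is the APIM among females; setting male = 1 and female = 0 gives the model among males in the same way. Because male + female = 1 in every row, the two dummies together reproduce the intercept column, which is why the overall intercept must be dropped.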
The corresponding SAS code for this model is:
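A PROC MIXED sketch of the two-intercept model follows. It assumes hypothetical numeric 0/1 variables male and female coded as described above, along with hypothetical variables dyadid, gender, y, actor_x and part_x. Because male and female are numeric (not CLASS) variables, terms such as male*actor_x are simple products, so each estimate maps onto one of the β parameters listed above.

```sas
/* Sketch: two-intercept APIM for distinguishable dyads.              */
/* male/female are hypothetical numeric 0/1 dummies.                  */
proc mixed data=pairwise covtest;
  class dyadid gender;
  model y = male female                  /* gender-specific intercepts */
            male*actor_x female*actor_x  /* gender-specific actor effects */
            male*part_x  female*part_x   /* gender-specific partner effects */
          / noint solution ddfm=satterthwaite;
  repeated gender / type=csh subject=dyadid;
run;
```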
Where noint specifies no intercept.
To conclude, the method chosen to conduct APIM with a continuous outcome among distinguishable dyads is solely based on the goals of the analysis. If one is interested in the interaction between the actor-partner predictor and the distinguishing variable, then the interaction approach should be used. However, if the goal is to ascertain the actor-partner effects for each level of the distinguishing variable, then the two-intercept model is preferred.
V. Binary outcomes
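APIM methods for binary outcomes have only recently been developed and used. The general logistic equation for nested data with one individual-level predictor takes the form:
$$\operatorname{logit}\Pr(Y_{ij} = 1) = \beta_0 + \beta_1 X_{ij} + u_j$$
In APIM specifically, the equation is:
$$\operatorname{logit}\Pr(Y_{ij} = 1) = \beta_0 + \beta_1\,\mathrm{actor\_x}_{ij} + \beta_2\,\mathrm{part\_x}_{ij} + u_j$$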
Mixed models
Conducting APIM for binary outcomes was first demonstrated in 2006 using MLM with random intercepts only, namely through PROC NLMIXED13 and PROC GLIMMIX.8 NLMIXED and GLIMMIX differ in how they estimate APIM models. Similar to linear MLMs, NLMIXED assumes positive non-independence and equal variances across members of a dyad (which assumes indistinguishability). Additionally, PROC NLMIXED estimates its parameters by maximum likelihood (ML) rather than by the restricted maximum likelihood (REML) used in linear mixed models. REML accounts for the degrees of freedom used in estimating the fixed effects and therefore gives better estimates of the level-2 variance components than ML when the number of dyads is small. Lastly, unlike PROC MIXED, in which a repeated measures marginal model can be fitted by replacing the RANDOM statement with a REPEATED statement, PROC NLMIXED offers no such alternative. Given these limitations, NLMIXED is not a preferred method for APIM analyses.
GLIMMIX is a more flexible APIM method for binary outcomes than NLMIXED. If an MLM is desired, parameters can be estimated by ML using the Laplace approximation. This does not restrict the interdependence in the outcome to be positive, which is a strength of GLIMMIX over NLMIXED. Additionally, for distinguishable dyads, although it is not possible to specify a heterogeneous compound symmetry structure, an unstructured covariance structure can be specified. Robust variance-covariance estimators for the fixed effects can then be requested by specifying EMPIRICAL in the PROC statement (an option not available in PROC NLMIXED), which makes the analysis robust to the choice of covariance structure. However, a simulation study showed that using GLIMMIX with a random intercept produced biased estimates of actor and partner effects,13 and a separate simulation showed that it overestimated the variance of the intercept, except for sample sizes of less than 50 dyads, where it underestimated it.14
Non-mixed models
A simulation study conducted in 2013 showed that non-mixed models fitted with GLIMMIX and GENMOD estimate actor and partner effects for binary outcomes better than mixed models do.13 Spain et al. (2012) showed how PROC GLIMMIX can be used to run a “marginalized” model with no random effects, by indicating RESIDUAL in the RANDOM statement14:
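A minimal PROC GLIMMIX sketch of such a marginalized model for indistinguishable dyads follows, assuming hypothetical variables dyadid, y (coded 0/1), actor_x and part_x. The _RESIDUAL_ keyword makes the compound symmetry structure an R-side (residual) covariance, so no random effects are fitted.

```sas
/* Sketch: marginal APIM for a binary outcome, indistinguishable      */
/* dyads. Variable names are hypothetical.                            */
proc glimmix data=pairwise empirical;   /* robust (sandwich) SEs      */
  class dyadid;
  model y(event='1') = actor_x part_x / dist=binary link=logit solution;
  /* Residual-side compound symmetry within each dyad                 */
  random _residual_ / subject=dyadid type=cs;
run;
```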
For distinguishable dyads, a two-intercept model can be examined via the NOINT option in the MODEL statement.
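A sketch of the corresponding two-intercept model in PROC GLIMMIX follows, assuming hypothetical numeric 0/1 dummies male and female along with dyadid, gender, y, actor_x and part_x. The RESIDUAL option on the RANDOM statement keeps the unstructured covariance on the residual side, with gender identifying each member within the dyad.

```sas
/* Sketch: two-intercept marginal APIM for a binary outcome,          */
/* distinguishable dyads. male/female are hypothetical 0/1 dummies.   */
proc glimmix data=pairwise empirical;
  class dyadid gender;
  model y(event='1') = male female
                       male*actor_x female*actor_x
                       male*part_x  female*part_x
                     / noint dist=binary link=logit solution;
  /* Unstructured residual covariance within dyad, indexed by gender  */
  random gender / subject=dyadid type=un residual;
run;
```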
Generalized estimating equations (GEE) can also be used to account for the non-independence of the correlated outcome errors in dyads when the outcome is binary.11,13 Because GEE is a marginal approach, it does not model the non-independence as a variance and therefore does not assume that the non-independence is positive; instead, it accounts for the non-independence through a working correlation structure. The working correlation structure is chosen (compound symmetry for indistinguishable dyads, unstructured for distinguishable dyads), and inference is robust to misspecification of this structure if robust standard errors are estimated (the default option). The compound symmetry working correlation structure assumes that the two members’ observations within a dyad are correlated but that observations from different dyads are uncorrelated. The unstructured working correlation structure places no restrictions on the correlations. PROC GENMOD and PROC GLIMMIX differ in their estimation approach: GENMOD estimates the covariance parameters by the method of moments, whereas GLIMMIX uses likelihood-based techniques.15 A limitation of GEE in APIM is that it relies on large-sample (asymptotic) properties, so it can give biased actor and partner effects when the number of dyads is small (n<50). Another limitation, which GLIMMIX shares, is that likelihood ratio test statistics are not available, so comparing nested models and conducting tests of distinguishability is not possible.
For indistinguishable dyads the SAS code is:
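A minimal PROC GENMOD sketch follows, assuming hypothetical variables dyadid, y (coded 0/1), actor_x and part_x. DESCENDING makes the model describe the probability that y = 1.

```sas
/* Sketch: GEE APIM for a binary outcome, indistinguishable dyads.    */
/* Variable names are hypothetical.                                   */
proc genmod data=pairwise descending;   /* model Pr(y = 1)            */
  class dyadid;
  model y = actor_x part_x / dist=binomial link=logit;
  /* Compound symmetry (exchangeable) working correlation within      */
  /* dyad; empirical (robust) standard errors are reported by default. */
  repeated subject=dyadid / type=cs;
run;
```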
For distinguishable dyads, where gender is the distinguishing variable the two-intercept model SAS code is:
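A sketch of the two-intercept GEE model follows, assuming hypothetical numeric 0/1 dummies male and female along with dyadid, gender, y, actor_x and part_x. WITHIN=gender tells GENMOD which member each row represents so that the unstructured working correlation is aligned across dyads.

```sas
/* Sketch: two-intercept GEE APIM for a binary outcome,               */
/* distinguishable dyads. male/female are hypothetical 0/1 dummies.   */
proc genmod data=pairwise descending;
  class dyadid gender;
  model y = male female
            male*actor_x female*actor_x
            male*part_x  female*part_x
          / noint dist=binomial link=logit;
  /* Unstructured working correlation, members indexed by gender      */
  repeated subject=dyadid / within=gender type=un;
run;
```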
The 2013 simulation study demonstrated that both methods gave unbiased estimates of actor-partner effects when the number of dyads was greater than 50, but GLIMMIX estimated the ICC with serious negative bias.13 It was therefore concluded that GEE should be the preferred method for APIM models with binary outcomes.
VI. Additional steps and implications
Before conducting an APIM, it may be useful to know some additional analytic steps that can be taken (e.g., testing interaction effects) and some conceptual implications of the findings.
Interaction effects
Interaction effects between the actor variable and partner variable can be tested in APIM. An actor-partner interaction provides evidence of an effect of dyad similarity on an outcome.16 For example, in the forthcoming illustrative example, individual drug use is the predictor of interest and depressive symptomology is the outcome. In APIM, an actor-partner interaction term tests whether the effect of an individual's own drug use on depressive symptomology depends on whether his/her partner also uses drugs; for instance, whether those who use drugs and have a partner who uses drugs are more likely to have depressive symptomology than would be expected from the separate actor and partner effects.
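In equation form, a minimal sketch for a continuous outcome adds the product term to the APIM. This assumes 0/1 coding of drug use, with u_j and e_ij denoting the dyad-level and individual-level error terms (notation chosen for this sketch):
$$Y_{ij} = \beta_0 + \beta_1\,\mathrm{actor\_x}_{ij} + \beta_2\,\mathrm{part\_x}_{ij} + \beta_3\,(\mathrm{actor\_x}_{ij} \times \mathrm{part\_x}_{ij}) + u_j + e_{ij}$$
$$E[Y_{ij} \mid \mathrm{actor\_x} = 1, \mathrm{part\_x} = 1] - E[Y_{ij} \mid \mathrm{actor\_x} = 0, \mathrm{part\_x} = 0] = \beta_1 + \beta_2 + \beta_3$$
The interaction coefficient β3 is therefore the part of the difference between "both use drugs" and "neither uses drugs" that is not explained by the separate actor and partner effects.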
Additionally, in APIM, other actor, partner and dyad-level covariates can be added in the model, and interactions between these terms and the actor-partner predictors can be tested for. Such examples are partner-moderated partner effects (interaction between partner effect and a partner characteristic), actor-moderated partner effects (interaction between partner effect and an actor characteristic), actor-moderated actor effects (interaction between actor effect and actor characteristic), and partner-moderated actor effects (interaction between actor effect and partner characteristic).17
Implications of findings
When both actor and partner effects are found for a specific predictor, that predictor is said to have a bidirectional effect on the outcome.17 This result provides evidence of interpersonal influences in the causal pathway of interest. In the example of drug use and depressive symptomology among couples, a bidirectional effect would suggest targeting the couple’s drug use as a unit as a means to prevent and/or treat depression. If only an actor effect or only a partner effect is found, then the relationship follows an actor-only or partner-only pattern, respectively.18 In the same example, an actor-only pattern would suggest targeting the individual’s drug use, whereas a partner-only pattern could suggest an intervention that helps an individual cope with his/her partner’s drug use.
When a bidirectional effect is found, it is sometimes of interest to examine the relative sizes of the actor and partner effects. For example, the actor and partner effects may be equal to each other and in the same direction. This pattern, called the couple pattern18, implies that the individual-level predictor can be recoded as a dyad-level predictor that captures the combined actor and partner effect on the outcome (i.e. both partners use drugs vs. one partner uses drugs vs. neither partner uses drugs). Conversely, the contrast pattern18 occurs when the actor and partner effects are equal in size but have opposite signs. For example, depressive symptomology may be negatively associated with actor drug use but positively associated with partner drug use: drugs may be used as a coping mechanism and therefore have a negative effect on depressive symptomology, whereas the stress of having a partner who uses drugs may have a positive effect on depressive symptomology. A contrast pattern also implies that the predictor should be tested as a dyad-level variable, which can be seen as the difference between the two members' individual predictors. In this example, the dyad-level variable would be categorized as: both or neither partner uses drugs vs. actor uses drugs only vs. partner uses drugs only. The relative sizes of the actor and partner effects can be empirically tested to confirm couple or contrast patterns.7 To do so, an APIM model is fitted in which the only two predictors are the couple-pattern variable and the contrast-pattern variable. If one variable is significant and the other is not, there is empirical evidence for the corresponding pattern.
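This test rests on an algebraic identity. Writing a for actor_x and p for part_x (shorthand introduced only for this sketch), the actor and partner terms can be re-expressed exactly in terms of a sum (couple-pattern) variable and a difference (contrast-pattern) variable:
$$\beta_1 a + \beta_2 p = \frac{\beta_1 + \beta_2}{2}\,(a + p) + \frac{\beta_1 - \beta_2}{2}\,(a - p)$$
With 0/1 drug-use indicators, a + p takes the values 2, 1 and 0 (both, one, or neither partner uses drugs) and a - p takes the values 1, -1 and 0 (actor only, partner only, both or neither), matching the dyad-level codings described above. If β1 = β2 (couple pattern), the coefficient on a - p is zero; if β1 = -β2 (contrast pattern), the coefficient on a + p is zero.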
VII. Limitations of APIM
APIM has limitations that are worth discussing. First, dyad members may belong to a larger group or social network, which can also contribute to the interdependence of the outcome errors. In this case, APIM estimates may be biased by this larger-group interdependence, and other social network methods may be more appropriate. Second, APIM may not be the correctly specified model for the structure of the association of interest. Two other dyadic models exist: the common-fate model and the mutual feedback model.3 In the common-fate model, dyad members do not influence each other, and the interdependence in the outcome errors is due to an outside factor that influences both dyad members. For example, in the illustrative example, correlation in depressive symptomology could be due to a traumatic event that affected both dyad members. The mutual feedback model assumes that an individual’s outcome affects his/her partner’s outcome and vice versa. For example, depressive symptomology in one dyad member can affect their partner’s depressive symptomology, and this could be the cause of the interdependence in the outcome errors. Multilevel methods have not been developed to estimate the common-fate or mutual feedback models.