Focused Working Groups

Causal Inference Learning Group 

The Causal Inference Learning Group was formed in 2018. Its mission is to disseminate cutting-edge causal inference research at the Columbia Mailman School of Public Health and to encourage interdisciplinary collaborations in the development and application of causal inference approaches to improve public health research and decision making.

The working group holds bi-monthly meetings at which members present their ongoing research projects or discuss interesting articles. The group includes faculty, postdocs, and students from the Departments of Biostatistics; Epidemiology; Health Policy; Environmental Health Sciences; Political Science; Statistics; Computer Science; and Ecology, Evolution, and Environmental Biology, as well as from Teachers College. Since 2019, faculty, students, and postdoctoral fellows from other universities in the New York area have also joined the group.

Organizing Faculty: Linda Valeri, Caleb Miles, Daniel Malinsky

Key Publications:

  • D. Malinsky, I. Shpitser, and E. J. Tchetgen Tchetgen (2020+) “Semiparametric Inference for Non-monotone Missing-Not-at-Random Data: the No Self-Censoring Model.” Journal of the American Statistical Association.
  • R. Bhattacharya, D. Malinsky, and I. Shpitser (2019) “Causal Inference Under Interference and Network Uncertainty.” In Proceedings of the 35th Conference on Uncertainty in Artificial Intelligence (UAI).
  • R. Nabi, D. Malinsky, and I. Shpitser (2019) “Learning Optimal Fair Policies.” In Proceedings of the 36th International Conference on Machine Learning (ICML).
  • D. Malinsky, I. Shpitser, and T. S. Richardson (2019) “A Potential Outcomes Calculus for Identifying Conditional Path-Specific Effects.” In Proceedings of the 22nd International Conference on Artificial Intelligence and Statistics (AISTATS).
  • Miles, C.H., Shpitser, I., Kanki, P., Meloni, S., and Tchetgen Tchetgen, E. J. (2020). On the semiparametric estimation of a path-specific effect in the presence of mediator-outcome confounding. Biometrika, 107(1), 159-172.
  • Miles, C.H., Petersen, M., and van der Laan, M.J. (2019). Causal inference when counterfactuals depend on the proportion of all subjects exposed. Biometrics, 75(3), 768-777.
  • Miles, C.H., Schwartz, J., and Tchetgen Tchetgen, E.J. (2018). A class of semiparametric tests of treatment effect robust to confounder measurement error. Statistics in Medicine, 37(24), 3403-3416.
  • Miles, C.H., Shpitser, I., Kanki, P., Meloni, S., and Tchetgen Tchetgen, E.J. (2017). Quantifying an adherence path-specific effect of antiretroviral therapy in the Nigeria PEPFAR program. Journal of the American Statistical Association, 112(520), 1443-1452.
  • Miles, C.H., Kanki, P., Meloni, S., and Tchetgen Tchetgen, E.J. (2017). On partial identification of the natural indirect effect. Journal of Causal Inference, 5(2).
  • Comment L, Coull BA, Zigler C, Valeri L. (2021). Bayesian data fusion for unmeasured confounding.  Biometrics, in press.
  • Zhu Y, Jackson J, Centorrino F, Fitzmaurice GM, Valeri L. Meta-analysis of the total effect decomposition in the presence of multiple mediators: Integrating evidence across trials for schizophrenia treatment. (2020).  Epidemiology, in press.
  • Devick KL, Valeri L, Chen J, Jara A, Bind MA, & Coull BA. The Role of Body Mass Index at Diagnosis on Black-White Disparities in Colorectal Cancer Survival: A Density Regression Mediation Approach.  (2020). Biostatistics, in press.
  • Bellavia A, Valeri L. Decomposition of the Total Effect in the Presence of Multiple Mediators and Interactions. Am J Epidemiol. 2018 06 01; 187(6):1311-1318.
  • Valeri L, Reese SL, Zhao S, Page CM, Nystad W, Coull BA, London SJ. Misclassified exposure in epigenetic mediation analyses. Does DNA methylation mediate the effects of smoking on birth weight? Epigenomics. 2017 03; 9(3):253-265. 
  • Valeri L, Coull BA. Estimating causal contrasts involving intermediate variables in the presence of selection bias. Statistics in Medicine. 2016; 35(26):4779-4793.
  • Valeri L, Chen JT, Garcia-Albeniz X, Krieger N, VanderWeele TJ, Coull BA. The role of stage at diagnosis in colorectal cancer Black-White survival disparities: A counterfactual causal inference approach. Cancer Epidemiol Biomarkers Prev. 2016; 25(1):83-89.
  • Bobb JF, Valeri L, Claus Henn B, Christiani DC, Wright RO, Mazumdar M, Godleski JJ, Coull BA. Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics. 2015; 16(3):493-508.
  • Valeri L, VanderWeele TJ. The estimation of direct and indirect causal effects in the presence of the misclassified binary mediator. Biostatistics. 2014; 15(3):498-512. 
  • Valeri L, VanderWeele TJ. Mediation analysis allowing for exposure-mediator interaction and causal interpretation: Theoretical assumptions and implementation with SAS and SPSS macros. Psychol Methods. 2013; 18(2):137-150.

Neuroimaging Journal Club 

We develop statistical methods for biomedical data in mental health research. The research scope is broad, including neuroimaging (e.g., MRI, PET, and EEG) and genetics (e.g., DNA methylation, GWAS). Our group collaborates with scientists in the Departments of Psychiatry, Neurology, OB/GYN, and Pediatrics at CU and NYSPI. Our group actively works on traditional machine learning, AI, and deep learning techniques.

If you have any questions about our group, research, or joining the weekly journal club meetings (Fridays 9:30-10:30 am), please contact Dr. Seonjoo Lee.

Organizing Faculty: Seonjoo Lee
Participating Faculty: Xi Zhu, Jordan Dworkin
Group website: LeeLab

Key Publications:

  • Lee S., Shen H. and Truong Y. (2020) Sampling properties of color Independent Component Analysis. Journal of Multivariate Analysis. https://doi.org/10.1016/j.jmva.2020.104692
  • Simon SS, Lee S, Stern Y. Personality-cognition associations across the adult life span and potential moderators: Results from two cohorts. J Pers. 2020 Oct;88(5):1025-1039. doi: 10.1111/jopy.12548. Epub 2020 Apr 4. PubMed PMID: 32199032; PubMed Central PMCID: PMC7484019.
  • Park, H., & Lee, S. (2019). Logistic regression error‐in‐covariate models for longitudinal high‐dimensional covariates. Stat, 8(1), e246.
  • Lee S, Zhou X, Gao Y, Vardarajan B, Reyes-Dumeyer D, Rajan KB, Wilson RS, Evans DA, Besser LM, Kukull WA, Bennett DA, Brickman AM, Schupf N, Mayeux R, Barral S. Episodic memory performance in a multi-ethnic longitudinal study of 13,037 elderly. PLoS One. 2018;13(11):e0206803. doi: 10.1371/journal.pone.0206803. eCollection 2018. PubMed PMID: 30462667; PubMed Central PMCID: PMC6248922.
  • Lee S, Zimmerman ME, Narkhede A, Nasrabady SE, Tosto G, Meier IB, Benzinger TLS, Marcus DS, Fagan AM, Fox NC, Cairns NJ, Holtzman DM, Buckles V, Ghetti B, McDade E, Martins RN, Saykin AJ, Masters CL, Ringman JM, Fӧrster S, Schofield PR, Sperling RA, Johnson KA, Chhatwal JP, Salloway S, Correia S, Jack CR Jr, Weiner M, Bateman RJ, Morris JC, Mayeux R, Brickman AM. White matter hyperintensities and the mediating role of cerebral amyloid angiopathy in dominantly-inherited Alzheimer's disease. PLoS One. 2018;13(5):e0195838. doi: 10.1371/journal.pone.0195838. eCollection 2018. PubMed PMID: 29742105; PubMed Central PMCID: PMC5942789.

ROADMAP (Research on Adaptive Designs for Mobile App Platform)

ROADMAP focuses on three methodological areas: adaptive designs, implementation science, and statistical and machine learning. We aim to harness these methodologies and their intersections to achieve our goal of better app development and evaluation, as well as better design of implementation studies to improve health.

Organizing Faculty: Min Qian (mq2158@cumc.columbia.edu)
Participating Faculty: Ken Cheung and Bin Cheng 
Group website

Key Publications:

  • Hu X., Qian M., Cheng B., and Cheung Y.K. (2020). Personalized policy learning using longitudinal mobile health data. Accepted by Journal of the American Statistical Association.
  • Zhong X., Cheung Y.K., Qian M., and Cheng B. (2020). Comparing adaptive interventions under a general sequential multiple assignment randomized trial design via multiple comparisons with the best. Accepted by Journal of Statistical Planning and Inference.
  • Qian M., Chakraborty B., Maiti R., and Cheung Y.K. (2020). A sequential significance test for treatment by covariate interactions. Accepted by Statistica Sinica.
  • Zhong X., Cheng B., Qian M., and Cheung Y.K. (2019). A gate-keeping test for selecting adaptive intervention under general designs of sequential multiple assignment randomized trials. Contemporary Clinical Trials, 85:105830.
  • Cheung, K., Ling, W., Karr, C. J., Weingardt, K., Schueller, S. M., & Mohr, D. C. (2018). Evaluation of a recommender app for apps for the treatment of depression and anxiety: an analysis of longitudinal user engagement. Journal of the American Medical Informatics Association.

Functional Data Working Group (FDAWG)

We work on statistical methods for and applications of functional data analysis defined broadly. Recent projects include applications in neuroimaging, accelerometry, and motor learning, among many others. Our methodological work spans a wide range, including alignment, dimension reduction, clustering, regression, and software development. Our general goal is to develop and use statistically sound methods to address public health and scientific questions that depend on complex data sources for answers.

Organizing Faculty: Jeff Goldsmith, Todd Ogden
Group Website

Key Publications:

  • Petkova E, Tarpey T, Ciarleglio A, and Ogden RT. “Extracting scalar measures from functional data with applications to the placebo response,” in press, Statistics and Its Interface.
  • Wrobel J, Martin ML, Bakshi R, Calabresi PA, Elliot M, Roalf D, Gur RC, Gur RE, Henry RG, Nair G, Oh J, Papinutto N, Pelletier D, Reich DS, Rooney WD, Satterthwaite TD, Stern W, Prabhakaran K, Sicotte NL, Shinohara RT, Goldsmith J (2020). Intensity warping for multisite MRI harmonization. NeuroImage 223: 117242.
  • Lee J, Li G, Christensen WF, Collins G, Seeley M, Bowden AE, Fullwood DT, and Goldsmith J (2019). Functional Data Analyses of Gait Data Measured Using In-Shoe Sensors. Statistics in Biosciences 11: 288-313.
  • Wang Y, Wang G, Wang L, and Ogden RT (2020). Simultaneous confidence corridors for mean functions in functional data analysis of imaging data. Biometrics 76: 427–437.
  • Chen Y, Goldsmith J, and Ogden RT (2019). Functional data analysis of dynamic PET data. Journal of the American Statistical Association 114: 595–609.
  • Wrobel J, Zipunnikov V, Schrack J, and Goldsmith J (2019). Registration for exponential family functional data. Biometrics 75: 48-57.
  • Backenroth D, Goldsmith J, Harran MD, Cortes JC, Krakauer JW, and Kitago T. (2018). Modeling motor learning using heteroskedastic functional principal components analysis. Journal of the American Statistical Association 113: 1003-1015. 
  • Ciarleglio A, Petkova E, Ogden RT, and Tarpey T (2018). Constructing treatment decision rules based on scalar and functional predictors when moderators of treatment effect are unknown. Journal of the Royal Statistical Society, Series C 67: 1331–1356 
  • Goldsmith J and Schwartz JE (2017). Variable selection in the functional linear concurrent model. Statistics in Medicine 36: 2237-2250.
  • Jiang B, Petkova E, Tarpey T, and Ogden RT (2017). Latent class modeling using matrix-valued covariates with application to identifying early responders based on EEG signals. Annals of Applied Statistics 11: 1513–1536. 
  • Reiss PT, Goldsmith J, Shang HL, and Ogden RT (2017). Methods for scalar-on-function regression. International Statistical Review 85: 228–249.
  • Chang C, Lin X, and Ogden RT (2017). Simultaneous confidence bands for functional regression models. Journal of Statistical Planning and Inference 188: 67–81. 

Missing Data Reading Group 

Organizing Faculty: Qixuan Chen


Genomics, Epigenomics, and Multi-omics

Organizing Faculty: Shuang Wang


Precision Medicine Working Group

The PM working group discusses emerging research in precision medicine with applications to mental health, neuropsychiatric disorders, and digital health. Topics include machine learning, optimization of individualized treatment strategies, high-dimensional data analysis, causal inference, predictive modeling, and data integration.

Organizing Faculty: Yuanjia Wang
Group Website


Statistical Genetics and Genomics (SGG) Working Group

Established in 2022, the Statistical Genetics and Genomics (SGG) Working Group aims to enhance our collective knowledge of cutting-edge statistical genetics and genomics research. Located in the Department of Biostatistics at Columbia Mailman School of Public Health, we seek to foster interdisciplinary collaborations between the Department of Biostatistics and the broader community of the School of Public Health. The working group brings together faculty, fellows, students, and visitors from inside and outside Columbia who work in related fields, including biostatistics, statistics, computational biology, genetics, engineering, and medicine, and who are interested in better understanding how genomes and genes affect biological functions and how to translate this knowledge into improved human health. The group hosts bi-weekly seminars with speakers from both inside and outside Columbia to discuss the latest progress in the field, including new statistical and computational methodology, new data resources in genetics and genomics, and new bioengineering technologies.

Organizing Faculty: Wenpin Hou, Iuliana Ionita-Laza, Zhonghua Liu