Departmental Seminars & Lectures
Lectures are in-person only unless marked otherwise.
For all Zoom inquiries to attend seminars with a virtual option, please send an email to Erin Elliott, Programs Coordinator (ee2548@cumc.columbia.edu).
During the Fall and Spring semesters, the Department of Biostatistics holds regular seminars on Thursdays, called the Levin Lecture Series, on a wide variety of topics of interest to both students and faculty. Each semester, there are often also guest lectures outside the regular Thursday Levin Lecture Series, providing a robust schedule that covers the wide range of topics in Biostatistics. The speakers are invited guests who spend the day of their seminar discussing their research with Biostatistics faculty and students.
Spring 2025 Schedule
Thursday, January 16th, Hess Commons, 11:45am
Levin Lecture
Charles J. Wolock, PhD
Postdoctoral Researcher, Department of Biostatistics, Epidemiology, and Informatics
University of Pennsylvania
Nonparametric approaches to assessing variable importance using health data
Abstract:
Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. This objective has given rise to a substantial literature focused on defining, estimating, and making inference on variable importance. Within this field, there is a need for tools to handle the complications characteristic of health data, including coarsening of the outcome variable. Drawing upon examples from research in infectious diseases and mental health, we present novel methods for assessing variable importance in a nonparametric, algorithm-agnostic manner. Our proposed methods allow for flexible estimation of nuisance parameters and provide asymptotically valid inference, while also enjoying robustness properties and nonparametric efficiency. We demonstrate the performance of our proposed procedures via numerical simulations and present an analysis of data from the HVTN 702 HIV vaccine trial, with the aim of informing enrollment strategies for future trials. Furthermore, we discuss several open questions surrounding variable importance and outline possible avenues of future work.
Thursday, January 23rd, Hess Commons, 11:45am
Levin Lecture
Harsh Parikh, PhD
Postdoctoral Fellow, Department of Biostatistics
Johns Hopkins University
Interpretable Causal Inference for Advancing Healthcare and Public Health
Abstract:
Causal inference methods are essential across healthcare, public health, and social sciences, helping understand complex systems and inform decision-making. While integrating machine learning (ML) and statistical techniques has improved causal estimation, many of these methods depend on black-box ML approaches. This raises concerns about the communicability, auditability, and trustworthiness of causal estimates, especially in high-stakes contexts. My research addresses these challenges by developing interpretable causal inference methods. In this presentation, I introduce an approach for bridging the research-to-practice gap by generalizing randomized controlled trial (RCT) findings to target populations. Although RCTs are fundamental for understanding causal effects, extending their findings to broader populations is difficult due to effect heterogeneity and the underrepresentation of certain subgroups. Our work tackles this issue by identifying and interpretably characterizing underrepresented subgroups in RCTs. Specifically, we propose the Rashomon Set of Optimal Trees (ROOT), an optimization-based method that produces interpretable characteristics of underrepresented subgroups. This approach helps researchers communicate findings more effectively. We apply ROOT to extend inferences from the Starting Treatment with Agonist Replacement Therapies (START) trial -- assessing the effectiveness of opioid use disorder medication -- to the real-world population represented by the Treatment Episode Dataset: Admissions (TEDS-A). By refining target populations using ROOT, our framework offers a systematic approach to enhance decision-making accuracy and inform future trials in diverse populations.
Friday, January 24th, Hess Commons, 11:45am
Guest Lecture
Eric Sun
PhD Student
Stanford University
Machine learning for aging and spatial omics
Abstract:
Aging is a highly complex process and the greatest risk factor for many chronic diseases including cardiovascular disease, dementia, stroke, diabetes, and cancer. Recent spatial and single-cell omics technologies have enabled the high-dimensional profiling of complex biology including that underlying aging. As such, new machine learning and computational methods are needed to unlock important insights from spatial and single-cell omics datasets. First, I present the development of high-resolution machine learning models (‘spatial aging clocks’) that can measure the aging of individual cells in the brain. Using these spatial aging clocks, I discovered that some cell types can dramatically influence the aging of nearby cells. Next, I present new computational and statistical methods for overcoming the gene coverage limitations of existing spatial omics technologies, which have enabled the discovery of gene pathways underlying the spatial effects of brain aging. Finally, I introduce several methods for improving the reliability and robustness of high-dimensional data visualizations.
Tuesday, January 28th, 8th Floor Auditorium, 11:45am
Guest Lecture
Yao Zhang, PhD
Postdoctoral Scholar, Statistics
Stanford University
Posterior Conformal Prediction
Abstract:
Conformal prediction is a popular technique for constructing prediction intervals with distribution-free coverage guarantees. The coverage is marginal, holding on average over the entire population but not necessarily for any specific subgroup. In this talk, I will introduce a new method, posterior conformal prediction (PCP), which generates prediction intervals with both marginal and approximate conditional coverage for clusters (or subgroups) naturally discovered in the data. PCP achieves these guarantees by modelling the conditional conformity score distribution as a mixture of cluster distributions. Compared to other methods with approximate conditional coverage, this approach produces tighter intervals, particularly when the test data is drawn from clusters that are well represented in the validation data. PCP can also be applied to guarantee conditional coverage on user-specified subgroups, in which case it achieves robust coverage on smaller subgroups within the specified subgroups. In classification, the theory underlying PCP allows for adjusting the coverage level based on the classifier’s confidence, achieving significantly smaller sets than standard conformal prediction sets. Experiments demonstrate the performance of PCP on diverse datasets from socio-economic, scientific and healthcare applications.
CANCELLED: Thursday, January 30th, Hess Commons, 11:45am
Levin Lecture
Paul Albert, PhD
Senior Investigator, Biostatistics Branch
NCI/DCEG
Innovative applications of hidden Markov models in cancer Epidemiology and Genetics
Abstract:
During the past 30 years, hidden Markov modeling (HMM) has had a big impact on the analysis of biomedical data, with important application areas in genomics, natural history modeling, environmental monitoring, and the analysis of longitudinal data. In cancer genomics, for example, the use of HMMs has played an important role in uncovering both susceptibility (germline) and tumor progression (somatic) of cancer. In this talk, I will present a series of novel applications of HMMs in cancer epidemiology and genetics. I will describe the use of HMMs to identify multiple subclones in next-generation sequences of tumor samples (Choo-Wosoba et al., Biostatistics 2021). I will also discuss the application of HMMs for characterizing the natural history of human papillomavirus and cervical precancer (Aron et al., Statistics in Medicine, 2021). Further, I describe the use of HMMs for investigating the effects of sleeping and activity on mortality. Last, I describe the use of HMMs for joinpoint analysis in cancer surveillance. All four examples required interesting adaptations of standard HMM estimation that will be highlighted.
Tuesday, February 4th, Hess Commons, 11:45am
Guest Lecture
Ying Cui, PhD
Postdoctoral Scholar, Department of Biomedical Data Science
Stanford University
Advancing Biomedical Data Science: From Population Insights to Personalized Decisions
Abstract:
Rapid advances in biomedicine have enabled us to address important questions that were once intractable. There is a pressing need for analyzing massive data sets emerging from cutting-edge technologies, presenting challenges such as high-dimensionality and multi-modality. Additionally, there has been rising interest in personalized decision-making. Inspired by these challenges, my research aims to enhance the integration of statistical insights and data science innovations in biomedical research. In this talk, I will cover two projects.
The first part of the talk explores key questions about identifying covariates relevant to clinical outcomes of interest. Addressing these questions, however, can be complicated due to the presence of complex covariate effects. To tackle this problem, I developed a new testing and screening framework by adopting a global view via the novel concept of interval quantile independence. I showed that this general testing framework can naturally yield both unconditional and conditional screening procedures for ultra-high dimensional settings and enjoy the sure screening property.
In the second part of the talk, I address the feature selection problem from a personalized perspective. I designed a novel dynamic prediction rule to determine the optimal order of acquiring features in predicting clinical outcomes of interest for individual subjects. The goal is to optimize model performance while reducing the costs associated with measuring features. To achieve this, I employed reinforcement learning, where the agent decides the best action at each step: either making a final decision or continuing to collect new predictors. The proposed approach mirrors and improves real-life decision-making processes, employing a “learn-as-you-go” paradigm.
Thursday, February 6th, Hess Commons, 11:45am
Levin Lecture
Tianyu Zhang, PhD
Postdoctoral Researcher, Department of Statistics & Data Science
Carnegie Mellon University
Adaptive and Scalable Nonparametric Estimation via Stochastic Optimization
Abstract:
Nonparametric procedures are frequently employed in predictive and inferential modeling to relate random variables without imposing specific parametric forms. In supervised learning, for instance, our focus is often on the conditional mean function that links predictive covariates to a numerical outcome of interest. While many existing statistical learning methods achieve this with optimal statistical performance, their computational expenses often do not scale favorably with increasing sample sizes. This challenge is exacerbated in certain “online settings,” where data is continuously collected and estimates require frequent updates.
Thursday, February 27th, ARB 8th Floor Auditorium, 11:45am
Levin Lecture
Shan Yu, PhD
Assistant Professor, Department of Statistics
University of Virginia
Talk Title & Abstract TBA
Monday, March 3rd, Hess Commons, 11:45am
Guest Lecture
Lorin Crawford, PhD
Principal Researcher at Microsoft Research
Distinguished Senior Fellow in Biostatistics, Brown University
Talk Title & Abstract TBA
Thursday, March 13th, ARB 8th Floor Auditorium, 11:45am
Levin Lecture
Fan Li, PhD
Associate Professor of Biostatistics
Yale University
Talk Title & Abstract TBA
Thursday, March 27th, Hess Commons, 11:45am
Levin Lecture
James Zou, PhD
Associate Professor of Biomedical Data Science and, by courtesy, of Computer Science and Electrical Engineering
Stanford University
Talk Title & Abstract TBA
Thursday, April 3rd, Hess Commons, 11:45am
Levin Lecture
Jennifer Hill, PhD
Professor of Applied Statistics; Co-Department Chair; Co-Director of PRIISM, Department of Applied Statistics, Social Science, and Humanities
NYU Steinhardt
Talk Title & Abstract TBA
Thursday, April 10th, ARB 8th Floor Auditorium, 11:45am
Levin Lecture
Weining Shen, PhD
Associate Professor of Statistics
University of California, Irvine
Talk Title & Abstract TBA
Thursday, April 17th, Hess Commons, 11:45am
Levin Lecture
Ali Shojaie, PhD
Professor of Biostatistics & Statistics, Associate Chair of Biostatistics
University of Washington
Talk Title & Abstract TBA
Thursday, April 24th, Hess Commons, 11:45am
Levin Lecture
Amita Manatunga, PhD
Donna J. Brogan Professor in Biostatistics
Rollins School of Public Health, Emory University
Talk Title & Abstract TBA
Thursday, May 1st, Hess Commons, 11:45am
Levin Lecture
Qing Pan, PhD
Professor, Department of Biostatistics and Bioinformatics
George Washington University, Milken Institute School of Public Health
Talk Title & Abstract TBA