Departmental Seminars & Lectures

Lectures are in-person only unless marked otherwise.
For all Zoom inquiries to attend seminars with a virtual option, please send an email to Erin Elliott, Programs Coordinator (ee2548@cumc.columbia.edu).

During the Fall and Spring semesters, the Department of Biostatistics holds regular Thursday seminars, called the Levin Lecture Series, on a wide variety of topics of interest to both students and faculty. Each semester, there are often also guest lectures outside the regular Thursday Levin Lecture Series, providing a robust schedule that covers the wide range of topics in Biostatistics. The speakers are invited guests who spend the day of their seminar discussing their research with Biostatistics faculty and students.

Spring 2025 Schedule

Thursday, January 16th, Hess Commons, 11:45am
Levin Lecture 

Charles J. Wolock, PhD
Postdoctoral Researcher, Department of Biostatistics, Epidemiology, and Informatics
University of Pennsylvania

Nonparametric approaches to assessing variable importance using health data

Abstract: 

Given a collection of features available for inclusion in a predictive model, it may be of interest to quantify the relative importance of a subset of features for the prediction task at hand. This objective has given rise to a substantial literature focused on defining, estimating, and making inference on variable importance. Within this field, there is a need for tools to handle the complications characteristic of health data, including coarsening of the outcome variable. Drawing upon examples from research in infectious diseases and mental health, we present novel methods for assessing variable importance in a nonparametric, algorithm-agnostic manner. Our proposed methods allow for flexible estimation of nuisance parameters and provide asymptotically valid inference, while also enjoying robustness properties and nonparametric efficiency. We demonstrate the performance of our proposed procedures via numerical simulations and present an analysis of data from the HVTN 702 HIV vaccine trial, with the aim of informing enrollment strategies for future trials. Furthermore, we discuss several open questions surrounding variable importance and outline possible avenues of future work.

Thursday, January 23rd, Hess Commons, 11:45am 
Levin Lecture 

Harsh Parikh, PhD
Postdoctoral Fellow, Department of Biostatistics
Johns Hopkins University

Interpretable Causal Inference for Advancing Healthcare and Public Health

Abstract: 

Causal inference methods are essential across healthcare, public health, and social sciences, helping understand complex systems and inform decision-making. While integrating machine learning (ML) and statistical techniques has improved causal estimation, many of these methods depend on black-box ML approaches. This raises concerns about the communicability, auditability, and trustworthiness of causal estimates, especially in high-stakes contexts. My research addresses these challenges by developing interpretable causal inference methods. In this presentation, I introduce an approach for bridging the research-to-practice gap by generalizing randomized controlled trial (RCT) findings to target populations. Although RCTs are fundamental for understanding causal effects, extending their findings to broader populations is difficult due to effect heterogeneity and the underrepresentation of certain subgroups. Our work tackles this issue by identifying and interpretably characterizing underrepresented subgroups in RCTs. Specifically, we propose the Rashomon Set of Optimal Trees (ROOT), an optimization-based method that produces interpretable characteristics of underrepresented subgroups. This approach helps researchers communicate findings more effectively. We apply ROOT to extend inferences from the Starting Treatment with Agonist Replacement Therapies (START) trial -- assessing the effectiveness of opioid use disorder medication -- to the real-world population represented by the Treatment Episode Dataset: Admissions (TEDS-A). By refining target populations using ROOT, our framework offers a systematic approach to enhance decision-making accuracy and inform future trials in diverse populations.

Friday, January 24th, Hess Commons, 11:45am 
Guest Lecture

Eric Sun
PhD Student
Stanford University

Machine learning for aging and spatial omics

Abstract: 

Aging is a highly complex process and the greatest risk factor for many chronic diseases including cardiovascular disease, dementia, stroke, diabetes, and cancer. Recent spatial and single-cell omics technologies have enabled the high-dimensional profiling of complex biology including that underlying aging. As such, new machine learning and computational methods are needed to unlock important insights from spatial and single-cell omics datasets. First, I present the development of high-resolution machine learning models (‘spatial aging clocks’) that can measure the aging of individual cells in the brain. Using these spatial aging clocks, I discovered that some cell types can dramatically influence the aging of nearby cells. Next, I present new computational and statistical methods for overcoming the gene coverage limitations of existing spatial omics technologies, which have enabled the discovery of gene pathways underlying the spatial effects of brain aging. Finally, I introduce several methods for improving the reliability and robustness of high-dimensional data visualizations.

Tuesday, January 28th, 8th Floor Auditorium, 11:45am
Guest Lecture 

Yao Zhang, PhD
Postdoctoral Scholar, Statistics
Stanford University

Posterior Conformal Prediction

Abstract: 

Conformal prediction is a popular technique for constructing prediction intervals with distribution-free coverage guarantees. The coverage is marginal, holding on average over the entire population but not necessarily for any specific subgroup. In this talk, I will introduce a new method, posterior conformal prediction (PCP), which generates prediction intervals with both marginal and approximate conditional coverage for clusters (or subgroups) naturally discovered in the data. PCP achieves these guarantees by modelling the conditional conformity score distribution as a mixture of cluster distributions. Compared to other methods with approximate conditional coverage, this approach produces tighter intervals, particularly when the test data is drawn from clusters that are well represented in the validation data. PCP can also be applied to guarantee conditional coverage on user-specified subgroups, in which case it achieves robust coverage on smaller subgroups within the specified subgroups. In classification, the theory underlying PCP allows for adjusting the coverage level based on the classifier’s confidence, achieving significantly smaller sets than standard conformal prediction sets. Experiments demonstrate the performance of PCP on diverse datasets from socio-economic, scientific and healthcare applications.

CANCELLED: Thursday, January 30th, Hess Commons, 11:45am
Levin Lecture 

Paul Albert, PhD
Senior Investigator, Biostatistics Branch
NCI/DCEG

Innovative applications of hidden Markov models in cancer epidemiology and genetics

Abstract: 

During the past 30 years, hidden Markov modeling (HMM) has had a big impact on the analysis of biomedical data, with important application areas in genomics, natural history modeling, environmental monitoring, and the analysis of longitudinal data. In cancer genomics, for example, the use of HMMs has played an important role in uncovering both susceptibility (germline) and tumor progression (somatic) of cancer. In this talk, I will present a series of novel applications of HMMs in cancer epidemiology and genetics. I will describe the use of HMMs to identify multiple subclones in next-generation sequences of tumor samples (Choo-Wosoba et al., Biostatistics 2021). I will also discuss the application of HMMs for characterizing the natural history of human papillomavirus and cervical precancer (Aron et al., Statistics in Medicine, 2021). Further, I describe the use of HMMs for investigating the effects of sleeping and activity on mortality. Last, I describe the use of HMMs for joinpoint analysis in cancer surveillance. All four examples required interesting adaptations of standard HMM estimation that will be highlighted.

Tuesday, February 4th, Hess Commons, 11:45am
Guest Lecture 

Ying Cui, PhD
Postdoctoral Scholar, Department of Biomedical Data Science
Stanford University

Advancing Biomedical Data Science: From Population Insights to Personalized Decisions

Abstract: 

Rapid advances in biomedicine have enabled us to address important questions that were once intractable. There is a pressing need for analyzing massive data sets emerging from cutting-edge technologies, presenting challenges such as high-dimensionality and multi-modality. Additionally, there has been rising interest in personalized decision-making. Inspired by these challenges, my research aims to enhance the integration of statistical insights and data science innovations in biomedical research. In this talk, I will cover two projects.

The first part of the talk explores key questions about identifying covariates relevant to clinical outcomes of interest. Addressing these questions, however, can be complicated due to the presence of complex covariate effects. To tackle this problem, I developed a new testing and screening framework by adopting a global view via the novel concept of interval quantile independence. I showed that this general testing framework can naturally yield both unconditional and conditional screening procedures for ultra-high dimensional settings and enjoy the sure screening property.

In the second part of the talk, I address the feature selection problem from a personalized perspective. I designed a novel dynamic prediction rule to determine the optimal order of acquiring features in predicting clinical outcomes of interest for individual subjects. The goal is to optimize model performance while reducing the costs associated with measuring features. To achieve this, I employed reinforcement learning, where the agent decides the best action at each step: either making a final decision or continuing to collect new predictors. The proposed approach mirrors and improves real-life decision-making processes, employing a “learn-as-you-go” paradigm.

Thursday, February 6th, Hess Commons, 11:45am 
Levin Lecture

Tianyu Zhang, PhD
Postdoctoral Researcher, Department of Statistics & Data Science
Carnegie Mellon University

Adaptive and Scalable Nonparametric Estimation via Stochastic Optimization

Abstract: 

Nonparametric procedures are frequently employed in predictive and inferential modeling to relate random variables without imposing specific parametric forms. In supervised learning, for instance, our focus is often on the conditional mean function that links predictive covariates to a numerical outcome of interest. While many existing statistical learning methods achieve this with optimal statistical performance, their computational expenses often do not scale favorably with increasing sample sizes. This challenge is exacerbated in certain “online settings,” where data is continuously collected and estimates require frequent updates.

Thursday, February 27th, ARB 8th Floor Auditorium, 11:45am
Levin Lecture 

Shan Yu, PhD
Assistant Professor, Department of Statistics
University of Virginia

Talk Title & Abstract TBA

Monday, March 3rd, Hess Commons, 11:45am 
Guest Lecture

Lorin Crawford, PhD
Principal Researcher at Microsoft Research
Distinguished Senior Fellow in Biostatistics, Brown University

Talk Title & Abstract TBA

Thursday, March 13th, ARB 8th Floor Auditorium, 11:45am
Levin Lecture 

Fan Li, PhD
Associate Professor of Biostatistics
Yale University

Talk Title & Abstract TBA

Thursday, March 27th, Hess Commons, 11:45am
Levin Lecture 

James Zou, PhD
Associate Professor of Biomedical Data Science and, by courtesy, of Computer Science and Electrical Engineering
Stanford University

Talk Title & Abstract TBA

Thursday, April 3rd, Hess Commons, 11:45am
Levin Lecture

Jennifer Hill, PhD
Professor of Applied Statistics; Co-Department Chair; Co-Director of PRIISM, Department of Applied Statistics, Social Science, and Humanities
NYU Steinhardt

Talk Title & Abstract TBA

Thursday, April 10th, ARB 8th Floor Auditorium, 11:45am 
Levin Lecture

Weining Shen, PhD
Associate Professor of Statistics
University of California, Irvine

Talk Title & Abstract TBA

Thursday, April 17th, Hess Commons, 11:45am 
Levin Lecture

Ali Shojaie, PhD
Professor of Biostatistics & Statistics, Associate Chair of Biostatistics
University of Washington

Talk Title & Abstract TBA

Thursday, April 24th, Hess Commons, 11:45am
Levin Lecture 

Amita Manatunga, PhD
Donna J. Brogan Professor in Biostatistics
Rollins School of Public Health, Emory University

Talk Title & Abstract TBA

Thursday, May 1st, Hess Commons, 11:45am 
Levin Lecture

Qing Pan, PhD
Professor, Department of Biostatistics and Bioinformatics
George Washington University, Milken Institute School of Public Health

Talk Title & Abstract TBA