2024 Biostatistics Epidemiology Summer Training (BEST) Diversity Program

Each summer, a highly select group of undergraduates from across the country attends classes in introductory biostatistics and statistical computing and engages in research under the supervision of a faculty member. Now in its 16th year, the 2024 Biostatistics Epidemiology Summer Training (BEST) Diversity Program was a great success. Fourteen students from around the country came to stay at Columbia, taking classes and conducting research alongside our faculty. Along the way, we had great times around the city, from the sounds of Broadway to home plate at a Mets game. We were sad to see such an excellent group of students head off, but we look forward to seeing the great things our BEST alumni will do!

Schools Represented:

  • Johns Hopkins University
  • Columbia University
  • Nova Southeastern University Florida
  • University of Virginia
  • Bates College
  • Arizona State University
  • University of Massachusetts, Amherst
  • Winston-Salem State University
  • Boston College
  • Brown University
  • Brandeis University
  • University of Minnesota, Minneapolis

Research Projects:

Causal discovery of signaling networks: learning causal graphical models from genetic data to understand pathways of disease

Mentor: Daniel Malinsky, PhD, Assistant Professor of Biostatistics
Mentees: Avani Ghosh; Radiya Imran; James Hiller

“Causal inference” aims to disentangle cause-and-effect relationships from mere statistical correlations, which is important for understanding mechanisms and the consequences of interventions on complex systems. “Causal discovery” is about estimating causal structures (networks or graphs) from purely observational data. We will apply graph-learning algorithms to data on gene expression and protein concentration from human cells and explore how different statistical choices embedded in those algorithms affect the sparsity of the learned graph.

Characterizing Determinants of Cardiovascular Health

Mentor: Yihong Zhao, PhD, Professor of Data Science, School of Nursing
Mentees: Tatyana Bowers; Joniel Lewis; Harris Shaikh

Cardiovascular diseases remain the leading cause of mortality globally, creating an urgent need for improved preventive measures and treatments. Understanding the complex interplay of the determinants that influence cardiovascular health is crucial for developing targeted intervention strategies that can be personalized based on individual risk profiles. This project has two aims. First, we will develop a metric that can be used to evaluate cardiovascular health. Second, we will use state-of-the-art machine learning methods to evaluate the relative impact of various determinants, such as lifestyle choices, environmental factors, and socioeconomic status, on cardiovascular health.

Sleep after Stroke: Examining the relationship of sleep apnea with post-stroke outcomes

Mentor: Ari Shechter, PhD, Associate Professor of Medical Sciences (in Medicine) at CUMC
Mentees: Zina Mojekwu; Kayla Williamson

Sleep is closely related to nearly all aspects of physical and mental health and disease. The main purpose of the study was to examine how sleep in individuals who have experienced a stroke relates to the risk of recurrent cardiovascular events and to psychological health over the year following the stroke. It was a longitudinal cohort study with measures taken at baseline (i.e., at stroke hospitalization) and at 1-, 6-, and 12-month follow-ups. The current analysis will focus on sleep apnea (a disorder in which people repeatedly stop breathing during sleep), examining its prevalence and its relationship to post-stroke outcomes such as fatigue, depression, anxiety, level of physical disability, rehospitalizations, and recurrent events.

Assessing AI Tools in Academic Research Writing

Mentor: Tian Gu, PhD, Assistant Professor of Biostatistics
Mentees: Ariana Yahira; Adheesh Perera

Large language models have facilitated the widespread adoption of AI tools such as ChatGPT, which serve various roles in academic research, from basic coding to language polishing. A growing number of AI-powered tools on the market promise to make essay writing easier and faster. These tools, with names like JenniAI, Aithor, and Scite, are often advertised on social media platforms. However, can one really use an AI assistant to write an essay effortlessly? Can these tools really provide accurate and comprehensive research citations, as they claim? In what areas, and to what extent, can they be helpful in writing an academic research paper? This project aims to systematically assess the effectiveness of AI tools in academic research writing, specifically in the context of literature review. We will evaluate their accuracy, efficiency, and impact on the quality of writing for statistical papers with either an applied or a methodological focus.

Cortical myelin profile variations in lifetime in NKI data

Mentor: Seonjoo Lee, PhD, Associate Professor of Clinical Biostatistics (in Psychiatry)
Mentees: Divinia Ashley; Hanna Duque

Demyelination is observed in development, healthy aging, and age-related neurodegenerative disorders, so it is important to identify the lifetime patterns of intracortical myelin. We will analyze data from n=2000 participants aged 6-85 in the NKI-RS (Nathan Kline Institute-Rockland Sample) dataset. We will measure intracortical myelin using the T1w/T2w ratio and model its variation with age using linear and nonlinear regression analyses.

Chitwan Valley Family Study – data archiving and hair cortisol

Mentor: Sabrina Hermosilla, PhD, Assistant Professor of Population and Family Health
Mentees: Anagha Chundury; Naomi Saenger

This project builds on the 27+ year Chitwan Valley Family Study (CVFS; https://cvfs.isr.umich.edu/). The students will engage in analytic data management: they will explore individual identifiers across multiple datasets and work with the data management team to create a unique identifier solution and implement it across datasets. Additionally, they will conduct analyses in Stata comparing respondents who completed the hair cortisol survey with those who did not, with the goal of understanding which characteristics predict study protocol compliance and how best to design data collection tools to maximize compliance.

The BEST Diversity Program is funded by NHLBI grant 5R25HL096260-15.