Molei Liu, PhD

Assistant Professor of Biostatistics

ml4890@cumc.columbia.edu

Overview

Molei Liu is an Assistant Professor of Biostatistics at the Columbia Mailman School of Public Health. His research spans a broad spectrum, from methodological and theoretical analysis of general statistical problems to biomedical studies based on real-world evidence. He has developed methodological research in several active fields of statistics and machine learning, including federated learning, high-dimensional inference, semi-supervised learning, and transfer learning, with applications to real problems in electronic health records (EHR) and their linked bio-repositories.

Academic Appointments

Assistant Professor of Biostatistics

Credentials & Experience

Education & Training

BS, 2017 Statistics, Peking University
PhD, 2022 Biostatistics, Harvard University

Research

Research Interests

Biomedical studies
Electronic Health Records (EHR/EMR)
Federated learning
High-dimensional inference
Semi-supervised learning
Transfer learning

Selected Publications

1. Liu, M., Zhang, Y., Liao, K., and Cai, T., 2023. Augmented Transfer Regression Learning with Semi-non-parametric Nuisance Models. Journal of Machine Learning Research (forthcoming).

2. Li, S., and Liu, M. (#), 2023. Maxway CRT: Improving the Robustness of Model-X Inference. Journal of the Royal Statistical Society Series B (Statistical Methodology) (forthcoming).

3. Guo, X., Wei, W., Liu, M., Cai, T., Wu, C. and Wang, J., 2023. Assessing Heterogeneous Risk of Type II Diabetes Associated with Statin Usage: Evidence from Electronic Health Record Data. Journal of the American Statistical Association (forthcoming).

4. Cai, T., Liu, M. (*) and Xia, Y., 2022. Individual data protected integrative regression analysis of high-dimensional heterogeneous data. Journal of the American Statistical Association, 117(540), pp.2105-2119.

5. Gronsbell, J., Liu, M. (#), Tian, L. and Cai, T., 2022. Efficient evaluation of prediction rules in semi-supervised settings under stratified sampling. Journal of the Royal Statistical Society Series B: Statistical Methodology, 84(4), pp.1353-1391.

6. Liu, M., Katsevich, E., Janson, L. and Ramdas, A., 2022. Fast and powerful conditional randomization testing via distillation. Biometrika, 109(2), pp.277-293.

7. Zhang, Y., Liu, M. (#), Neykov, M. and Cai, T., 2022. Prior adaptive semi-supervised learning with application to EHR phenotyping. The Journal of Machine Learning Research, 23(1), pp.3617-3641.

8. Liu, M., Xia, Y., Cho, K. and Cai, T., 2021. Integrative high dimensional multiple testing with heterogeneity under data sharing constraints. The Journal of Machine Learning Research, 22(1), pp.5607-5632.

9. Hong, C., Rush, E., Liu, M., Zhou, D., Sun, J., Sonabend, A., Castro, V.M., Schubert, P., Panickan, V.A., Cai, T. and Costa, L., 2021. Clinical knowledge extraction via sparse embedding regression (KESER) with multi-center large scale electronic health record data. npj Digital Medicine, 4(1), p.151.

10. Geva, A., Liu, M., Panickan, V.A., Avillach, P., Cai, T. and Mandl, K.D., 2021. A high-throughput phenotyping algorithm is portable from adult to pediatric populations. Journal of the American Medical Informatics Association, 28(6), pp.1265-1269.

# co-first author, * authors listed in alphabetical order