Faculty Spotlight: Ian McKeague
Part of my research in recent years has involved data collected from wearable devices that are useful for the physiological monitoring of subjects. Such devices are capable of generating massive amounts of data longitudinally, which raises many challenging statistical issues. With my former PhD student Hsin-wen Chang (now at Academia Sinica in Taiwan), I have been developing a technique based on so-called occupation-time curves, which describe the amount of time that a reading on the device (measuring the activity of a subject in some sense) exceeds a given level, as that level varies. We treat such curves as functional data objects and develop formal statistical tests for whether they differ between groups of subjects (e.g., treated versus control).
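As a rough illustration of the definition above, and not the method used in the study, an occupation-time curve for regularly sampled readings can be sketched in Python as follows. The function name, the regular sampling interval `dt`, and the sample readings are all assumptions made for this example.

```python
from bisect import bisect_right


def occupation_time_curve(readings, levels, dt=1.0):
    """Return, for each level, the total time the readings exceed that level.

    readings: activity readings, assumed to be sampled at a regular interval
    levels:   thresholds at which the curve is evaluated
    dt:       time between consecutive readings (hypothetical unit: minutes)
    """
    ordered = sorted(readings)
    n = len(ordered)
    # bisect_right counts the readings <= level; the remainder strictly exceed it.
    return [dt * (n - bisect_right(ordered, level)) for level in levels]


# Made-up readings, one per minute, for illustration only.
readings = [0.2, 1.5, 0.7, 3.1, 2.4, 0.1]
levels = [0.0, 1.0, 2.0, 3.0]
curve = occupation_time_curve(readings, levels, dt=1.0)
print(curve)  # [6.0, 3.0, 2.0, 1.0]
```

The resulting curve is non-increasing in the level, since raising the threshold can only shorten the time spent above it.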
The application of this approach in practice raises many thorny issues, however, not the least of which is reconciling the analytical strategy with the practical aspects of collecting the data. In a Columbia-based study involving accelerometer data, Jeff Goldsmith, Associate Professor of Biostatistics, and I are working with Biostatistics graduate student Chenxi Liu on data collected in the study of experimental drugs developed for the treatment of a mitochondrial DNA depletion syndrome (a devastating childhood disease). Our part in the project started when Seamus Thompson put Jeff and me in touch with a group led by Michio Hirano in Neurology. Seamus has collaborated with Michio’s team for many years, and they are now gearing up to launch a phase 3 clinical trial to assess the efficacy of a pharmacological therapy that Michio has developed.
One of the biostatistical challenges is to understand various anomalous features of the observed accelerometer data, and to make sure the corresponding occupation-time curves are consistent enough to be comparable between subjects. The accelerometer data are high resolution, and in their uncompressed form can take a gigabyte of storage per subject. Early on, we were unsure as to whether a compressed version of the data would suffice, but it turned out that the uncompressed data are essential to obtaining accurate occupation-time curves. Another issue has been the coding of the accelerometer data. The devices we use to collect the data from children in the study come in various models. Recently, this created an issue when we upgraded to a model with more storage and longer battery life. Unbeknownst to us, the coding of the data (acceleration in x-y-z directions) then changed from 8-bit to 10-bit, and the range of measurements doubled. Fortunately, this was easy to reprogram once we knew about it!
Ian McKeague, PhD
Professor of Biostatistics