Statistics Seminar: Hongzhe Li
140 Nolte Center
Title: Methods for High Dimensional Compositional Data Analysis in Microbiome Studies
Human microbiome studies that use high-throughput DNA sequencing generate compositional data: the absolute abundances of microbes cannot be recovered from the sequence data alone. In compositional data analysis, each sample consists of the proportions of various organisms, subject to a unit-sum constraint. This simple feature can cause traditional statistical methods, when naively applied, to produce erroneous results and spurious associations. In addition, microbiome sequence data sets are typically high dimensional, with far more taxa than samples. These features call for further development of methods for analyzing high-dimensional compositional data. This talk presents several recent developments in this area, including methods for estimating compositions from sparse count data, a two-sample test for compositional vectors, and regression analysis with compositional covariates. Several microbiome studies at the University of Pennsylvania will be used to illustrate these methods, and several open questions will be discussed.
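A small simulation shows why the unit-sum constraint alone can create spurious associations. This is an illustrative sketch, not one of the methods presented in the talk. It assumes hypothetical absolute abundances drawn independently from a log-normal distribution, so the taxa are uncorrelated by construction. After each sample is divided by its total ("closure"), every row sums to 1. As a result, the covariances of any one taxon with all the others must add up to minus its own variance, which forces negative associations that were never present in the absolute abundances.

```python
import random


def close(counts):
    """Closure: divide each sample by its total so its entries sum to 1."""
    return [[c / sum(row) for c in row] for row in counts]


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def simulate(n_samples=200, n_taxa=4, seed=1):
    """Hypothetical data: independent log-normal absolute abundances per taxon."""
    rng = random.Random(seed)
    absolute = [[rng.lognormvariate(0.0, 1.0) for _ in range(n_taxa)]
                for _ in range(n_samples)]
    return absolute, close(absolute)


absolute, props = simulate()
cols = list(zip(*props))  # one tuple of proportions per taxon

# Every row of props sums to 1, so cov(x_0, x_0 + x_1 + ... + x_p) = 0.
# Therefore the covariances of taxon 0 with the other taxa sum to -var(x_0) < 0.
var0 = cov(cols[0], cols[0])
cross = sum(cov(cols[0], cols[j]) for j in range(1, len(cols)))
print(f"var(x0) = {var0:.4f}, sum of cross-covariances = {cross:.4f}")
```

The closure step does not depend on the number of taxa. This is why naive correlation analysis of proportions is biased toward negative associations, and why the methods described in the abstract work with the compositional structure explicitly.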