Statistics Seminar: Antonio Linero
108 Folwell Hall
Title: Theory and Practice for Bayesian Regression Tree Ensembles
Ensembles of decision trees have become a standard component of the data analyst's toolkit; commonly used algorithms include random forests and boosted decision trees. In this talk, we investigate the properties of regression tree ensembles from a Bayesian standpoint. We focus on the interplay between theory and practice to study the properties of ensembles and to obtain insights into (a) why decision tree ensembles are successful in practice and (b) where they might be improved. We provide validation for the long-held hypothesis that Bayesian additive regression tree (BART) ensembles perform well because of their ability to detect low-order interactions, a structure that describes many real-world settings. Further, we identify two settings in which BART ensembles can be expected to be suboptimal: under sparsity, and when the underlying regression function exhibits higher-order smoothness. We give theoretical support for these insights by establishing posterior contraction at near-optimal rates, adaptively across a large family of function spaces, and we provide empirical support by applying our methodology to benchmark datasets. We conclude by presenting extensions of our methodology that account for other interesting structures beyond sparsity and smoothness, and by discussing how our insights can be extended to non-Bayesian decision tree ensembling methods.