Seminar by Simon Mak, Duke University

Cost-Efficient Surrogate Modeling of Expensive Simulators for Scientific Discovery
Event Location
325 Lind Hall

207 Church St SE
Minneapolis, MN 55455


Scientific modeling is at a defining crossroads. With breakthroughs in computational technology, complex phenomena (e.g., the expansion of the universe, space flight) can now be reliably simulated at high fidelity. However, generating such data often entails large computing costs, which limits the data available for scientific investigation. Surrogate models have emerged as a powerful tool for facilitating timely scientific progress. Such models are trained on a carefully designed set of simulation runs and provide an efficient predictor (or “emulator”) for the costly scientific simulator. As simulators become more complex, however, training data becomes even more expensive to generate and thus scarcer; in such a setting, existing surrogate models can yield poor predictions with poorly calibrated uncertainties.

We propose two novel surrogate models for tackling this critical challenge. The first model, called the Additive Multi-Index Gaussian process (AdMIn-GP), leverages a flexible additive structure on low-dimensional embeddings of the parameter space. This is guided by prior knowledge that the simulator is dominated by multiple distinct physical phenomena (i.e., multi-physics), each involving a small number of latent parameters. The AdMIn-GP models such embedded structures within a flexible Bayesian nonparametric framework, which facilitates efficient model fitting via a carefully constructed variational inference approach with inducing points. The second, called the CONglomerate multi-FIdelity Gaussian process (CONFIG) model, makes use of data simulated at multiple fidelities (or accuracies) for cost-efficient emulator training. The CONFIG model embeds the multi-fidelity form of this training data within a novel non-stationary covariance function, which captures prior knowledge of the numerical convergence rates of the simulator. We then demonstrate the effectiveness of our models over the state of the art in a suite of numerical experiments and in our motivating application on emulating the evolution of the quark-gluon plasma, which is theorized to have filled the Universe shortly after the Big Bang.


Simon Mak is an Assistant Professor in the Department of Statistical Sciences at Duke University. His research interests involve integrating domain knowledge (e.g., scientific theories, mechanistic models, guiding principles) as prior information for cost-efficient statistical inference, prediction, and decision-making. His current research is motivated by ongoing multi-disciplinary collaborations in high-energy physics, physical engineering, and epidemiology, and is funded by various NSF programs and the Department of Energy. He is the recipient of the Blackwell-Rosenbluth Award from ISBA, the Statistics in Physical Engineering Sciences Award from the ASA, and numerous best paper awards from ASA, INFORMS, and IISE.
