“I’m Interested in How Life Works”
Assistant professor Aaron Molstad is a Minnesota native who was raised in Stillwater and attended Minnesota schools as both an undergraduate and a graduate student. This new appointment to the School of Statistics brings him full circle, as he returns to the state where he first took statistics courses and connected his newfound passion with his love of biology.
What are your areas of specialty? How did you become interested in what you study and teach?
My recent research is largely focused on developing new methods for analyzing data from different types of "omics" studies, such as genomics (the study of genes) or proteomics (the study of proteins). These types of datasets tend to be high-dimensional (i.e., one measures more characteristics on each individual than there are individuals in the study), so traditional statistical methods are often not applicable.
In order to provide scientists with theoretically sound and computationally efficient methods for data analysis, I use tools from multivariate analysis, statistical learning, and computational statistics. In general, I try to make my work as broadly applicable as possible, so even if a new method was originally motivated by an omic data analysis, it could often also be used for applications in finance, chemometrics, image recognition, and more.
I came to work in this research area somewhat circuitously. During my undergraduate studies at St. Olaf College in Northfield, MN, I took many courses in both math and biology. I love biology because—to be somewhat cosmic—I’m interested in how life works. However, during my junior year, I took my first statistics course and was immediately hooked. Statistics provided an outlet for me to use mathematical concepts to learn about the world. I graduated with a math major and pursued a PhD in statistics, thinking that my interest in biology was a thing of the past.
During graduate school at the University of Minnesota, I developed new methods for statistical learning, without emphasis on a particular area of application. After graduation, I took a postdoctoral research position at Fred Hutchinson Cancer Center in Seattle. While at Fred Hutch, I found that the types of methods I developed during my PhD were especially useful for answering important questions about the development and progression of cancer and other diseases at the molecular level. This reignited my interest in biology and started me down a path that, after some intervening years as faculty at the University of Florida, led me back to the University of Minnesota.
What courses are you currently teaching or looking forward to teaching soon? What's special about them?
In fall 2023, I will teach STAT 3301 - Regression and Statistical Computing. I'm especially excited to teach this course because it introduces foundational concepts in both statistics and R programming in ways that reinforce one another.
For example, students will program and perform simulation studies in order to better understand a theoretical concept. Often, in mathematically oriented statistics courses, students never get the chance to work with real data, so they fail to appreciate the applicability of the concepts they've learned. This course will have students regularly analyzing real datasets, putting theory into practice. It is exactly the type of course I wish I had been able to take early in my career.
What are you most excited about right now?
Recently, I've become interested in a relatively new genomic data collection technique called "single-cell sequencing." The technique was named Nature Methods' Method of the Year in 2013, and a generalization of it received the distinction in 2019. The technology continues to improve today, becoming more informative and less expensive.
Single-cell sequencing allows us to isolate individual cells (e.g., from a tumor sample) and measure how tens of thousands of genes are expressed in each cell. With these data, we can answer cell-type-specific questions that greatly enhance our understanding of a disease. For example, we may ask, "In this tumor sample, is a particular gene expressed differently in T cells versus other types of cells?"
Even this seemingly straightforward question is challenging because when we collect these data, we don't know which cells are T cells and which are not. Some of my recent work addresses this exact problem. At the University of Minnesota, I look forward to working with researchers who need to use this type of data to answer their biological questions, but don't yet have the appropriate statistical methods to do so.
Are you involved with any community-engaged projects or courses? Who are you partnering with and what are you learning and doing?
Since moving to the University of Minnesota over the summer, I've become involved with the Genomic Data Commons (GDC). This project is led by Professor Saonli Basu from the Division of Biostatistics and has the potential to transform how researchers at the University of Minnesota store and analyze genomic data. GDC could streamline data processing, facilitate more reproducible research, and make a large number of datasets accessible to researchers across campus. I’ve long wished such a resource existed, so being involved with its development is especially exciting.
This story was edited by an undergraduate student in CLA.