How Big Data Helps Us See
Human perception is notoriously feeble. We can see two dimensions, like a painting, fairly well; and with effort we can see three dimensions, the way we perceive ourselves. Beyond that, most of us are at a loss. That is why statistical research on “big data,” the kind Snigdhansu Chatterjee conducts, is so useful.
Complex systems, like climate or our brains, generate huge amounts of data with trillions of dimensions. “We cannot see them but we can design an algorithm to see patterns in them,” says Chatterjee. As director of the U’s Institute for Research on Statistics and its Applications, he and his researchers have been looking for patterns within seemingly impenetrable problems. How do neurodegenerative diseases like Alzheimer’s affect the brain? What can we learn by comparing climate models? Can we anticipate and de-escalate political violence?
Chatterjee grew up in eastern India and entered statistics essentially by accident—he won a fellowship to the Indian Statistical Institute. It offered about three dollars a month, at a time when Indian colleges cost about 20 cents a month to attend, so he followed the money. He eventually taught at the University of Manchester before coming to the U in 2002. Chatterjee had multiple offers then, but during his only visit to Minnesota he crossed the pedestrian bridge from the East Bank to the West Bank and noticed a quote emblazoned on the bridge by a student organization. Something about standing beside the river in contemplation, from Hermann Hesse’s novel Siddhartha. “Everything being equal,” he says, “I liked the bridge.”
A couple of years later, an old friend from India who studies climate and water systems reached out seeking Chatterjee’s insight. “The first thing I told him was, unless I see a data set, I don’t believe it,” he says. It was his entrée into studying climate change. His team has since developed a methodology for comparing climate models developed around the world to understand what patterns emerge, what some models can predict, and the nature of extreme events like severe drought or severe rainfall.
He’s taken a similar approach to learning whether extremes may also factor in neurodegenerative disease. With the data from brain images—the peaks and valleys of neural activity—his team can create algorithms to see what is otherwise difficult to observe: the genesis of disease in the brain, starting perhaps with extreme changes in the firing of neurons.
Statistical analysis has become a hot subject. The internet is allowing more and more information to be archived and converted to data—the raw material of statistics. “In the last five to 10 years, there has been a growing consciousness that almost anything is data,” Chatterjee says. “Your fridge can generate data, your commute can generate data, your Twitter handle can generate data.” There has never been so much for statisticians to work with, and so much interest in their work.
“It’s pretty much the only way you can get any other science to operate,” he says. “You can investigate a theory, but how do you know if you’re getting closer to truth? You have to put that theory together with the data, and that involves statistics. Unless you do that, it’s just pure imagination.”