Probability, not certainty

by Ashley Yeager

David Dunson is a statistician who helps neuroscientists and physicians to better diagnose illnesses.

“Right now, if you combine data from different brain imaging modes—fMRI, structural MRI and other variants of the scan—there are billions of data points on that one patient. Making sense of all of that is a problem,” said Dunson, a professor in statistics and electrical and computer engineering at Duke.

Working with others in the new Information Initiative at Duke and in the medical center, he is developing a project in which multiple physicians diagnose a patient’s mental illness clinically, and data from multiple imaging modes are then used to develop a method that predicts mental health conditions computationally from brain scan images.

This is a “big data” problem because patients don’t have just one type of information. Their record is a collection of different kinds of data (images, recordings, writing and genetic profiles) that are all mismatched. Dunson tries to pull all of that information together using Bayesian statistics, a method for understanding uncertainty based on mathematical probabilities, to search for patterns that could identify a patient’s condition from the individual’s entire medical record rather than from a short doctor visit and a few diagnostic tests.

Dunson’s expertise in Bayesian statistics adds a distinct perspective to the big data group. He is a recent winner of the prestigious COPSS Award, given annually by the Committee of Presidents of Statistical Societies to a person under the age of 40 in recognition of outstanding contributions to the profession of statistics. Dunson says he thinks about problems “quite a bit” differently than traditional mathematicians, computer scientists and engineers.

“A distinct characteristic of a statistician who works on big data is that we care about uncertainty and doing inferences on that uncertainty,” said Dunson. He said he doesn’t want to come up with one estimate, or best guess, that describes what he calls the low-dimensional structures, or patterns in the data. “Instead, I like to come up with a probability distribution of low-dimensional structures that are consistent with the data and information I have, which is how I do analyses differently than most people in a big data project.”
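The difference between a single best guess and a probability distribution can be illustrated with a toy calculation. The sketch below is not Dunson’s method and does not use brain imaging data; it assumes a made-up case of 7 positive results in 10 trials, a uniform prior, and hypothetical function names (`beta_posterior_grid`, `credible_interval`). It contrasts the one-number estimate with a Bayesian posterior that reports a range of rates consistent with the data.

```python
def beta_posterior_grid(successes, trials, prior_a=1.0, prior_b=1.0, grid_size=1001):
    """Discretized Beta posterior over an unknown rate in [0, 1].

    Assumes a Beta(prior_a, prior_b) prior with both parameters >= 1
    and binomial data; returns the grid and normalized weights.
    """
    a = prior_a + successes
    b = prior_b + trials - successes
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    # Unnormalized Beta density: theta^(a-1) * (1 - theta)^(b-1)
    density = [t ** (a - 1) * (1 - t) ** (b - 1) for t in grid]
    total = sum(density)
    return grid, [d / total for d in density]


def credible_interval(grid, weights, mass=0.95):
    """Central interval holding roughly `mass` of the posterior probability."""
    lower_tail = (1 - mass) / 2
    upper_tail = 1 - lower_tail
    cumulative = 0.0
    lower = upper = None
    for t, w in zip(grid, weights):
        cumulative += w
        if lower is None and cumulative >= lower_tail:
            lower = t
        if upper is None and cumulative >= upper_tail:
            upper = t
            break
    return lower, upper


# Toy data (an assumption for illustration only): 7 positives in 10 trials.
successes, trials = 7, 10

point_estimate = successes / trials  # the single "best guess"
grid, weights = beta_posterior_grid(successes, trials)
posterior_mean = sum(t * w for t, w in zip(grid, weights))
lower, upper = credible_interval(grid, weights)

print("point estimate:", point_estimate)
print("posterior mean:", round(posterior_mean, 3))
print("95% credible interval:", (lower, upper))
```

With only 10 trials the interval is wide, which is the kind of uncertainty a lone point estimate hides; more data would narrow it.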

--from Duke Engineering: Leading Research 2013