Masterclass by Professor David B. Dunson
This event is courtesy of The Carnegie Trust and organised in conjunction with the EPSRC CDT in Data Science at the University of Edinburgh.
Dates & times
- 6th June 2018: 2-5.30pm
- 7th June 2018: 9.30am-1pm
Application for registration deadline
You must apply by 13th April 2018.
Professor David B. Dunson
Duke University & Carnegie Centenary Professor
David Dunson is Arts and Sciences Distinguished Professor of Statistical Science, Mathematics, and Electrical & Computer Engineering at Duke University.
His research focuses on Bayesian statistical theory and methods motivated by high-dimensional and complex applications. A particular emphasis is on dimensionality reduction, scalable inference algorithms, latent factor models, and nonparametric approaches, particularly for high-dimensional, dynamic and multimodal data, including images, functions, shapes and other complex objects.
His work involves inter-disciplinary thinking at the intersection of statistics, mathematics and computer science. Motivation comes from applications in epidemiology, environmental health, neurosciences, genetics, fertility and other settings (music, fine arts, humanities).
Dr. Dunson is a fellow of the American Statistical Association and of the Institute of Mathematical Statistics. He is the winner of the 2007 Mortimer Spiegelman Award for the top public health statistician under 40, the 2010 Myrto Lefkopoulou Distinguished Lectureship at Harvard University, the 2010 COPSS Presidents' Award for the top statistician under 41, and the 2012 Youden Award for interlaboratory testing methods.
Scalable Bayesian Inference
The Bayesian paradigm provides a natural probabilistic framework for characterizing uncertainty in large and complex datasets. In many applications, particularly in the sciences, accurate uncertainty quantification (UQ) is of critical importance. In these settings, the aim is not simply black-box prediction; the primary focus is on learning about scientific phenomena from a series of datasets. In modern applications, these datasets are increasingly complex and multimodal, coming from different sources and having different scales. Machine learning and statistical methods for large and complex data tend either to ignore UQ entirely, focusing on optimization and providing only a point estimate, or to lack a framework for reliable inferences regarding scientific questions of interest.
In this masterclass, I will provide a brief introduction and motivation for the Bayesian paradigm, and then will focus on the practical problem of how to scale up Bayesian inferences, while maintaining accuracy guarantees. Due to the lack of theoretical guarantees for analytic approximations, such as variational Bayes, I focus primarily on the problem of scaling up sampling algorithms, such as Markov chain Monte Carlo (MCMC), while arguing that MCMC, when well-designed, is not necessarily of higher computational complexity than optimization algorithms.
After providing a brief introduction to sampling-based posterior inferences, I will focus on simple and scalable algorithms for dealing with extremely large datasets via two different types of approaches:
(i) embarrassingly parallel (EP)-MCMC
(ii) approximate MCMC (aMCMC).
Both of these have theoretical guarantees, but in this course I will focus on practical issues rather than going into fine theoretical details.
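To make the embarrassingly parallel idea concrete, the following is a minimal sketch and not necessarily the approach covered in the class. It assumes a toy model (a Gaussian mean with known unit variance and a N(0, 10^2) prior), a plain random-walk Metropolis sampler on each data shard, and one well-known combination rule: precision-weighted averaging of the shard draws, in the style of consensus Monte Carlo. All function names and tuning values here are illustrative choices.

```python
import numpy as np

def log_subposterior(mu, shard, n_shards, sigma=1.0, prior_sd=10.0):
    # The N(0, prior_sd^2) prior is raised to the power 1/n_shards so that
    # the product of all subposteriors equals the full-data posterior.
    log_prior = -0.5 * (mu / prior_sd) ** 2 / n_shards
    log_lik = -0.5 * np.sum((shard - mu) ** 2) / sigma ** 2
    return log_prior + log_lik

def metropolis(shard, n_shards, n_iter=4000, burn_in=1000, seed=0):
    # Random-walk Metropolis on one shard; in practice each call would run
    # on a separate worker with no communication between workers.
    rng = np.random.default_rng(seed)
    step = 2.4 / np.sqrt(len(shard))  # proposal scale, assuming sigma = 1
    mu = shard.mean()
    current = log_subposterior(mu, shard, n_shards)
    draws = np.empty(n_iter)
    for t in range(n_iter):
        proposal = mu + step * rng.normal()
        candidate = log_subposterior(proposal, shard, n_shards)
        if np.log(rng.uniform()) < candidate - current:
            mu, current = proposal, candidate
        draws[t] = mu
    return draws[burn_in:]

def consensus_combine(sub_draws):
    # Precision-weighted average of the t-th draw from every shard.
    sub_draws = np.asarray(sub_draws)  # shape: (n_shards, n_draws)
    weights = 1.0 / sub_draws.var(axis=1, ddof=1)
    return (weights[:, None] * sub_draws).sum(axis=0) / weights.sum()

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(42)
data = rng.normal(loc=3.0, scale=1.0, size=20_000)

n_shards = 4
shards = np.array_split(data, n_shards)
sub_draws = [metropolis(s, n_shards, seed=k) for k, s in enumerate(shards)]
combined = consensus_combine(sub_draws)

print("combined posterior mean:", combined.mean())
print("combined posterior sd:  ", combined.std())
```

Because the subposteriors in this toy model are Gaussian, the weighted average approximately recovers the full-data posterior; for non-Gaussian posteriors this combination rule is only an approximation, which is one reason alternative combination schemes exist.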
After focusing on large sample size (big n) problems and providing examples, I will transition to scalable Bayesian methods for very high-dimensional data analysis (big p), including not only well-studied cases such as large-p regression and classification but also less-studied cases of multivariate data analysis for discrete data and large networks.
The goal of the class is to provide an introduction to this class of methods, so that students are ready to start exploring these approaches in a variety of problems while understanding some of their advantages and disadvantages. Examples will be drawn from problems in neuroscience, ecology and genomics, among others.
School of Informatics, University of Edinburgh