You are kindly invited to:
Vincent Meyer Colloquium 3
Divide-and-Conquer and Statistical Inference for Massive Data
Wednesday, Dec 5, 2012, Meyer Bldg. Auditorium 280
Refreshments at 12:30; the lecture will start at 12:45
Lecturer: Professor Michael I. Jordan
Bio: Michael I. Jordan is the Pehong Chen Distinguished Professor in the Departments of EECS and Statistics at the University of California, Berkeley. He is a leading researcher in machine learning and artificial intelligence and has been a prime mover in establishing links between machine learning and statistics. In recent years he has focused on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, variational methods, kernel machines and applications to problems in statistical genetics, signal processing, computational biology, information retrieval and natural language processing. Prof. Jordan is a member of the National Academy of Sciences, the National Academy of Engineering and the American Academy of Arts and Sciences. He was an IMS Neyman Lecturer and an ACM/AAAI Allen Newell Awardee.
Abstract: I present some recent work on statistical inference in the massive data setting. Divide-and-conquer is a natural computational paradigm for approaching massive data problems, particularly given recent developments in distributed and parallel computing, but some interesting challenges arise when applying divide-and-conquer algorithms to statistical inference problems. One interesting issue is that of obtaining confidence intervals in massive datasets. The bootstrap principle suggests resampling data to obtain fluctuations in the values of estimators, and thereby confidence intervals, but this is infeasible with massive data. Subsampling the data yields fluctuations on the wrong scale, which have to be corrected to provide calibrated statistical inferences.
I present a new procedure, the “bag of little bootstraps,” which circumvents this problem, inheriting the favorable theoretical properties of the bootstrap but also having a much more favorable computational profile. Another issue that I discuss is the problem of large-scale matrix completion. Here divide-and-conquer is a natural heuristic that works well in practice, but new theoretical problems arise when attempting to characterize the statistical performance of divide-and-conquer algorithms.
Here the theoretical support is provided by concentration theorems for random matrices, and I present a new approach to this problem based on Stein’s method.
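
For readers unfamiliar with the resampling idea in the abstract, the following Python sketch illustrates one way a bag-of-little-bootstraps estimate of a standard error might look. It is a minimal illustration under stated assumptions, not the speaker's implementation: the function names, the weighted-estimator interface, and the defaults for the number of subsets (s), replicates per subset (r), and subset-size exponent (gamma) are all hypothetical choices made for this example. Each small subset of b points is resampled up to the full size n, which is what puts the fluctuations back on the scale of the full dataset.

```python
import math
import random
import statistics
from collections import Counter


def blb_standard_error(data, estimator, s=10, r=50, gamma=0.6, seed=0):
    """Sketch of a bag-of-little-bootstraps standard error (illustrative only).

    estimator(values, weights) must return a float for weighted data.
    s, r and gamma are assumed defaults chosen for this example.
    """
    rng = random.Random(seed)
    n = len(data)
    b = max(2, int(n ** gamma))  # subset size, much smaller than n
    per_subset = []
    for _ in range(s):
        subset = rng.sample(data, b)  # b points drawn without replacement
        replicates = []
        for _ in range(r):
            # Resample n points from the b-point subset. Only the counts are
            # passed on, so the estimator sees at most b distinct values.
            counts = Counter(rng.choices(range(b), k=n))
            weights = [counts.get(i, 0) for i in range(b)]
            replicates.append(estimator(subset, weights))
        # Spread of the replicates is this subset's standard-error estimate.
        per_subset.append(statistics.stdev(replicates))
    return statistics.fmean(per_subset)  # average the per-subset estimates


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Synthetic data for illustration: 5000 standard normal draws.
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(5000)]
se = blb_standard_error(sample, weighted_mean)
classical = statistics.stdev(sample) / math.sqrt(len(sample))
```

For the sample mean, the result `se` can be compared with the textbook value `classical`. This sketch draws the n indices explicitly for simplicity; a practical version would presumably draw the multinomial counts directly so that each replicate costs on the order of b rather than n.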