A Geometric Approach to Learning Mixture Models

Thursday, March 5, 2015

3:30 pm - 4:30 pm
Gross Hall 330


Venkatesh Saligrama, Boston University

In a wide spectrum of problems in science and engineering, including hyperspectral imaging, gene expression analysis, and metabolic networks, the observed data is high-dimensional and can be modeled as arising from an unknown mixture of a small set of unknown shared latent factors. Our approach is based on a natural separability property of these shared latent factors: every latent factor contains at least one component that is dominant in that factor. We first establish that this property is not only natural but also an inevitable consequence of high dimensionality, and that it is satisfied by the estimates produced by popular nonparametric Bayes approaches. We show that, geometrically, these dominant latent factors can be associated with extreme points in a suitable space. We leverage this geometric insight to develop a suite of efficient algorithms for a diverse set of latent variable problems. The proposed random-projections-based algorithm is naturally amenable to a distributed implementation with low communication cost, which makes it attractive for modern web-scale distributed data mining applications. We then establish statistical and computational efficiency guarantees for learning in high-dimensional latent variable models. This is joint work with Weicong Ding and Prakash Ishwar.
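
As a rough illustration of the geometric idea in the abstract, the sketch below finds extreme points of a point cloud by random projections. It is a toy example under stated assumptions, not the speakers' algorithm: the data are synthetic convex mixtures of a few vertex points, and the function name `extreme_points_by_random_projection` and all parameters are hypothetical. It relies only on the standard fact that a linear functional over a finite point set attains its maximum at an extreme point of the convex hull.

```python
import numpy as np


def extreme_points_by_random_projection(points, num_directions=200, seed=0):
    """Return sorted indices of rows that maximize at least one random projection.

    Hypothetical helper for illustration only. Each random direction defines a
    linear functional; its maximizer over the rows is an extreme point of
    their convex hull.
    """
    rng = np.random.default_rng(seed)
    # One random Gaussian direction per column.
    directions = rng.standard_normal((points.shape[1], num_directions))
    projections = points @ directions          # shape: (n_points, num_directions)
    winners = np.argmax(projections, axis=0)   # maximizing row for each direction
    return sorted(set(winners.tolist()))


# Synthetic data (an assumption, not from the talk): K vertex points in a
# high-dimensional space, plus many points that are convex mixtures of them.
rng = np.random.default_rng(1)
K, dim, n_mixed = 3, 50, 500
vertices = rng.standard_normal((K, dim))
weights = rng.dirichlet(np.ones(K), size=n_mixed)   # each row sums to 1
points = np.vstack([vertices, weights @ vertices])  # rows 0..K-1 are the vertices

found = extreme_points_by_random_projection(points)
print(found)
```

In this toy setup only the first K rows can win a projection, so with enough random directions the returned indices should be exactly the vertex rows. Each direction needs only a matrix-vector product and an argmax, which hints at why this style of computation splits easily across machines.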