Special Data Science Seminar
Thursday, February 5, 2015
3:00 pm - 4:00 pm
Gross Hall 330
Jason Lee, ICME, Stanford University
Selective inference is the problem of testing hypotheses that are chosen or suggested by the data. Inference after variable selection in high-dimensional linear regression is a common example: we estimate and perform inference only for the selected variables. We propose the Condition on Selection framework for selective inference, which allows hypotheses to be selected and tested on the same dataset. In the case of inference after variable selection (by the lasso, marginal screening, or forward stepwise regression), the framework allows us to construct confidence intervals for regression coefficients and to perform goodness-of-fit testing for the selected model.

In the second part of the talk, we consider the problem of sparse regression in the distributed setting. The main computational challenge in this setting is harnessing the computational capabilities of all the machines while keeping communication costs low. We devise an approach that requires only a single round of communication among the machines. We show that the approach recovers the convergence rate of the (centralized) lasso as long as each machine has access to an adequate number of samples.

Biography: Jason Lee is a fifth-year PhD student at Stanford University, advised by Trevor Hastie and Jonathan Taylor. His research interests are in high-dimensional statistics, selective inference, optimization, and machine learning.