Electrical & Computer Engineering Seminar
Tuesday, April 18, 2017
12:00 pm - 1:00 pm
Gross Hall, 330 -- Ahmadieh Family Grand Hall
Asymptotic Performance of PCA for High-Dimensional Heteroscedastic Data

Abstract: Principal Component Analysis (PCA) is a classical method for reducing the dimensionality of data by projecting them onto a subspace that captures most of their variation. Effective use of PCA in modern applications requires understanding its performance for data that are both high-dimensional (i.e., with dimension comparable to or larger than the number of samples) and heteroscedastic (i.e., with noise whose variance is non-uniform across samples, as occurs with outliers). This paper analyzes the statistical performance of PCA in this setting, that is, for high-dimensional data drawn from a low-dimensional subspace and degraded by heteroscedastic noise. We provide simple expressions for the asymptotic PCA recovery of the underlying subspace, subspace amplitudes, and subspace coefficients; the expressions enable both easy and efficient calculation and reasoning about the performance of PCA. We exploit the structure of these expressions to show that asymptotic recovery for a fixed average noise variance is maximized when the noise variances are equal (i.e., when the noise is in fact homoscedastic). Hence, while average noise variance is often a practically convenient measure for the overall quality of data, it gives an overly optimistic estimate of the performance of PCA for heteroscedastic data.
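The phenomenon described in the abstract can be illustrated numerically: holding the average noise variance fixed, PCA recovers the underlying subspace less accurately when the noise is heteroscedastic than when it is homoscedastic. The sketch below is not from the paper; the dimensions, subspace amplitude, noise variances, and recovery metric are all illustrative assumptions.

```python
import numpy as np

# Illustrative sketch (not the paper's code): compare PCA subspace recovery
# under homoscedastic vs. heteroscedastic noise with the SAME average noise
# variance. All parameter values below are assumptions chosen for the demo.

d, n, k = 200, 400, 2        # ambient dimension, number of samples, subspace dimension
theta = 1.5                  # subspace amplitude (assumed)
rng = np.random.default_rng(0)

def recovery(noise_stds, trials=10):
    """Mean PCA subspace recovery ||U^T Uhat||_F^2 / k over several trials (in [0, 1])."""
    vals = []
    for _ in range(trials):
        U = np.linalg.qr(rng.standard_normal((d, k)))[0]        # true subspace basis
        scores = rng.standard_normal((k, n))                    # subspace coefficients
        noise = noise_stds * rng.standard_normal((d, n))        # per-sample noise std devs
        Y = theta * U @ scores + noise                          # noisy high-dimensional data
        Uhat = np.linalg.svd(Y, full_matrices=False)[0][:, :k]  # PCA: top-k left singular vectors
        vals.append(np.linalg.norm(U.T @ Uhat) ** 2 / k)
    return float(np.mean(vals))

avg_var = 1.0
homo = np.full(n, np.sqrt(avg_var))                             # equal noise variances
# Half the samples nearly clean, half very noisy; the average variance is still 1.0.
hetero = np.sqrt(np.where(np.arange(n) < n // 2, 0.02, 1.98))

r_homo, r_hetero = recovery(homo), recovery(hetero)
print(f"homoscedastic recovery:   {r_homo:.3f}")
print(f"heteroscedastic recovery: {r_hetero:.3f}")
```

In runs of this sketch, the heteroscedastic recovery comes out lower than the homoscedastic one, consistent with the abstract's claim that average noise variance alone overstates PCA performance on heteroscedastic data.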