When is Big Data Sufficiently Big? When is it Too Big? Sample Complexity, Uniform Convergence, and Generalization Error

Submitted by admin on Wed, 05/23/2018 - 16:43

Date and Time:

Monday, June 6, 2016 - 15:00

Location:

Aula Magna “A. Lepschy”

Speaker:

Prof. Eli Upfal

Description:

ABSTRACT: What sample size is needed to accurately estimate a parameter? For a given parameter, the answer is straightforward. But in most advanced data analysis the parameter of interest is not predefined; it is the outcome of a search in some parameter space. The difficulty is that the question then implies a potentially large multiple-comparison analysis, and classical statistics has no general, effective method to evaluate the confidence level of such an estimate from a sample. We present computationally efficient methods for evaluating the sample complexity (the error bound for a given sample size) in specific domains relevant to massive data mining applications. Our work builds on the classic concept of uniform convergence, computed through explicit Rademacher average bounds. Surprisingly, we demonstrate that our methods, which have provable statistical guarantees, are more computationally efficient than known heuristics that offer no guarantees on the quality of their solutions.
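The abstract contrasts the easy single-parameter question with uniform convergence over a searched family of parameters. The Python sketch below illustrates that contrast under stated assumptions; it is not the speaker's method. It assumes an i.i.d. sample and a finite family of functions with values in [0, 1], and it uses standard textbook results: Hoeffding's inequality, with and without a union bound; a log-sum-exp upper bound on the empirical Rademacher average that generalizes Massart's finite-class lemma; and a two-sided Rademacher uniform-deviation bound with constants 2 and 3. The function names and the synthetic "transaction" data are hypothetical.

```python
import math
import random


def hoeffding_single(m, delta):
    """Two-sided Hoeffding bound for ONE pre-specified [0, 1]-valued mean:
    with probability >= 1 - delta, |true mean - sample mean| <= this value."""
    return math.sqrt(math.log(2 / delta) / (2 * m))


def hoeffding_union(m, delta, n_functions):
    """The same guarantee made simultaneous over a finite family of
    n_functions candidates via a union bound (the multiple-comparison cost)."""
    return math.sqrt(math.log(2 * n_functions / delta) / (2 * m))


def rademacher_upper_bound(values):
    """Upper bound on the empirical Rademacher average of a finite family.

    values[j][i] = f_j(x_i): function j evaluated on sample point i.
    For every s > 0, R <= (1/s) * ln(sum_j exp(s^2 * ||v_j||^2 / (2 m^2))),
    so the minimum over any set of candidate s values is still a valid bound.
    """
    n, m = len(values), len(values[0])
    sq_norms = [sum(v * v for v in row) for row in values]

    def bound(s):
        exps = [s * s * q / (2 * m * m) for q in sq_norms]
        top = max(exps)  # log-sum-exp shift for numerical stability
        return (top + math.log(sum(math.exp(e - top) for e in exps))) / s

    # Coarse geometric grid around sqrt(m), plus Massart's choice of s, which
    # guarantees a result <= sqrt(2 ln n / m) when all values lie in [0, 1].
    candidates = [math.sqrt(m) * 1.1 ** k for k in range(-60, 61)]
    if n > 1:
        candidates.append(math.sqrt(2 * m * math.log(n)))
    return min(bound(s) for s in candidates)


def rademacher_deviation(r_hat, m, delta):
    """Two-sided uniform deviation bound for [0, 1]-valued functions: with
    probability >= 1 - delta, sup_f |E[f] - sample mean of f| is at most
    2 * r_hat + 3 * sqrt(ln(4 / delta) / (2 m)), where r_hat is (an upper
    bound on) the empirical Rademacher average of the family."""
    return 2 * r_hat + 3 * math.sqrt(math.log(4 / delta) / (2 * m))


if __name__ == "__main__":
    rng = random.Random(0)
    m, n_items, delta = 10_000, 50, 0.05
    # Hypothetical synthetic data: item j occurs in each of m transactions
    # independently with probability probs[j]; f_j is "item j is present".
    probs = [rng.uniform(0.01, 0.3) for _ in range(n_items)]
    values = [[1.0 if rng.random() < p else 0.0 for _ in range(m)] for p in probs]

    r_hat = rademacher_upper_bound(values)
    print(f"one fixed parameter:     {hoeffding_single(m, delta):.4f}")
    print(f"union bound over {n_items}:     {hoeffding_union(m, delta, n_items):.4f}")
    print(f"Rademacher-based bound:  {rademacher_deviation(r_hat, m, delta):.4f}")
```

Running it prints three error bounds for the same sample size. Which bound is tightest depends on the family size, the data, and the constants in each inequality, so the output should not be read as a benchmark of the approaches presented in the talk. The Rademacher bound is data-dependent because it uses the actual values observed on the sample, while the union bound depends only on the number of candidates.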

Affiliation:

Dept. of Computer Science, Brown University (RI), USA
