Date and Time:
Monday, June 6, 2016 - 15:00
Location:
Aula Magna “A. Lepschy”
Speaker:
Prof. Eli Upfal
Description:

ABSTRACT: What sample size is needed to accurately estimate a parameter? For a given parameter, the answer is straightforward. But in most advanced data analysis the parameter of interest is not predefined; it is the outcome of a search in some parameter space. The difficulty is that the question implies a potentially large multiple-comparison analysis, and classical statistics has no general, effective method to evaluate the confidence level of such an estimate from a sample. We present computationally efficient methods for evaluating the sample complexity (the error bound for a given sample size) in specific domains relevant to mining massive datasets. Our work builds on the classic concept of uniform convergence, computed through explicit Rademacher average bounds. Surprisingly, we demonstrate that our methods, which have provable statistical guarantees, are more computationally efficient than known heuristics that offer no guarantees on the quality of their solutions.
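To make the Rademacher-average idea in the abstract concrete, here is a minimal textbook sketch, not the speaker's method. It estimates the empirical Rademacher average of a finite family of functions by Monte Carlo sampling of random signs, and compares it with the standard upper bound from Massart's finite-class lemma. The helper names `empirical_rademacher` and `massart_bound`, the toy transaction data, and the choice of itemset-indicator functions are all hypothetical, picked only as one example of a data-mining setting.

```python
import math
import random


def empirical_rademacher(sample, functions, trials=1000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher average
    R_S(F) = E_sigma[ sup_{f in F} (1/m) * sum_i sigma_i * f(x_i) ],
    where the sigma_i are independent uniform +/-1 signs."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    m = len(sample)
    # Evaluate each function once on the whole sample: f(S) = (f(x_1), ..., f(x_m)).
    vectors = [[f(x) for x in sample] for f in functions]
    total = 0.0
    for _ in range(trials):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # Supremum over the (finite) family of the sign-weighted sample mean.
        total += max(sum(s * v for s, v in zip(sigma, vec)) for vec in vectors) / m
    return total / trials


def massart_bound(sample, functions):
    """Massart's finite-class lemma:
    R_S(F) <= max_f ||f(S)||_2 * sqrt(2 * ln|F|) / m."""
    m = len(sample)
    r = max(math.sqrt(sum(f(x) ** 2 for x in sample)) for f in functions)
    return r * math.sqrt(2 * math.log(len(functions))) / m


# Hypothetical toy data: 100 transactions, and one indicator function per itemset
# (f_I(t) = 1 if itemset I is contained in transaction t, else 0).
transactions = [frozenset("ab"), frozenset("a"), frozenset("bc"), frozenset("abc")] * 25
itemsets = [frozenset("a"), frozenset("b"), frozenset("ab")]
indicators = [lambda t, I=I: 1.0 if I <= t else 0.0 for I in itemsets]

if __name__ == "__main__":
    print("Monte Carlo estimate:", empirical_rademacher(transactions, indicators))
    print("Massart upper bound: ", massart_bound(transactions, indicators))
```

The Monte Carlo estimate depends only on the sample, not on the unknown distribution, which is what makes data-dependent bounds of this kind attractive; the Massart bound is cruder but needs only the function values and the size of the family.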

Affiliation:
Dept. of Computer Science, Brown University (RI), USA