Data Mining and Machine Learning

Research activities:

Large amounts of data are now produced across a wide spectrum of domains. The effective exploitation of this data, regarded as one of the most important scientific challenges of the 21st century, requires a sharp paradigm shift with respect to traditional computing. Our research concerns the design, analysis, and engineering of algorithmic techniques for dealing with big data, exploring tradeoffs between computational efficiency and solution quality.
Specific areas of investigation are listed below.

  • Graph analytics: computation of key properties in large graphs (e.g., diameter, node centralities, connectivity, clustering)
  • Mining of statistically significant patterns: discovery of unexpected frequent patterns in large datasets (e.g., market basket analysis, recommendation systems)
  • Combinatorial searches: design of combinatorial search strategies with provable space-time tradeoffs
  • Analysis of dynamic and evolving graphs: efficient computations for time-changing and uncertain graphs (e.g., social networks, biological graphs)
  • Efficient enumeration of patterns in graphs
  • Algorithms for MapReduce/Spark
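
As an illustration of the efficiency/quality tradeoff in graph analytics, the following minimal Python sketch estimates the diameter of an unweighted, connected graph with two breadth-first searches (the well-known "double sweep" heuristic) instead of one search per node. It is only a sketch under stated assumptions, not the group's own method: the function names and the toy graph are hypothetical, and the result is in general a lower bound on the true diameter (it is exact on trees).

```python
from collections import deque


def bfs_farthest(adj, source):
    """Return (node, distance) for a node farthest from `source` (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    farthest = source
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                if dist[v] > dist[farthest]:
                    farthest = v
                queue.append(v)
    return farthest, dist[farthest]


def double_sweep_lower_bound(adj, start):
    """Lower bound on the diameter: BFS from `start`, then BFS from the node found."""
    far_node, _ = bfs_farthest(adj, start)
    _, distance = bfs_farthest(adj, far_node)
    return distance


# Hypothetical toy input: a path on 5 nodes, given as adjacency lists.
path_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
estimate = double_sweep_lower_bound(path_graph, 2)  # 4 for this path
```

The sketch runs in time linear in the size of the graph, whereas computing the exact diameter by running a search from every node takes time proportional to the number of nodes times the number of edges.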


People: Andrea Pietracaprina (contact person), Geppino Pucci, Francesco Silvestri, Fabio Vandin

The research in Information Access and Search Engines is concerned with the design, integration, and evaluation of advanced, scalable systems based on theoretical statistical and probabilistic methods. Combining these methods, whether classical or non-classical (e.g., quantum), aims at evolving search engines into more advanced and complex search machines. The group is currently active in:

  • Development of novel algorithms and systems for learning to understand multimodal context (e.g., multimedia data and end users' information needs)
  • Design and evaluation of methods for predicting factors (e.g., user relevance and satisfaction) that drive the effectiveness and efficiency of search engines

People: Massimo Melucci (contact person)

  • Bagging algorithms for multivariate selection of features with local similarity
  • Mining of probabilistic dependencies between features

People: Silvana Badaloni (contact person)

  • Biometric systems: study of novel approaches to biometric encryption and development of new biometric systems (several biometric characteristics are studied: fingerprint, face, palm, knuckleprint, on-line signature, iris, and hand identification)
  • Bioinformatics: development of new methods for protein, peptide, and gene classification, and data mining for medical diagnosis. These tools are very useful for speeding up the development of new drugs.
  • Computer vision: development of new systems for artificial computer vision (e.g., texture descriptors, bag-of-words-based approaches). Some applications are scene recognition, building recognition, object recognition, and pedestrian detection.
  • Machine learning: new methods for building ensembles of classifiers (artificial intelligence systems, based on neural networks and similar models, that learn from examples)

Papers are available at:

Available code:

People: Loris Nanni (contact person)

Similarity search is concerned with finding similar objects in large datasets, and it plays a crucial role in data mining, machine learning, statistical estimation, information retrieval, and pattern recognition. However, similarity search is resource intensive because of the large size of datasets and the high dimensionality of objects.

Some areas of investigation include:

  • Approximate and randomized algorithms for similarity search in high-dimensional spaces (e.g., locality-sensitive hashing, distance-sensitive filters);
  • Similarity search for complex objects (e.g., trajectories, curves, strings).
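
To make the idea of locality-sensitive hashing concrete, here is a minimal Python sketch of one classical scheme, random-hyperplane hashing for cosine similarity. It is an illustration under stated assumptions rather than the group's own technique: all function names, the number of bits, and the toy vectors are hypothetical. Each vector is mapped to a short bit signature, and vectors forming a small angle are likely to share a signature and therefore land in the same bucket.

```python
import random


def make_hyperplanes(dim, num_bits, seed=0):
    """Draw `num_bits` random Gaussian hyperplane normals in `dim` dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(
        1 if sum(h_i * x_i for h_i, x_i in zip(h, vector)) >= 0 else 0
        for h in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group vector indices into buckets keyed by their signature."""
    buckets = {}
    for idx, vec in enumerate(vectors):
        buckets.setdefault(signature(vec, hyperplanes), []).append(idx)
    return buckets


def query(buckets, hyperplanes, vector):
    """Return indices of stored vectors whose signature matches the query's."""
    return buckets.get(signature(vector, hyperplanes), [])


# Hypothetical toy data: three vectors in 3 dimensions, 8-bit signatures.
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
planes = make_hyperplanes(dim=3, num_bits=8)
index = build_index(vectors, planes)

# A positively rescaled copy of vector 0 has the same direction, hence the
# same signature, so vector 0 is always among the candidates.
candidates = query(index, planes, [2.0, 0.0, 0.0])
```

A query inspects only the candidates in its own bucket instead of the whole dataset, trading exactness for speed; practical schemes typically use several independent hash tables to reduce the chance of missing a near neighbor.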

People: Francesco Silvestri (contact person)