Thesis' proposals

This list is incomplete. If you are interested in my research topics (see here ), contact me for more information (here a few keywords describing my work: algorithms, data structures, similarity search, pattern extractions, parallel algorithms,I/O efficient algorithms…)


Proposals for bachelor students (“Laurea Triennale”)

Please contact me if you are interested. Thesis topics: algorithms, data structures, similarity search, pattern extractions, parallel algorithms,I/O efficient algorithms…


Proposals for master students (“Laurea Magistrale”)

Clustering of time series

Time series analysis is rapidly establishing itself as a major discovery tool in data science. Every day huge amounts of time series are generated in different application domains such as finance, mobile communications and computing, sensor networks, among the others. An important tool in data science is clustering: its goal is to partition the input into well-separated subsets (clusters) of similar data, so to uncover hidden and unknown structure, and to summarize the data.

The goal of this thesis is to design and analyze parallel algorithms for clustering of time series, building on recent results on parallel algorithms for clustering of vectors.


Distance Sensitive Hashing

Locality Sensitive Hashing (LSH) is a powerful tool for high dimensional similarity search: LSH is an hashing schema where near points collide with higher probability than far points, and it is usually the case that the collision probability equals the similarity between the two points. Recently, the paper [1] introduced the notion of Distance Sensitive Hashing (DSH): this is a generalization of LSH where the collision probability can be an arbitrary function of the collision probability. For instance, the collision probability can peak at a given distance threshold larger than 0, or can decrease with the similarity. DSH are relevant for improving the quality of search engines and for improving privacy in similarity search tools.

The goal of this thesis is to study existing DSH schemas, to extend them to other distance functions, and to perform an experimental evaluation of their properties.

References: [1] M. Aumuller, T. Christiani, R. Pagh, F. Silvestri. Distance-sensitive hashing. In Proc. 37th Symposium on Principles of Database Systems (PODS), 2018.


Parallel algorithms for similarity join

Consider two sets R and S of high dimensional points (e.g., user profiles, images) and let d(,) be a distance function (e.g., Euclidean distance). The similarity join of R and S consists in finding all near points between R and S, that is all pairs (r,s) with r in R, s in S, and the distance between r and s smaller than a given threshold. Similarity join is used for duplicate removal, data cleaning, record linkage.

The goal of this master thesis is to develop and analyze parallel algorithms for computing the similarity join of two sets. The starting points are two recent results on similarity search in parallel settings [1,2]. All algorithms will be implemented in the Spark framework, which is commonly used for developing parallel algorithms for big data.

References:

  1. Samuel McCauley and Francesco Silvestri. Adaptive MapReduce Similarity Join. Workshop on Algorithms and Systems for MapReduce and Beyond, 2018.
  2. Xiao Hu, Yufei Tao and Ke Yi. Output-Optimal Parallel Algorithms for Similarity Joins. Proc. of Symposium on Principles of Database Systems, 2017.

Algorithms for Tensor Processing Units

The goal of this master thesis is to develop and analyse algorithms for Tensor Processing Units (TPUs). A TPU is a computing unit optimised for deep learning, which features special hardware for dense matrix multiplication. The goal of this thesis is to show that TPU can be used for efficiently solving sparse problems. In this thesis, the student will study some recent parallel algorithms (see e.g., [1]) for sparse problems and adapt them to TPUs. The algorithms will then be tested on a NVIDIA Volta board [2]. The thesis will be carried out in collaboration with Dr. Flavio Vella from Dividiti Ltd.

References:

  1. J. Kepner and J. Gilbert. Graph Algorithms in the Language of Linear Algebra. SIAM, 2011

  2. Nvidia TPU: https://devblogs.nvidia.com/programming-tensor-cores-cuda-9/