Big Data Computing (6 CFU - 48h)


Introduction to the course

  • 26/02/2018 Introduction and organization of the course.
  1. Slides: Introduction

Computational Frameworks: MapReduce/Spark

  • 26/02/2018 Big data challenges. MapReduce: introduction to the framework; typical platform architecture; MapReduce computation.
  • 27/02/2018 MapReduce: specification of MapReduce algorithms and their execution on a distributed system with a fault-tolerant distributed file system; key performance indicators for the analysis of MapReduce algorithms. Basic MapReduce primitives/techniques: word count.
  • 06/03/2018 Basic MapReduce primitives/techniques: partitioning (improved word count and category counting); trading accuracy for efficiency (maximum pairwise distance). Tools: Chernoff bound.
  • 07/03/2018 Exercise: exact maximum pairwise distance in a constant number of rounds, with sublinear local space and quadratic aggregate space. Basic MapReduce primitives/techniques: sampling (SampleSort).
  • 13/03/2018 Basic MapReduce primitives/techniques: analysis of SampleSort. Description of Homework 1.
  • 14/03/2018 Introduction to Apache Spark: software architecture; notions of driver/executor processes, context, cluster manager; RDD (partitioning, lineage, caching, operations). Exercises 2.3.1(b) and 2.3.1(d) from J. Leskovec, A. Rajaraman and J. Ullman, Mining Massive Datasets, Cambridge University Press, 2014.

  1. Slides: Computational Frameworks - MapReduce
  2. Slides: Apache Spark Fundamentals (final with errata)



  1. Slides: Clustering (Part 1)

Last update: 19/03/2018