Big Data Computing (6 CFU - 48h)


The purpose of the four homeworks is to guide students along a learning path that goes from getting acquainted with Apache Spark, a popular programming framework for big data processing, to engineering and testing in Spark a MapReduce algorithm for an important data analysis task (diversity maximization) on a cluster.

Homeworks are done in groups of up to 4 students. Each homework comes with an assignment date, a deadline (typically 2-3 weeks after the assignment date), and the maximum number of points it is worth. Deadlines are soft, in the sense that homeworks can also be completed after the deadline. However, a group that completes all homeworks within their respective deadlines receives 1 extra point in the final evaluation. For more details, see the page on exam rules. A homework is considered completed within its deadline if the required file(s) are returned by email no later than 23:59 on the deadline date.

Homework 1 (Assigned: 13/03/18 -- Deadline: 27/03/18 -- max 1 pt.)

Acknowledgments: I'm deeply grateful to Dr. Matteo Ceccarello, who worked hard and competently to set up the structure of the homeworks, prepare the templates, the data, and all relevant (and extremely useful) material, and configure the cluster on CloudVeneto, so as to make its use in the 4th homework as smooth as possible. Great job!

Last update: 12/03/2018