Machine Setup

Before doing any work, you should set up your machine. First of all, you need the Java Development Kit (JDK) version 8 installed on your machine. If you do not have it, head to the Oracle download page and download the Java Development Kit version 8. Version 9 (or later versions) might cause problems, so avoid them for the time being.
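Running java -version in a terminal shows which JDK is installed, but the JDK a program actually runs on (for example, the one selected for a project in Intellij) can differ. The following is a minimal sketch of a hypothetical helper class, not part of the course material, that checks this from inside a program. It relies only on the standard java.version system property, which reports Java 8 in the legacy 1.8.0_xxx format and later versions as 9, 11.0.2, and so on.

```java
// Hypothetical helper class (not part of the course material): prints the
// Java version of the running JVM and warns if it is not Java 8.
public class CheckJavaVersion {

    // Java 8 reports its version in the legacy "1.8.0_xxx" format,
    // while Java 9 and later use "9", "11.0.2", "17.0.1", and so on.
    static boolean isJava8(String version) {
        return version.startsWith("1.8");
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        System.out.println("Running on Java " + version);
        if (!isJava8(version)) {
            System.out.println("Warning: this is not JDK 8; versions 9 and later might cause problems.");
        }
    }
}
```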

Instructions for Java users

  1. Create a directory BDC on your computer; it will be the main directory for your homework. In this directory, put the file build.gradle, which you must download here.
  2. If you are working in the Virtual Machine, you can skip this step, as the software is already installed. Otherwise, install Intellij Idea (Community edition), version 2020, from this download page. In the Installation Options dialog window, select the 32-bit or 64-bit launcher, depending on your machine, and specify Java in the Create Associations section. (For a more comprehensive guide, see the official install and set-up page.)
  3. After installation is complete, you must configure Intellij for its first run. Launch Intellij. In the first startup screen, choose not to import any settings. In the second one (about the user interface theme), choose "Skip Remaining and Set Defaults". Then, in the third screen, select Open and use the file selection dialog that pops up to select the build.gradle file in the directory you created in Step 1, as shown in the following screenshot.

    _images/screen1.png

    When prompted, click Open as project. You will have to wait a couple of minutes while Intellij configures the project.

  4. Create a directory BDC/src/main/java. By default, put all your programs in BDC/src/main/java and all datasets you want to provide as input to your programs in the root directory BDC.
  5. Use the project navigation panel on the left to open files (e.g., programs, datasets). Open a Java program as in the following screenshot:

    _images/screen2.png

    If the editor shows errors (in red), no Development Kit is associated with the project. To fix this problem:

    • Open the menu File/Project Structure and select Project from the left panel. A window like the following one will pop up.

      _images/screen3.png
    • Open the drop-down menu and select the appropriate JDK, as in the figure above. If in doubt, pick Java 8. At this point, all errors should disappear.

    If you see no errors in the editor, then you can skip the previous steps.
  6. On the line of the main method there is a green arrow that lets you run your code.

    _images/screen4.png

    Clicking the green arrow compiles and runs the code. The first run will not succeed (exit code 1 at the bottom of the screen) because you must first configure a set of execution parameters. To do so, open the drop-down menu at the top right of the Intellij window, where the name of the program appears (TemplateHW1 in the example), select Edit Configurations, and fill in the dialog window as shown in the following image:

    _images/screen5.png

    Note that the VM options field (where the spark.master property is set) may be hidden; to show it, click the blue Modify options text. The VM options specify that you want to run Spark in local mode (spark.master is a Java system property), while the CLI arguments to your application are the arguments passed to the main method (an integer 4 and a file dataset.txt in the example).
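The two kinds of parameters in the run configuration reach a Java program through different channels: VM options of the form -Dname=value become system properties, while program arguments end up in the String[] args of main. The following minimal sketch illustrates this. RunConfigDemo and its helper methods are hypothetical (this is not the course's TemplateHW1), it does not use Spark at all, and the VM option value -Dspark.master=local[*] is an assumption, since the exact value appears only in the screenshot.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical class, NOT the course's TemplateHW1: it only shows how the
// VM options and the CLI arguments of a run configuration reach the program.
public class RunConfigDemo {

    // Reads the spark.master system property, set in VM options with
    // e.g. -Dspark.master=local[*] (assumed value). Returns null if unset.
    static String sparkMaster() {
        return System.getProperty("spark.master");
    }

    // Parses the first CLI argument as an integer (4 in the example) and
    // checks that a second argument (the input file) is also present.
    static int parseFirstArg(String[] args) {
        if (args.length < 2) {
            throw new IllegalArgumentException("USAGE: <integer> <file_path>");
        }
        return Integer.parseInt(args[0]);
    }

    public static void main(String[] args) {
        int k = parseFirstArg(args);
        // A relative path such as dataset.txt is resolved against the working
        // directory of the run configuration (assumed here to be BDC).
        Path input = Paths.get(args[1]).toAbsolutePath();
        System.out.println("spark.master = " + sparkMaster());
        System.out.println("integer argument = " + k);
        System.out.println("input file = " + input);
    }
}
```

With the VM options and the arguments 4 dataset.txt set as in the example, this prints the property value, the integer 4 and the absolute path of dataset.txt; if the path does not point into BDC, check the Working directory field of the same dialog. In an actual Spark program, a SparkConf created with new SparkConf() loads spark.* system properties by default, which is why setting spark.master in the VM options is typically enough to select local mode.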

Instructions for Python users

If you are working in the Virtual Machine, you can skip this section, as the software is already installed.

Spark also exposes a Python interface. To use Spark from Python on your local machine, follow these instructions, which were created by a colleague of yours in a previous edition of the class.


Last update: 23/04/2020