# The official repository for the Rock the JVM Spark Optimization with Scala course

Powered by [Rock the JVM!](https://rockthejvm.com)

This repository contains the code we wrote during [Rock the JVM's Spark Optimization with Scala](https://rockthejvm.com/course/spark-optimization) course. Unless explicitly mentioned, the code in this repository is exactly what we wrote on camera.

### Install and setup

- install [IntelliJ IDEA](https://www.jetbrains.com/idea/)
- install [Docker Desktop](https://www.docker.com)
- either clone this repository or download it as a zip
- open it with IntelliJ as an sbt project

When you open the project, the IDE will automatically download and apply the appropriate library dependencies.

To set up the dockerized Spark cluster we will be using in the course, do the following (a combined sketch follows this list):

- open a terminal and navigate to `spark-cluster`
- run `build-images.sh` (if you don't have a bash terminal, open the file and run each line one by one)
- run `docker-compose up`
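
All of the above in one go, as a minimal sketch assuming a bash-compatible shell and that you start at the root of the repo:

```
cd spark-cluster
bash build-images.sh   # builds the Docker images for the cluster
docker-compose up      # starts the cluster containers; keep this terminal open
```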

To interact with the Spark cluster, the folders `data` and `apps` inside the `spark-cluster` folder are mounted onto the Docker containers under `/opt/spark-data` and `/opt/spark-apps`, respectively.
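
This means you can, for instance, submit an application jar by copying it into `spark-cluster/apps` first. A minimal sketch, where the jar name is hypothetical and the master URL assumes the compose service is named `spark-master` on the standard port:

```
# your-app.jar is a hypothetical jar you'd copy into spark-cluster/apps
# spark://spark-master:7077 is an assumption about the cluster's service name and port
docker exec -it spark-cluster_spark-master_1 \
  /spark/bin/spark-submit \
  --master spark://spark-master:7077 \
  /opt/spark-apps/your-app.jar
```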

To run a Spark shell, first run `docker-compose up` inside the `spark-cluster` directory, then, in another terminal, run

```
docker exec -it spark-cluster_spark-master_1 bash
```

and then

```
/spark/bin/spark-shell
```
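
Once inside the shell, you can read any dataset you've placed in `spark-cluster/data` through the mounted path. A minimal sketch in Scala, where `movies.json` is a hypothetical file you'd drop into that folder:

```
// movies.json is a hypothetical file placed in spark-cluster/data on the host
val movies = spark.read.json("/opt/spark-data/movies.json")
movies.show()
```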

### How to use intermediate states of this repository

Start by cloning this repository and checking out the `start` tag:

```
git checkout start
```
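
If you haven't cloned the repo yet, the full sequence looks like the sketch below; replace the placeholder with the clone URL from this repository's GitHub page:

```
# <this-repo-url> is a placeholder for the clone URL on the repository's GitHub page
git clone <this-repo-url>
cd spark-optimization    # or whatever directory name the clone created
git checkout start
```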

### How to run an intermediate state

The repository was built while recording the lectures. Before each lecture, I tagged the corresponding commit so you can easily go back to an earlier state of the repo!

The tags are as follows:

* `start`
* `1.1-scala-recap`
* `1.2-spark-recap`
* `2.2-spark-job-anatomy`
* `2.3-query-plans`
* `2.3-query-plans-exercises`
* `2.4-spark-ui`
* `2.5-spark-apis`
* `2.6-deploy-config`
* `3.1-join-mechanics`
* `3.2-broadcast-joins`
* `3.3-column-pruning`
* `3.4-prepartitioning`
* `3.5-bucketing`
* `3.6-skewed-joins`
* `4.1-rdd-joins`
* `4.2-cogroup`
* `4.3-rdd-broadcast`
* `4.4-rdd-skews`
* `5.1-rdd-transformations`
* `5.2-by-key-ops`
* `5.3-reusing-objects`
* `5.5-i2i-transformations`
* `5.6-i2i-transformations-exercises`

When you watch a lecture, you can `git checkout` the appropriate tag and the repo will go back to the exact code I had when I started the lecture.
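
For example, to rewind the repo to the state at the start of the broadcast joins lecture:

```
git checkout 3.2-broadcast-joins
```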

### For questions or suggestions

If you have changes to suggest to this repo, either

- submit a GitHub issue
- tell me in the course Q&A forum
- submit a pull request!