# The official repository for the Rock the JVM Spark Optimization with Scala course

Powered by [Rock the JVM!](https://rockthejvm.com)

This repository contains the code we wrote during [Rock the JVM's Spark Optimization with Scala](https://rockthejvm.com/course/spark-optimization) course. Unless explicitly mentioned, the code in this repository is exactly what was written on camera.

### Install and setup

- install [IntelliJ IDEA](https://jetbrains.com/idea)
- install [Docker Desktop](https://docker.com)
- either clone the repo or download it as a zip
- open the project in IntelliJ as an SBT project

As you open the project, the IDE will automatically download and apply the appropriate library dependencies.
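
For reference, the build definition usually looks something like the sketch below; the exact Scala and Spark versions are set in this repo's `build.sbt`, so treat the numbers here as illustrative assumptions:

```scala
// build.sbt (illustrative sketch; check the repo's actual build.sbt for the real versions)
scalaVersion := "2.12.10" // assumed Scala version

val sparkVersion = "3.0.0" // assumed Spark version

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % sparkVersion,
  "org.apache.spark" %% "spark-sql"  % sparkVersion
)
```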

To set up the dockerized Spark cluster we will be using in the course, do the following:

- open a terminal and navigate to `spark-cluster`
- run `build-images.sh` (if you don't have a bash terminal, just open the file and run each line one by one)
- run `docker-compose up`

To interact with the Spark cluster, the folders `data` and `apps` inside the `spark-cluster` folder are mounted onto the Docker containers under `/opt/spark-data` and `/opt/spark-apps`, respectively.
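
For example, any file you drop into `spark-cluster/data` on the host becomes readable inside the containers. A minimal sketch of reading such a file once you have a Spark shell open (see below); the file name and CSV options here are hypothetical:

```scala
// inside spark-shell on the master container: read a file mounted from the
// host's spark-cluster/data folder (example.csv is a hypothetical file name)
val df = spark.read
  .option("header", "true")
  .csv("/opt/spark-data/example.csv")

df.show()
```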

To run a Spark shell, first run `docker-compose up` inside the `spark-cluster` directory, then in another terminal, do

```
docker exec -it spark-cluster_spark-master_1 bash
```

and then

```
/spark/bin/spark-shell
```
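
Once the shell is up, a quick sanity check (not part of the course code) confirms it can run jobs on the cluster:

```scala
// trivial distributed computation: sum the numbers 0 to 999999
spark.range(1000000).selectExpr("sum(id) as total").show()
```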

### How to use intermediate states of this repository

Start by cloning this repository and checking out the `start` tag:

```
git checkout start
```

### How to run an intermediate state

The repository was built while recording the lectures. I tagged the commit before each lecture so you can easily go back to an earlier state of the repo!

The tags are as follows:

* `start`
* `1.1-scala-recap`
* `1.2-spark-recap`
* `2.2-spark-job-anatomy`
* `2.3-query-plans`
* `2.3-query-plans-exercises`
* `2.4-spark-ui`
* `2.5-spark-apis`
* `2.6-deploy-config`
* `3.1-join-mechanics`
* `3.2-broadcast-joins`
* `3.3-column-pruning`
* `3.4-prepartitioning`
* `3.5-bucketing`
* `3.6-skewed-joins`
* `4.1-rdd-joins`
* `4.2-cogroup`
* `4.3-rdd-broadcast`
* `4.4-rdd-skews`
* `5.1-rdd-transformations`
* `5.2-by-key-ops`
* `5.3-reusing-objects`
* `5.5-i2i-transformations`
* `5.6-i2i-transformations-exercises`

When you watch a lecture, you can `git checkout` the appropriate tag (e.g. `git checkout 3.2-broadcast-joins`) and the repo will go back to the exact code I had when I started the lecture.

### For questions or suggestions

If you have changes to suggest to this repo, you can

- submit a GitHub issue
- tell me in the course Q/A forum
- submit a pull request!
