We'll use the AWS CLI tool to interact with resources in our AWS account.
Use our system's package manager to install the aws-cli package. (On some distributions, this may be called awscli.)
After installation, test by opening a terminal window and typing the following command.
$ aws --version
We should see something similar to:
aws-cli/1.16.263 Python/3.7.4 Linux/4.19.84-1-MANJARO botocore/1.12.253
In the AWS console, create an IAM user that we will use to deploy to AWS. Do not use your root account. Grant the user sufficient permissions, for example by adding it to the AdministratorAccess group.
Create an IAM access key for the user, and store it in a safe place. Then use it to configure a profile for the AWS CLI.
$ aws configure --profile <profile-name>
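The command prompts for the key pair plus a couple of defaults. For example (the values here are placeholders, not real credentials, and the profile name matches the one used in later commands):
$ aws configure --profile dustbort
AWS Access Key ID [None]: AKIAXXXXXXXXXXXXXXXX
AWS Secret Access Key [None]: ****************************************
Default region name [None]: us-east-1
Default output format [None]: json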
We'll need to create the VPC infrastructure for our Kubernetes cluster. A template makes this reproducible.
pushd ./vpc
bash ./deploy-vpc.sh
popd
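The contents of deploy-vpc.sh aren't reproduced here. As a rough sketch, assuming the template is a CloudFormation template named vpc.yaml sitting next to the script (the file and stack names below are assumptions), it might boil down to:
# Hypothetical sketch: deploy the VPC as a CloudFormation stack
aws cloudformation deploy \
  --profile dustbort \
  --stack-name kubernetes-vpc \
  --template-file ./vpc.yaml \
  --capabilities CAPABILITY_IAM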
Next, install eksctl by following its installation instructions. We might need to install a few packages along the way, such as binutils and gcc. This should also install kubectl, but we should verify with the package manager that it is installed. Also verify that aws-iam-authenticator is installed; on Arch Linux, there is an AUR package for it.
Next, create an SSH key for the cluster (eksctl can install it on the worker nodes for SSH access). Unfortunately, AWS is limited to RSA keys; otherwise, we'd use the more secure Ed25519 algorithm.
ssh-keygen
Now let's create the cluster.
pushd ./eksctl
bash deploy-cluster.sh
popd
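deploy-cluster.sh isn't reproduced here either. A minimal sketch of the kind of eksctl call it likely wraps, assuming it targets the VPC created above (the cluster name, region, node count, and subnet IDs are all assumptions):
# Hypothetical sketch: create the EKS cluster and a node group in the existing VPC
AWS_PROFILE=dustbort eksctl create cluster \
  --name kubernetes-cluster \
  --region us-east-1 \
  --nodes 2 \
  --ssh-access --ssh-public-key ~/.ssh/id_rsa.pub \
  --vpc-public-subnets subnet-aaa111,subnet-bbb222 \
  --vpc-private-subnets subnet-ccc333,subnet-ddd444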
Best practice is not to reuse the key that eksctl/kubectl uses to talk to the Kubernetes cluster, but to create a separate key. Fortunately, GitHub supports the more secure Ed25519 key.
ssh-keygen -t ed25519 -f ~/.ssh/kubernetes_github_ed25519
Add the public key to your GitHub profile settings.
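To grab the public half of the key for GitHub (Settings > SSH and GPG keys), print it and copy the output:
cat ~/.ssh/kubernetes_github_ed25519.pub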
Create a repository to contain the manifests that define the Kubernetes cluster. Call it e.g. kubernetes-gitops. We can go ahead and clone this empty repo to our workstation.
pushd ~/code/repos
GIT_SSH_COMMAND='ssh -i ~/.ssh/kubernetes_github_ed25519' \
git clone git@github.com:dustbort/kubernetes-gitops.git
popd
pushd ./eksctl
bash ./enable-repo.sh
popd
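enable-repo.sh presumably wraps eksctl's GitOps support from this era, which installs Flux into the cluster, points it at our repository, and prints Flux's SSH public key to the terminal. A sketch of the underlying command, with the cluster name, region, and email as assumptions:
# Hypothetical sketch: install Flux and connect it to the gitops repo
AWS_PROFILE=dustbort eksctl enable repo \
  --cluster kubernetes-cluster \
  --region us-east-1 \
  --git-url git@github.com:dustbort/kubernetes-gitops.git \
  --git-email you@example.com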
Now, take the key that was output to the terminal, and add it as a deploy key for the GitHub repository, with write access. Shortly, we will see that a commit has been made to the flux directory.
We will use the newly enabled GitOps pipeline to quickly set up the cluster.
pushd ./eksctl
bash ./enable-profile.sh
popd
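enable-profile.sh likely applies one of eksctl's quick-start profiles to the same repo; app-dev is the profile that ships the Kubernetes dashboard mentioned below. A sketch, with the profile name and the other values as assumptions:
# Hypothetical sketch: commit the app-dev quick-start manifests to the gitops repo
AWS_PROFILE=dustbort eksctl enable profile app-dev \
  --cluster kubernetes-cluster \
  --region us-east-1 \
  --git-url git@github.com:dustbort/kubernetes-gitops.git \
  --git-email you@example.com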
Again, we can check the gitops repo and see that a commit has been made to the base directory. After some time, we will see that the manifests have been applied to the cluster via gitops. Wait for everything to reach the Running state.
AWS_PROFILE=dustbort kubectl get all --all-namespaces
The quick start includes the Kubernetes dashboard. To authenticate with the dashboard, we will need to set up an eks-admin service account. Pull the latest version of the gitops repo. Then add the following file, eks-admin/eks-admin-service-account.yaml:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: eks-admin
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: eks-admin
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: eks-admin
  namespace: kube-system
Commit and push the repo. Flux will detect the changes and apply them to the cluster after a short while.
Then we can get a token to connect to the dashboard. (If the following command prints every secret in the kube-system namespace instead of a single eks-admin token, the manifests have not been applied yet.)
kubectl -n kube-system describe secret $(kubectl -n kube-system get secret | grep eks-admin | awk '{print $1}')
We use kubectl to proxy a connection to the cluster.
AWS_PROFILE=dustbort kubectl proxy
Then navigate to the dashboard URL and use the token to log in.
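With the proxy running, the dashboard is typically reachable at a URL of the following form, assuming the quick start installs it into the kube-system namespace (as the EKS documentation of this era does):
http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/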
Because there are many dependencies and configurations for Airflow on Kubernetes, we will use a standard helm chart. Fetch the helm chart for Airflow and copy its default values.
pushd ./helm
bash ./fetch.sh airflow
bash ./values.sh airflow
popd
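The fetch and values helpers aren't shown; roughly, they likely pull the chart from the stable repo into ./charts and copy its default values out for editing. A sketch, assuming the stable/airflow chart and this directory layout (the stable repo URL is the one in use at the time of writing):
# Hypothetical sketch of fetch.sh and values.sh for the airflow chart
helm repo add stable https://kubernetes-charts.storage.googleapis.com
helm fetch stable/airflow --untar --untardir ./charts
mkdir -p ./values/airflow
cp ./charts/airflow/values.yaml ./values/airflow/values.yaml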
Edit ./helm/values/airflow/values.yaml and set the following values:
Property | Value |
---|---|
airflow.image.repository | datarods/docker-airflow |
airflow.image.tag | 1.10.4-2 |
After setting the values, we can render the manifest from the helm chart templates.
pushd ./helm
bash manifest.sh airflow airflow airflow
popd
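manifest.sh isn't shown either; judging from the call above, its three arguments are the chart, the release name, and the namespace. A sketch assuming Helm 2 syntax and the directory layout above:
# Hypothetical sketch of manifest.sh "$1" "$2" "$3" (chart, release, namespace)
helm template ./charts/"$1" \
  --name "$2" \
  --namespace "$3" \
  --values ./values/"$1"/values.yaml \
  --output-dir ./manifests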
Now we are ready to put the manifests into the Kubernetes gitops repo.
rm -rf ../kubernetes-gitops/airflow
cp -a ./helm/manifests/airflow ../kubernetes-gitops/airflow
Commit and push the repo. Soon, Airflow will be active in the cluster.
Once Airflow appears in the list of running services, we can connect to its web interface. First, forward a port on our computer to the port on which the airflow-web service is running.
AWS_PROFILE=dustbort kubectl port-forward service/airflow-web 8080:8080 -n default
Now visit http://localhost:8080.
Right away, there is a bug in the configuration. From the menu, navigate to Admin > Connections. Click the edit icon for the airflow_db connection. Make the following settings:
Property | Value |
---|---|
Conn Type | Postgres |
Host | airflow-postgresql |
Login | postgres |
Password | airflow |
Port | 5432 |
Now, from the menu, navigate to Data Profiling > Charts. Then select the item Airflow task instance by type. If we don't see a database connection error, then the airflow_db connection is configured correctly.
Best practice is not to reuse the key that eksctl/kubectl uses to talk to the Kubernetes cluster, but to create a separate key. Fortunately, GitHub supports the more secure Ed25519 key.
ssh-keygen -t ed25519 -f ~/.ssh/airflow_github_ed25519
Add the public key to your GitHub profile settings.
Create a repository to contain the DAGs for Airflow. Call it e.g. airflow-dags-gitops. We can go ahead and clone this empty repo to our workstation.
pushd ~/code/repos
GIT_SSH_COMMAND='ssh -i ~/.ssh/airflow_github_ed25519' \
git clone git@github.com:dustbort/airflow-dags-gitops.git
popd
Create a known hosts file for Airflow, containing the entry for github.com. We can copy this entry from our own known_hosts file, or generate it with ssh-keyscan:
ssh-keyscan github.com > ~/.ssh/airflow_known_hosts
Create a Kubernetes secret that conforms to the format that Airflow git-sync requires.
AWS_PROFILE=dustbort \
kubectl create secret generic airflow-github-secrets \
--from-file=id_ed25519=$HOME/.ssh/airflow_github_ed25519 \
--from-file=id_ed25519.pub=$HOME/.ssh/airflow_github_ed25519.pub \
--from-file=known_hosts=$HOME/.ssh/airflow_known_hosts
Edit ./helm/values/airflow/values.yaml and set the following values:
Property | Value |
---|---|
dags.git.url | git@github.com:dustbort/airflow-dags-gitops.git |
dags.git.secret | airflow-github-secrets |
dags.initContainer.enabled | true |
After setting the values, we can render the manifest from the helm chart templates.
pushd ./helm
bash manifest.sh airflow airflow airflow
popd
Now we are ready to put the manifests into the Kubernetes gitops repo.
rm -rf ../kubernetes-gitops/airflow
cp -a ./helm/manifests/airflow ../kubernetes-gitops/airflow
Commit and push the repo. Soon, Airflow will be active in the cluster.
Once Airflow appears in the list of running services, we can connect to its web interface. First, forward a port on our computer to the port on which the airflow-web service is running.
AWS_PROFILE=dustbort kubectl port-forward service/airflow-web 8080:8080 -n default
If the repo is totally empty when Airflow tries to pull the DAGs, the master branch will not exist, and the failing git sync will crash the pods. So, let's make an initial commit. Our DAGs repo should be a proper Python module, so we can at least add an empty __init__.py file. In addition, we might add a DAG just for testing; we can use one such as this. Commit and push the repo, as sketched below. In the Airflow web interface, from the menu, navigate to DAGs. After a few minutes, when we refresh the page, we should see the DAGs that we pushed appear in the list. We can test by running the DAGs from the web interface.
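A minimal sketch of that initial commit, assuming the repo was cloned to ~/code/repos as above and that the test DAG is saved as example_dag.py (a placeholder name):
pushd ~/code/repos/airflow-dags-gitops
touch __init__.py   # make the repo a proper Python module
# add a test DAG here, e.g. example_dag.py (placeholder)
git add __init__.py example_dag.py
git commit -m "Initial commit: package marker and test DAG"
GIT_SSH_COMMAND='ssh -i ~/.ssh/airflow_github_ed25519' git push origin master
popd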