This repository was archived by the owner on Aug 25, 2021. It is now read-only.

Expose server gossip and RPC ports as hostPorts #740

Merged
merged 1 commit on Dec 18, 2020

Conversation

ndhanushkodi
Contributor

@ndhanushkodi ndhanushkodi commented Dec 11, 2020

Changes proposed

  • Adds an option to expose the server gossip and RPC ports as hostPorts
  • Adds an option to configure the server gossip port via server.ports.serflan.port. Configuring this port to a non-default value is necessary when enabling both server.exposeGossipAndRPCPorts and client.exposeGossipPorts, to avoid hostPort conflicts when client and server pods are running on the same node.

Use Case
To enable a client agent outside of the k8s cluster to join the datacenter, you would need to enable server.exposeGossipAndRPCPorts and client.exposeGossipPorts, and set server.ports.serflan.port to a port not being used on the host. Since client.exposeGossipPorts uses the hostPort 8301, server.ports.serflan.port must be set to something other than 8301 if client and server pods can be scheduled on the same node.
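The hostPort constraint above can be sketched as a quick shell check (the 9301 value comes from the example values.yaml below; the variable names here are just for illustration):

```shell
# client.exposeGossipPorts publishes the client gossip port as hostPort 8301,
# so the server's serflan hostPort must differ when both pod types can be
# scheduled on the same node.
CLIENT_GOSSIP_HOSTPORT=8301
SERVER_SERFLAN_HOSTPORT=9301   # server.ports.serflan.port

if [ "$SERVER_SERFLAN_HOSTPORT" -eq "$CLIENT_GOSSIP_HOSTPORT" ]; then
  echo "conflict: pick a serflan port other than 8301"
else
  echo "ok: no hostPort conflict"
fi
```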

How I've tested
On GCP:

  1. Deploy GKE cluster
    gcloud container clusters create external-agent --project nitya-293720 --cluster-version="1.17.12-gke.2502" --zone us-west1-a --machine-type=n1-standard-4 --num-nodes 3
  2. Helm install from this PR branch, using the following values:
values.yaml
global:
  domain: consul
  datacenter: dc1
server:
  replicas: 1
  bootstrapExpect: 1
  exposeGossipAndRPCPorts: true
  ports:
    serflan:
      port: 9301
client:
  enabled: true
  grpc: true
  exposeGossipPorts: true
ui:
  enabled: true
connectInject:
  enabled: true
controller:
  enabled: true
  3. On the GCP console, create a new VM. If you are in the same project and use the same zone, the VM will be deployed to the same subnet as your GKE VMs. The VM and GKE VMs need to be routeable on the private network. I created an E2-small with Debian boot disk.
  4. Add a network tag to the VM (you should see the option when you edit the VM)
  5. Add a firewall rule whose target is the network tag you created above. Allow all ingress on 0.0.0.0/0 on all ports. (I wasn't able to reduce this to just the private network range 10.128.0.0/9 and have it work, and I'm not sure why).
  6. When the VM is created, copy the gcloud command to ssh onto the VM.
  7. Get the consul binary (you'll need to apt install wget and unzip)
    wget https://releases.hashicorp.com/consul/1.9.0/consul_1.9.0_linux_amd64.zip && unzip consul_1.9.0_linux_amd64.zip
  8. Run the following script to start a local agent. Replace the -advertise IP with the internal IP of your VM, and replace the -retry-join address with the status.hostIP of your Consul server pod. If you have multiple Consul servers, add a -retry-join flag for each server's ip:port. Create the folders local/consul/config and local/consul/data first.
#!/usr/bin/env bash

# Create the config and data directories the agent expects.
mkdir -p local/consul/config local/consul/data

./consul agent \
  -advertise="10.138.0.38" \
  -bind=0.0.0.0 \
  -client=0.0.0.0 \
  -hcl='leave_on_terminate = true' \
  -hcl='ports { grpc = 8502 }' \
  -config-dir=local/consul/config \
  -datacenter=dc1 \
  -data-dir=local/consul/data \
  -retry-join="10.138.0.35:9301" \
  -domain=consul
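The -retry-join value above is the status.hostIP of a server pod plus the serflan port. A sketch of assembling it (the pod name consul-server-0 is an assumption, and the IP is the example address from this walkthrough used as an offline stand-in):

```shell
# With cluster access, the host IP can be pulled with kubectl:
#   HOST_IP=$(kubectl get pod consul-server-0 -o jsonpath='{.status.hostIP}')
# Offline stand-in using the example address from the steps above:
HOST_IP="10.138.0.35"
SERFLAN_PORT=9301
echo "-retry-join=${HOST_IP}:${SERFLAN_PORT}"
```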
  9. Another (more recommended) way to join the cluster is to copy the kubeconfig file targeting the k8s cluster into the home directory on the VM. Then you can replace the retry-join flag above with -retry-join 'provider=k8s host_network=true label_selector="app=consul,component=server"' rather than hardcoding potentially multiple Consul server IPs.
  10. Look at logs on the consul-server on k8s and the consul-client on the VM, and ensure no error messages are being printed
  11. Register a service on the client on the VM:
curl --request PUT --data @payload.json http://localhost:8500/v1/agent/service/register
payload.json
{
  "ID": "redis1",
  "Name": "redis",
  "Tags": [
    "primary",
    "v1"
  ],
  "Address": "127.0.0.1",
  "TaggedAddresses": {
    "lan": {
      "address": "127.0.0.1",
      "port": 8000
    }
  },
  "Port": 8000,
  "Check": {
    "CheckID": "service:redis1",
    "Name": "Redis health check",
    "Notes": "TCP health check",
    "ServiceID": "redis1",
    "TCP": "localhost:8888",
    "Interval": "5s",
    "Timeout": "1s",
    "DeregisterCriticalServiceAfter": "600s"
  }
}
  12. On the server, curl http://127.0.0.1:8500/v1/catalog/services and verify you see the service.
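The catalog check in the last step can be scripted; a minimal sketch (the endpoint is from the step above, while the canned response is a hypothetical stand-in so the check runs offline):

```shell
# Live check, run on the server pod:
#   curl -s http://127.0.0.1:8500/v1/catalog/services | grep '"redis"'
# Offline stand-in with a hypothetical catalog response; the catalog listing
# is JSON keyed by service name, so a grep on the name is enough.
RESPONSE='{"consul":[],"redis":["primary","v1"]}'
if echo "$RESPONSE" | grep -q '"redis"'; then
  echo "redis registered"
fi
```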

How I expect reviewers to test
If at least one reviewer has the bandwidth to run through the complete steps above, it would give some confidence in catching any gotchas we might want to document.

@ndhanushkodi ndhanushkodi force-pushed the expose-server-gossip-rpc branch from 683f7dd to adcd357 Compare December 15, 2020 23:43
@ndhanushkodi ndhanushkodi marked this pull request as ready for review December 15, 2020 23:46
@ndhanushkodi ndhanushkodi requested review from lkysow, a team and kschoche and removed request for a team December 15, 2020 23:47
Member

@lkysow lkysow left a comment

This is looking great! I haven't tried it out yet but have some minor code comments.

@ndhanushkodi ndhanushkodi force-pushed the expose-server-gossip-rpc branch from adcd357 to dc72814 Compare December 17, 2020 07:26
Contributor

@kschoche kschoche left a comment

Awesome work!!

It is probably unnecessary since you have the bats tests, but I'm curious if it would make any sense to add an acceptance test which covers setting these flags in an environment which would consume them?

Member

@lkysow lkysow left a comment

Awesome work. I didn't test with a VM but I did test on k8s and ensured that the ports and ips were as expected.

@lkysow
Member

lkysow commented Dec 17, 2020

Will also need changelog and helm docs updates :D

@ndhanushkodi ndhanushkodi force-pushed the expose-server-gossip-rpc branch from dc72814 to 664b8ce Compare December 17, 2020 21:31
@ndhanushkodi
Contributor Author

ndhanushkodi commented Dec 17, 2020

It is probably unnecessary since you have the bats tests, but I'm curious if it would make any sense to add an acceptance test which covers setting these flags in an environment which would consume them?

@kschoche It would involve setting up some VM infrastructure in our tests and probably passing some ssh keys in, which seems doable and maybe worth it if we feel there's a gap in our automated testing. Maybe a criterion we could use to decide when to prioritize writing an acceptance test is whether a bug comes up involving external clients/servers?

@kschoche
Contributor

Awesome work!!
It is probably unnecessary since you have the bats tests, but I'm curious if it would make any sense to add an acceptance test which covers setting these flags in an environment which would consume them?

@kschoche It would involve setting up some VM infrastructure in our tests and probably passing some ssh keys in, which seems doable and maybe worth it if we feel there's a gap in our automated testing. Maybe a criterion we could use to decide when to prioritize writing an acceptance test is whether a bug comes up involving external clients/servers?

Yeah, mostly curious what you thought of it! I looked into it briefly this morning, and there are some tf bits we could grab from hashicorp/consul that would probably do the trick. Right now we have no automated testing which involves VM / k8s federation, but I suppose that scenario depends much more on the actual configuration than on whether or not one of the dcs is on VMs, so an automated test would be kinda moot.

Great work!

@ndhanushkodi ndhanushkodi force-pushed the expose-server-gossip-rpc branch 2 times, most recently from 6031a41 to e4b4f9b Compare December 17, 2020 21:57
@ndhanushkodi ndhanushkodi force-pushed the expose-server-gossip-rpc branch 2 times, most recently from b104b73 to 744b85c Compare December 17, 2020 22:50
To enable a client agent outside of the k8s cluster to join the
datacenter, you would need to enable server.exposeGossipAndRPCPorts,
client.exposeGossipPorts, and set server.ports.serflan.port to a port
not being used on the host. Since client.exposeGossipPorts uses the
hostPort 8301, server.ports.serflan.port must be set to something other
than 8301.

The client agent VM outside of the k8s cluster would need to be able to
route to the private IPs of the VMs in the k8s cluster to join the
datacenter, and the VMs in the k8s cluster would likewise need to be
able to route to the external client agent VM on its advertised IP.