
Trying out Kubernetes: etcd

Introduction

As part of trying out Kubernetes, we are going to set up etcd!

etcd is a “distributed reliable key-value store for the most critical data of a distributed system”.

This is where Kubernetes state lives. Given how important that is, and since it seems to make sense to run it separately from the Kubernetes cluster for High Availability, we will be doing just that, using WMF's etcd guide as inspiration for the important decisions.


VM Resources

According to etcd's Operations documentation, at least 8GB of RAM is recommended for each node and 2GB should be the bare minimum, so we'll set the etcd hosts to something in between:

# Changing in VM_BHYVE/k8s-etcd-1/k8s-etcd-1.conf
memory=4G

The limiting factor here will likely be the mechanical hard drives :-), but this is the hardware we have.

Installing

Since we are using Debian and there is (so far?) no good reason here to use other binaries, we'll just apt-get install etcd-server (currently v3.4.23, which is new enough and defaults to v3 of the protocol).
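In full, on each etcd VM, something like this does the trick (etcdctl lives in a separate etcd-client package, which we will only need on the machine we test from later):

# On every etcd VM: install the server and check the shipped version
apt-get install etcd-server
etcd --version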

Setting up the cluster

Since this is a small experiment, we could start the cluster with statically-defined members and that would be a breeze to set up as it is just a matter of passing the right CLI arguments / environment variables.
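For the record, a static bootstrap for one member would look roughly like this; hostnames are placeholders, every member gets the same --initial-cluster value, and this is not what we actually deploy:

# Sketch of a statically-defined member (not used here)
etcd --name k8s-etcd-1 \
    --initial-advertise-peer-urls http://k8s-etcd-1:2380 \
    --listen-peer-urls 'http://[::]:2380' \
    --listen-client-urls 'http://[::]:2379' \
    --advertise-client-urls http://k8s-etcd-1:2379 \
    --initial-cluster-token eh-etcd-c1 \
    --initial-cluster 'k8s-etcd-1=http://k8s-etcd-1:2380,k8s-etcd-2=http://k8s-etcd-2:2380,k8s-etcd-3=http://k8s-etcd-3:2380' \
    --initial-cluster-state new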

But we want to try out DNS discovery.

DNS discovery

I’ll use discovery-srv=lab.evilham.com and discovery-srv-name=c1. The c1 serves to tell this cluster apart from others which could exist in the future :-).

Which means we need to set up the following DNS entries for this cluster with my cdist types:

# This is knot client syntax
zone-unset evilham.com lab

## Server SRV records
zone-set   evilham.com     _etcd-server-ssl-c1._tcp.lab 300 SRV 0 0 2380 k8s-etcd-1.c1.lab
zone-set   evilham.com     _etcd-server-ssl-c1._tcp.lab 300 SRV 0 0 2380 k8s-etcd-2.c1.lab
zone-set   evilham.com     _etcd-server-ssl-c1._tcp.lab 300 SRV 0 0 2380 k8s-etcd-3.c1.lab

## Client SRV records
zone-set   evilham.com     _etcd-client-c1._tcp.lab 300 SRV 0 0 2379 k8s-etcd-1.c1.lab
zone-set   evilham.com     _etcd-client-c1._tcp.lab 300 SRV 0 0 2379 k8s-etcd-2.c1.lab
zone-set   evilham.com     _etcd-client-c1._tcp.lab 300 SRV 0 0 2379 k8s-etcd-3.c1.lab

## Actual host definitions
zone-set   evilham.com     k8s-etcd-1.c1.lab       3600    AAAA    ${_k8s_v6}:1::21
zone-set   evilham.com     k8s-etcd-1.c1.lab       3600    A       10.2.1.21
zone-set   evilham.com     k8s-etcd-2.c1.lab       3600    AAAA    ${_k8s_v6}:1::22
zone-set   evilham.com     k8s-etcd-2.c1.lab       3600    A       10.2.1.22
zone-set   evilham.com     k8s-etcd-3.c1.lab       3600    AAAA    ${_k8s_v6}:1::23
zone-set   evilham.com     k8s-etcd-3.c1.lab       3600    A       10.2.1.23
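To make sure discovery will actually find the peers, the records can be queried directly; any resolver will do, dig is shown here:

# Sanity-check the SRV records etcd will look up
dig +short SRV _etcd-server-ssl-c1._tcp.lab.evilham.com
dig +short SRV _etcd-client-c1._tcp.lab.evilham.com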

I was not going to set up TLS for the cluster peers yet, because traffic is internal and I run my own DNS. However! Life happened, computers happened. Apparently the documentation for 3.4.X is wrong and this is not really supported: in theory the TLS SRV entries should be tried first and then the non-TLS ones; in reality, cluster members bail out as soon as fetching the TLS SRV records fails.

Provisioning parenthesis

An advantage of systematically allocating IPs and hostnames is that our provisioning (with cdist here) gets easier:

#!/bin/sh -eu

case "${__target_host}" in
    # Expected to be: peer-name.cluster-name.lab.evilham.com
    k8s-etcd-*.lab.evilham.com)
        __hostname --name "${__target_host}"
        ETC_DIR="/etc"  # targeting linux only atm
        # Install etcd package
        __package etcd-server
        # Prepare data from hostname
        discovery_srv_name="$(echo "${__target_host}" | cut -d '.' -f 2)"
        discovery_srv="$(echo "${__target_host}" | cut -d '.' -f 3-)"
        #   Will look like: eh-etcd-c1
        initial_cluster_token="eh-etcd-${discovery_srv_name}"
        require="__package/etcd-server" __file /etc/default/etcd \
            --onchange "service etcd restart" \
            --source - <<EOF
# Managed remotely, changes will be lost!

ETCD_NAME='${__target_host}'

# DNS discovery
ETCD_DISCOVERY_SRV='${discovery_srv}'
ETCD_DISCOVERY_SRV_NAME='${discovery_srv_name}'

# Peers use TLS
ETCD_INITIAL_ADVERTISE_PEER_URLS='https://${__target_host}:2380'
ETCD_LISTEN_PEER_URLS='https://[::]:2380'

# Clients do not
ETCD_LISTEN_CLIENT_URLS='http://[::]:2379'
ETCD_ADVERTISE_CLIENT_URLS='http://${__target_host}:2379'

# Setup TLS for peers
#   This provides encryption but not authentication of peers!
ETCD_PEER_AUTO_TLS="true"

# We deploy this first with _ETCD_STATE defined as 'new', so the
# cluster gets created. On successive runs, the state will be 'existing'.
DAEMON_ARGS="--initial-cluster-token '${initial_cluster_token}' \
             --initial-cluster-state '${_ETCD_STATE:-existing}'"
EOF
    ;;
esac

So, now we apply this manifest to all VMs in the cluster:

env _ETCD_STATE=new cdist config -i manifest/etcd k8s-etcd-{1,2,3}.c1.lab.evilham.com
[...]
VERBOSE: [5341]: config: Total processing time for 3 host(s): 33.292810678482056
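On later runs, once the cluster exists, the _ETCD_STATE override is simply dropped, so the manifest falls back to --initial-cluster-state 'existing':

cdist config -i manifest/etcd k8s-etcd-{1,2,3}.c1.lab.evilham.com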

Testing the cluster

We will install etcd-client on the control-plane VM, which has access to the network, and see how things behave.
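Something along these lines; the health check is not strictly needed, but it is reassuring (endpoint health takes the same discovery flags as the other etcdctl commands):

# On the control-plane VM
apt-get install etcd-client
# Every member should report as healthy
etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
    endpoint health --cluster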

Basic functioning

$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
    put "Gabon" "Bona nit"
OK
$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
    get "Gabon"
Gabon
Bona nit

$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
    --write-out=fields get "Gabon"
"ClusterID" : 17681054088137536936
"MemberID" : 9829860431962910583
"Revision" : 26834
"RaftTerm" : 61940
"Key" : "Gabon"
"CreateRevision" : 26833
"ModRevision" : 26834
"Version" : 2
"Value" : "Bona nit"
"Lease" : 0
"More" : false
"Count" : 1

Cute! It seems to be working.

Killing cluster members

Since there is no important data yet, I want to see what happens when I kill the leader; first I have to find out who it is:

$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' --write-out=table \
    endpoint status --cluster

It seems like k8s-etcd-2 is the chosen one (IS LEADER) right now; let's power it off.

Repeating the previous command is not as quick as before and I get a context deadline exceeded warning, corresponding to k8s-etcd-2 being down. Interestingly, there is a new leader in town: k8s-etcd-1!

Not for long, let’s kill it too.

Now, the cluster shows an error etcdserver: no leader.

This makes sense: the surviving member knows about two other peers and cannot decide to be the leader on its own; there could very well be a network split, and that would get messy once the two parts can talk to each other again! This is why an etcd cluster has 2n+1 peers: with n=1 you get 3 peers and can lose 1, with n=2 you get 5 and can lose 2.

What if I try to add something to the cluster again?

$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
    put "Egun on" "Bon dia"
... context deadline exceeded ...

So, it turned read-only? Good! Can I still query it?

$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
    get "Gabon"
... context deadline exceeded ...

Nope! This is also good: in this context, it is better to have no answer than a potentially wrong one.
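The timeouts make sense: writes always need a quorum, and etcdctl reads are linearizable by default, so they need one too. If one is willing to risk a stale answer, a serializable read can still be served by the lone surviving member (k8s-etcd-3 here):

# Serializable reads do not require quorum, at the cost of possibly
# returning stale data
etcdctl --endpoints=http://k8s-etcd-3.c1.lab.evilham.com:2379 \
    --consistency=s get "Gabon"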

Let’s revive k8s-etcd-2 and check the cluster status again.

Decent: with two members an election could happen, and k8s-etcd-3 is the new leader.

Now we should be able to teach our etcd to say good morning:

$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
    put "Egun on" "Bon dia"
OK

Writing works again, as does reading; good!

Now, if we bring back k8s-etcd-1, it should catch up and magically learn to say good morning.

$ etcdctl --endpoints=http://k8s-etcd-1.c1...:2379  get "Egun on"
Egun on
Bon dia

Wonderful!

Conclusion

Deploying etcd in a cluster is not terribly difficult (modulo misleading docs and the implied inherent pain of running our own internal CA in production), and the effort of running the cluster outside of Kubernetes is probably worth it, because our Kubernetes cluster state can be safer that way.

Further things to check here are backups and restore from the cluster. On that note, I like this quote from WMF’s wikitech:

In the sad case when RAFT consensus is lost and there is no quorum anymore, the only way to recover the cluster is to recover the data from a backup, which are regularly performed every night in /srv/backups/etcd. The procedure to bring back the cluster is roughly as follows […]

Aka: do not panic, breathe in, know your emergency plans, check the emergency plan, and apply it.
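For when we do get to that, etcdctl ships the relevant primitive: a snapshot is taken against a single endpoint, roughly like this (the target path is just an example):

# Take a snapshot from one member and verify it is readable
etcdctl --endpoints=http://k8s-etcd-1.c1.lab.evilham.com:2379 \
    snapshot save /srv/backups/etcd/snapshot.db
etcdctl snapshot status /srv/backups/etcd/snapshot.db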

Now that etcd is set up, we can actually get on with kubernetes!

These were the steps according to the plan: