As part of trying out Kubernetes, we are going to set up etcd!
etcd is a “distributed reliable key-value store for the most critical data of a distributed system”.
This is where Kubernetes state lives. Given how important that is, and since it seems to make sense to run it separately from the Kubernetes cluster for High Availability, we will be doing just that, using WMF’s etcd guide as inspiration for the important decisions.
According to etcd’s Operations documentation, at least 8GB of RAM is recommended for each node, and 2GB should be the bare minimum, so we’ll set the etcd hosts to something in between:
# Changing in VM_BHYVE/k8s-etcd-1/k8s-etcd-1.conf
memory=4G
The limiting factor here will likely be the mechanical hard drives :-), but this is the hardware we have.
Since we are using Debian and there is (so far?) no good reason here to use other binaries, we’ll just: apt-get install etcd-server (currently v3.4.23, which is new enough and defaults to v3 of the protocol).
Since this is a small experiment, we could start the cluster with statically-defined members and that would be a breeze to set up as it is just a matter of passing the right CLI arguments / environment variables.
But we want to try out DNS discovery.
I’ll use discovery-srv=lab.evilham.com and discovery-srv-name=c1.
The c1 serves to tell this cluster apart from others which could exist in the future :-).
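To make the naming concrete, here is a small plain-shell sketch of the SRV record names etcd derives from these two settings (the -ssl variants are queried first; the suffix after the service name comes from discovery-srv-name):

```shell
#!/bin/sh
# Sketch: SRV names implied by --discovery-srv and --discovery-srv-name.
discovery_srv="lab.evilham.com"
discovery_srv_name="c1"
for service in _etcd-server-ssl _etcd-server _etcd-client-ssl _etcd-client; do
  printf '%s-%s._tcp.%s\n' "${service}" "${discovery_srv_name}" "${discovery_srv}"
done
# Prints, among others:
#   _etcd-server-ssl-c1._tcp.lab.evilham.com
#   _etcd-client-c1._tcp.lab.evilham.com
```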
Which means we need to set up the following DNS entries for this cluster with my cdist types:
# This is knot client syntax
zone-unset evilham.com lab
## Server SRV records
zone-set evilham.com _etcd-server-ssl-c1._tcp.lab 300 SRV 0 0 2380 k8s-etcd-1.c1.lab
zone-set evilham.com _etcd-server-ssl-c1._tcp.lab 300 SRV 0 0 2380 k8s-etcd-2.c1.lab
zone-set evilham.com _etcd-server-ssl-c1._tcp.lab 300 SRV 0 0 2380 k8s-etcd-3.c1.lab
## Client SRV records
zone-set evilham.com _etcd-client-c1._tcp.lab 300 SRV 0 0 2379 k8s-etcd-1.c1.lab
zone-set evilham.com _etcd-client-c1._tcp.lab 300 SRV 0 0 2379 k8s-etcd-2.c1.lab
zone-set evilham.com _etcd-client-c1._tcp.lab 300 SRV 0 0 2379 k8s-etcd-3.c1.lab
## Actual host definitions
zone-set evilham.com k8s-etcd-1.c1.lab 3600 AAAA ${_k8s_v6}:1::21
zone-set evilham.com k8s-etcd-1.c1.lab 3600 A 10.2.1.21
zone-set evilham.com k8s-etcd-2.c1.lab 3600 AAAA ${_k8s_v6}:1::22
zone-set evilham.com k8s-etcd-2.c1.lab 3600 A 10.2.1.22
zone-set evilham.com k8s-etcd-3.c1.lab 3600 AAAA ${_k8s_v6}:1::23
zone-set evilham.com k8s-etcd-3.c1.lab 3600 A 10.2.1.23
I was not going to set up TLS for the cluster peers yet, since traffic is internal and I run my own DNS. However! Life happened, computers happened. Apparently the documentation for 3.4.x is wrong and this is not really supported: in theory the TLS SRV entries should be tried first and then the non-TLS ones; in reality, cluster members bail out as soon as fetching the TLS SRV records fails.
An advantage of systematically allocating IPs and hostnames is that our provisioning (with cdist here) gets easier:
#!/bin/sh -eu
case "${__target_host}" in
# Expected to be: peer-name.cluster-name.lab.evilham.com
k8s-etcd-*.lab.evilham.com)
__hostname --name "${__target_host}"
ETC_DIR="/etc" # targeting linux only atm
# Install etcd package
__package etcd-server
# Prepare data from hostname
discovery_srv_name="$(echo "${__target_host}" | cut -d '.' -f 2)"
discovery_srv="$(echo "${__target_host}" | cut -d '.' -f 3-)"
# Will look like: eh-etcd-c1
initial_cluster_token="eh-etcd-${discovery_srv_name}"
require="__package/etcd-server" __file /etc/default/etcd \
--onchange "service etcd restart" \
--source - <<EOF
# Managed remotely, changes will be lost!
ETCD_NAME='${__target_host}'
# DNS discovery
ETCD_DISCOVERY_SRV='${discovery_srv}'
ETCD_DISCOVERY_SRV_NAME='${discovery_srv_name}'
# Peers use TLS
ETCD_INITIAL_ADVERTISE_PEER_URLS='https://${__target_host}:2380'
ETCD_LISTEN_PEER_URLS='https://[::]:2380'
# Clients do not
ETCD_LISTEN_CLIENT_URLS='http://[::]:2379'
ETCD_ADVERTISE_CLIENT_URLS='http://${__target_host}:2379'
# Setup TLS for peers
# This provides encryption but not authentication of peers!
ETCD_PEER_AUTO_TLS="true"
# We deploy this first with _ETCD_STATE defined as 'new', so the
# cluster gets created. On successive runs, the state will be 'existing'.
DAEMON_ARGS="--initial-cluster-token '${initial_cluster_token}' \
--initial-cluster-state '${_ETCD_STATE:-existing}'"
EOF
;;
esac
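For reference, with _ETCD_STATE=new the template above renders to roughly the following /etc/default/etcd on the first host (all values derived from the hostname k8s-etcd-1.c1.lab.evilham.com):

```shell
# Managed remotely, changes will be lost!
ETCD_NAME='k8s-etcd-1.c1.lab.evilham.com'
# DNS discovery
ETCD_DISCOVERY_SRV='lab.evilham.com'
ETCD_DISCOVERY_SRV_NAME='c1'
# Peers use TLS
ETCD_INITIAL_ADVERTISE_PEER_URLS='https://k8s-etcd-1.c1.lab.evilham.com:2380'
ETCD_LISTEN_PEER_URLS='https://[::]:2380'
# Clients do not
ETCD_LISTEN_CLIENT_URLS='http://[::]:2379'
ETCD_ADVERTISE_CLIENT_URLS='http://k8s-etcd-1.c1.lab.evilham.com:2379'
# Encryption but not authentication of peers
ETCD_PEER_AUTO_TLS="true"
DAEMON_ARGS="--initial-cluster-token 'eh-etcd-c1' --initial-cluster-state 'new'"
```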
So, now we apply this manifest to all VMs in the cluster:
env _ETCD_STATE=new cdist config -i manifest/etcd k8s-etcd-{1,2,3}.c1.lab.evilham.com
[...]
VERBOSE: [5341]: config: Total processing time for 3 host(s): 33.292810678482056
We will install etcd-client on the control-plane VM, which has access to the network, and see how things behave.
$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
put "Gabon" "Bona nit"
OK
$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
get "Gabon"
Gabon
Bona nit
$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
--write-out=fields get "Gabon"
"ClusterID" : 17681054088137536936
"MemberID" : 9829860431962910583
"Revision" : 26834
"RaftTerm" : 61940
"Key" : "Gabon"
"CreateRevision" : 26833
"ModRevision" : 26834
"Version" : 2
"Value" : "Bona nit"
"Lease" : 0
"More" : false
"Count" : 1
Cute! It seems to be working.
Since there is no data, I want to see what happens when I kill the leader, first I have to find out who it is:
$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' --write-out=table \
endpoint status --cluster
It seems like k8s-etcd-2 is the chosen one (IS LEADER) right now; let’s power it off.
Repeating the previous command is not as quick as before, and I get a context deadline exceeded warning, corresponding to k8s-etcd-2 being down.
Interestingly, there is a new leader in town: k8s-etcd-1!
Not for long, let’s kill it too.
Now, the cluster shows an error: etcdserver: no leader.
This makes sense: the surviving member knows about two other peers, and it can’t decide to be the leader on its own; there could very well be a network split, and that would get messy once the two parts can talk to each other again!
This is why an etcd cluster has 2n+1 peers!
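The arithmetic behind this: a cluster of n members needs a majority of floor(n/2)+1 votes to elect a leader, so it tolerates floor((n-1)/2) member failures. A tiny shell sketch:

```shell
#!/bin/sh
# Quorum (majority) size and failure tolerance for typical cluster sizes.
for n in 1 3 5; do
  quorum=$(( n / 2 + 1 ))
  tolerated=$(( (n - 1) / 2 ))
  echo "members=${n} quorum=${quorum} tolerated-failures=${tolerated}"
done
# members=1 quorum=1 tolerated-failures=0
# members=3 quorum=2 tolerated-failures=1
# members=5 quorum=3 tolerated-failures=2
```

With 3 members, losing one still leaves a majority of 2; losing two (as above) leaves a single member that can never be sure it is not on the wrong side of a partition.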
What if I try to add something to the cluster again?
$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
put "Egun on" "Bon dia"
... context deadline exceeded ...
So, has it turned read-only? Good! Can I still query it?
$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
get "Gabon"
... context deadline exceeded ...
Nope! This is also good: in this context, it is better to have no answer than a potentially wrong one.
Let’s revive k8s-etcd-2 and check the cluster status again.
Decent: with two members an election could happen, and k8s-etcd-3 is the new leader.
Now we should be able to teach our etcd to say good morning:
$ etcdctl --discovery-srv='lab.evilham.com' --discovery-srv-name='c1' \
put "Egun on" "Bon dia"
OK
Writing works again, as does reading; good!
Now, if we bring back k8s-etcd-1, it should catch up and magically learn to say good morning.
$ etcdctl --endpoints=http://k8s-etcd-1.c1...:2379 get "Egun on"
Egun on
Bon dia
Wonderful!
Deploying etcd in a cluster is not terribly difficult (modulo misleading docs and the implied inherent pain of running our own internal CA in production), and the effort of running the cluster outside of Kubernetes is probably worth it, because our Kubernetes cluster state can be safer this way.
Further things to check here are backups and restores. On that note, I like this quote from WMF’s wikitech:
In the sad case when RAFT consensus is lost and there is no quorum anymore, the only way to recover the cluster is to recover the data from a backup, which are regularly performed every night in /srv/backups/etcd. The procedure to bring back the cluster is roughly as follows […]
Aka: do not panic, breathe in, know your emergency plans, check the emergency plan, and apply it.
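As a hedged sketch of what such a nightly backup could look like (the /srv/backups/etcd path comes from the quote; the schedule, endpoint, and file naming are my assumptions, not WMF’s actual setup):

```
# Hypothetical /etc/cron.d/etcd-backup: snapshot the local member every night at 03:00.
0 3 * * * root etcdctl --endpoints='http://localhost:2379' snapshot save "/srv/backups/etcd/etcd-$(date +\%F).db"
```

Recovering from such a snapshot would then go through etcdctl snapshot restore, which is the part worth rehearsing before it is needed.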
Now that etcd is set up, we can actually get on with Kubernetes!
These were the steps according to the plan:
etcd cluster