Kubernetes turned 9 years old a couple months ago, so it’s about time I took a more serious look at it.
Kubernetes is an open-source container orchestration system for automating software deployment, scaling, and management.
I have followed its development somewhat closely and kept up with the concepts and architecture, even if I haven’t jumped to actually using it.
At first it was because I was waiting for things to stabilize; then I decided to let the hype wear off.
Now, the hype hasn’t worn off, but it is at least a topic treated with some nuance: Kubernetes is great, but it comes with some complexity; complexity that is not necessary in many cases. And so we have to make sure that the benefits actually outweigh the overhead for our particular use-case.
As that point nears and I have some days off, I’m jumping into learning the best way I know how: by actually doing things and, this time around, by writing them down for future reference / so I can ask for clarification more effectively.
Let’s create a “High Availability Kubernetes cluster”! (I guess we’ll [re]define what HA means here a couple times while doing the deed)
Note that I’m doing this from a FreeBSD laptop, against a remote FreeBSD physical host; the actual Control Plane and Nodes will run on bhyve Virtual Machines running Linux on that host. This shouldn’t be too relevant, it just makes networking easier =D (debatable, yes).
This will likely take several posts :-).
The main goal here is my own learning and exploring from an Operations perspective, with future quick-reference as a secondary goal.
Particularly: duplicating documentation or getting familiar from a user’s perspective are non-goals right now :-).
It is for these reasons that some things are deliberately made more difficult, e.g. by aiming for High Availability and “production-ready” configurations from the beginning, instead of starting with something simpler and more self-contained like kind or minikube.
This matches my way of learning, which involves reading all or most of the documentation once, gaining a broad overview of what can be done and what should be done, and then implementing things as close to production as practical and possible.
Your way of learning may differ, and you may need more incremental progress and immediate feedback; use whatever works for you. If this post helps you, I’ll be happy if you let me know! I do not know many people who share my way of learning :-).
First, let’s compile a list of resources that will help us:
Since we care about the Operations perspective, we jump straight to Kubernetes’ components:
As a temporary simplification, we will run the Control Plane on a single Virtual Machine, which will not be allowed to run user containers.
This makes the cluster less redundant, but we get somewhere faster. It also feels like adding this redundancy at a later stage shouldn’t be too difficult; it certainly isn’t our main goal at this stage.
The Control Plane includes:

- kube-apiserver: exposes the Kubernetes API. Can be load balanced.
- etcd: HA key-value store; this is where all cluster data lives. It is a separate piece of software, also used outside of the Kubernetes context.
- kube-scheduler: picks up new Pods and assigns them a Node to run on, based on the Pod’s specifications and the cluster’s availability.
- kube-controller-manager: performs multiple jobs; abstractly, it seems to be in charge of reacting to changes of various kinds in the cluster (like a Pod going down, running one-off tasks by creating their Pods, etc.). The documentation doesn’t provide an exhaustive list, so this is an extrapolation of the meaning, and could be somewhat wrong or incomplete.
- cloud-controller-manager: this seems to be provider-specific; since we are running Kubernetes ourselves (the whole point), we will probably not see it.
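To ground this from an Operations angle: assuming a kubeadm-style bootstrap (an assumption at this point, nothing is decided yet), these components typically run as static Pods whose manifests live under /etc/kubernetes/manifests/ on the Control Plane machine. A heavily trimmed sketch of what one of those manifests looks like, with placeholder addresses:

```yaml
# Illustrative, heavily trimmed static Pod manifest for kube-apiserver, the
# kind kubeadm writes to /etc/kubernetes/manifests/ on a Control Plane machine.
# All addresses and flag values are placeholders, not a working configuration.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  hostNetwork: true
  containers:
  - name: kube-apiserver
    image: registry.k8s.io/kube-apiserver:v1.28.3
    command:
    - kube-apiserver
    - --advertise-address=192.0.2.10             # placeholder: this Control Plane VM
    - --etcd-servers=https://192.0.2.21:2379     # placeholder: external etcd endpoint
    - --service-cluster-ip-range=10.96.0.0/12    # kubeadm's default Service CIDR
```

kubeadm writes one of these per component, which is also why the Control Plane machine needs its own kubelet and container runtime even though it won’t run user containers.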
etcd

The documentation for a Highly Available Kubernetes cluster gives us two options:

- Stacked etcd: the etcd members run co-located with the Control Plane Nodes.
- External etcd: etcd runs on its own dedicated Nodes.

My systems intuition tells me that mixing Control Plane and state storage may not be the best of decisions, and I have heard horror stories. I’d be inclined to use an external etcd for anything put in production. This is corroborated by WMF’s Kubernetes deployment.
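For reference, if we end up bootstrapping with kubeadm (again, an assumption for now), the external topology is expressed in the ClusterConfiguration; the endpoints, the DNS name and the certificate paths below are placeholders:

```yaml
# Sketch of a kubeadm ClusterConfiguration pointing at an external etcd.
# Endpoints, the controlPlaneEndpoint name and certificate paths are
# placeholders for illustration only.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.28.3
controlPlaneEndpoint: "k8s-api.internal.example:6443"  # DNS name / load balancer in front of kube-apiserver
etcd:
  external:
    endpoints:
    - https://192.0.2.21:2379
    - https://192.0.2.22:2379
    - https://192.0.2.23:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```

Setting controlPlaneEndpoint up front should also make it easier to add the Control Plane redundancy we postponed earlier.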
These run on each Node:

- kubelet: this makes sure containers from Pods are running and healthy.
- kube-proxy: manages the network rules to be able to provide Services. Something tells me we’ll spend a lot of time here.
- Container runtime: it seems like this is a place where we have to make a choice:
I notice on the WMF techwiki: “docker is mean[t] to be used as the CRE. Other runtime engines aren’t currently supported.”, which is consistent with the statement “This guide currently covers kubernetes 1.23” and with the way the deb packages are created. (The Kubernetes Infrastructure upgrade policy differs; it likely only needs an update.)
This makes sense, because Docker as a runtime (via dockershim) stopped being supported by Kubernetes in v1.24! Jumping across that minor version may be tricky.
Given that we will start diverging from WMF’s documentation anyway, I will pick CRI-O for no particular reason and target the latest version of Kubernetes (1.28.3).
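If kubeadm does end up being the bootstrap tool (still an assumption), the runtime choice mostly comes down to which CRI socket the kubelet talks to; for CRI-O that is its default socket:

```yaml
# Sketch: pointing a kubeadm-style bootstrap at CRI-O's default CRI socket.
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/crio/crio.sock
```

CRI-O itself (plus a CNI plugin) still has to be installed and running on each machine before this is of any use.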
Addons provide cluster-level features. At first I found this confusing, but after giving it some thought, it makes sense.
It means that there can be multiple implementations of similar cluster services (and there are), addressing different needs and priorities.
These are the ones I think we’ll need:
With that, we have most of the information needed to start planning and allocating resources.
Given the choice to run etcd externally, we will need two separate networks:

- etcd: access will be network- and firewall-controlled.
- Kubernetes: where the Control Plane and the worker Nodes live; the Control Plane can access etcd too and won’t run user containers.

For a total of: 2 networks, where one class of machines can access both.
This is a learning environment, so we’ll NAT these IPv4 networks and assign each segment a /64 IPv6 subnet without blocking outgoing traffic. Probably in production we wouldn’t allow that directly.
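To make the plan concrete, here is a hypothetical addressing sketch; the names, IPv4 ranges and the IPv6 documentation prefix are made up for illustration, the real ones will depend on how the bhyve host’s networking ends up configured:

```yaml
# Hypothetical addressing plan for the two segments; every value is a placeholder.
networks:
  etcd:
    ipv4: 10.0.10.0/24        # NATed
    ipv6: 2001:db8:0:10::/64  # outgoing traffic allowed
  kubernetes:
    ipv4: 10.0.20.0/24        # NATed
    ipv6: 2001:db8:0:20::/64  # outgoing traffic allowed
# Control Plane machines can reach both segments; workers only the kubernetes one.
```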
As for the VMs:

- etcd: following WMF’s recommendations for etcd, we will use 3 VMs dedicated to etcd.
- Kubernetes: 1 Control Plane VM (which won’t run user containers) and 1 worker Node, at least to start.

For a total of: 5 VMs.
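And an equally hypothetical inventory of those 5 VMs, reusing the placeholder ranges from the sketch above (only the counts come from the plan; the names and addresses are invented):

```yaml
# Placeholder inventory; names and addresses are illustrative only.
vms:
  - { name: etcd1, network: etcd,       ipv4: 10.0.10.21 }
  - { name: etcd2, network: etcd,       ipv4: 10.0.10.22 }
  - { name: etcd3, network: etcd,       ipv4: 10.0.10.23 }
  - { name: cp1,   network: kubernetes, ipv4: 10.0.20.10 }  # Control Plane; can reach etcd; no user containers
  - { name: node1, network: kubernetes, ipv4: 10.0.20.11 }  # worker Node
```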
Aaand, this is why Kubernetes felt like overhead for some time.
Until the cost (effort, time, energy, money) of setting up and maintaining all this is at least somewhat offset by the benefits, it is the wrong tool for our scale.
Now that we are planning ahead though, these are the next steps:

- Set up the etcd cluster (a first sketch of a member’s configuration is below)
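To close, a peek at that first step: a minimal sketch of what one etcd member’s config file could look like, reusing the placeholder names and addresses from the planning sketches above and passed to etcd via --config-file. TLS is deliberately left out and will have to be added before any of this can be called production-ready:

```yaml
# Minimal config-file sketch for the hypothetical member etcd1; etcd2 and etcd3
# get the same initial-cluster with their own name and URLs. No TLS yet, so do
# not treat this as production-ready.
name: etcd1
data-dir: /var/lib/etcd
listen-peer-urls: http://10.0.10.21:2380
listen-client-urls: http://10.0.10.21:2379,http://127.0.0.1:2379
initial-advertise-peer-urls: http://10.0.10.21:2380
advertise-client-urls: http://10.0.10.21:2379
initial-cluster: etcd1=http://10.0.10.21:2380,etcd2=http://10.0.10.22:2380,etcd3=http://10.0.10.23:2380
initial-cluster-token: k8s-learning-etcd
initial-cluster-state: new
```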