Engineer in Tokyo

How kubeadm Initializes Your Kubernetes Master

kubeadm is a new tool, included in the Kubernetes distribution as of 1.4.0, that helps you install and set up a Kubernetes cluster. One of the most frequent criticisms of Kubernetes is that it’s hard to install. kubeadm makes this much easier, so I suggest you give it a try.

The kubeadm documentation outlines how to set up a cluster, but as I was following it I found the way kubeadm actually sets up the master really interesting, so I wanted to share that here.

The Kubernetes Control Plane

The Kubernetes control plane consists of the Kubernetes API server (kube-apiserver), controller manager (kube-controller-manager), and scheduler (kube-scheduler). The API server stores its state in etcd, so an etcd cluster is also required.

These components need to be installed on your master, and there are a number of ways to do that. But there are a few things to think about: how do you make sure each of them is always running? How do you upgrade the components with as little impact to the system as possible? You could download the binaries and run them directly on the host machine, but if they crash you’d have to restart them manually.

One way to keep the control plane components running is to supervise them with systemd. That solves the restart problem, but it doesn’t make it any easier to upgrade the components later.
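For example, a unit for the API server might look roughly like this. This is only an illustrative sketch, not something kubeadm generates; the binary path and flag values are made up for the example:

[Unit]
Description=Kubernetes API Server (illustrative sketch)
After=network.target

[Service]
ExecStart=/usr/local/bin/kube-apiserver --etcd-servers=http://127.0.0.1:2379 --service-cluster-ip-range=10.96.0.0/12 --insecure-bind-address=127.0.0.1
Restart=always
RestartSec=5

[Install]
WantedBy=multi-user.target

systemd will restart the process if it crashes, but upgrading still means replacing the binary on the host and restarting the unit yourself, for every component on every master.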

kubeadm and the kubelet

Fortunately, Kubernetes has a component called the Kubelet which manages containers running on a single host. The Kubelet normally talks to the API server, but it doesn’t depend on one to run containers, so we can actually use the Kubelet to manage the control plane components themselves. This is exactly what kubeadm sets us up to do. Let’s look at what happens when we run kubeadm.

# kubeadm init
<master/tokens> generated token: "d97591.135ba38594a02df1"
<master/pki> created keys and certificates in "/etc/kubernetes/pki"
<util/kubeconfig> created "/etc/kubernetes/kubelet.conf"
<util/kubeconfig> created "/etc/kubernetes/admin.conf"
<master/apiclient> created API client configuration
<master/apiclient> created API client, waiting for the control plane to become ready
<master/apiclient> all control plane components are healthy after 21.451883 seconds
<master/apiclient> waiting for at least one node to register and become ready
<master/apiclient> first node is ready after 0.503915 seconds
<master/discovery> created essential addon: kube-discovery, waiting for it to become ready
<master/discovery> kube-discovery is ready after 17.003674 seconds
<master/addons> created essential addon: kube-proxy
<master/addons> created essential addon: kube-dns

Kubernetes master initialised successfully!

You can now join any number of machines by running the following on each node:

kubeadm join --token d97591.135ba38594a02df1 10.240.0.2

We can see that kubeadm created the necessary certificates for the API, started the control plane components, and installed the essential addons. kubeadm doesn’t mention anything about the Kubelet but we can verify that it’s running:

# ps aux | grep /usr/bin/kubelet | grep -v grep
root      4147  4.4  2.1 473372 82456 ?        Ssl  05:18   1:08 /usr/bin/kubelet --kubeconfig=/etc/kubernetes/kubelet.conf --require-kubeconfig=true --pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --network-plugin=cni --cni-conf-dir=/etc/cni/net.d --cni-bin-dir=/opt/cni/bin --cluster-dns=100.64.0.10 --cluster-domain=cluster.local --v=4

So our Kubelet was started. But how? The Kubelet will monitor the control plane components, but what monitors the Kubelet and makes sure it’s always running? This is where we use systemd. systemd runs as PID 1, so the OS will make sure it is always running; systemd makes sure the Kubelet is running; and the Kubelet makes sure our containers with the control plane components are running.
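If you installed the kubelet from the official packages, it runs as a systemd unit, and you can confirm that systemd is supervising it. The exact unit name and drop-in files may vary by distribution and package version; the second command shows the unit file (and any drop-ins) that pass the flags we saw above:

# systemctl status kubelet
# systemctl cat kubelet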

We have a process architecture something like the following. It’s important to note that this is not a diagram of the process tree but rather a diagram showing which components start and monitor each other.
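Roughly, the chain of supervision looks like this:

systemd (PID 1)
  └─ starts and monitors the kubelet
       └─ starts and monitors the control plane pods defined in /etc/kubernetes/manifests
            ├─ etcd
            ├─ kube-apiserver
            ├─ kube-controller-manager
            └─ kube-scheduler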

So now we have our Kubelet running our control plane components, and it is connected to the API server just like the Kubelet on any other node. We can verify that:

# kubectl get nodes
NAME         STATUS    AGE
k8s-master   Ready     2h
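We can also ask the API server about the health of the other control plane components; you should see the scheduler, controller manager, and etcd reported as Healthy:

# kubectl get componentstatuses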

One thing is different about the Kubelet running on the master, though: there is a special annotation on our node telling Kubernetes not to schedule containers on it.

# kubectl get nodes -o json | jq '.items[] | select(.metadata.name=="k8s-master") | .metadata.annotations'
{
  "scheduler.alpha.kubernetes.io/taints": "[{\"key\":\"dedicated\",\"value\":\"master\",\"effect\":\"NoSchedule\"}]",
  "volumes.kubernetes.io/controller-managed-attach-detach": "true"
}

The interesting bit is the scheduler.alpha.kubernetes.io/taints key, which tells the scheduler that it shouldn’t schedule containers on this node. This lets us view and manage the node through the Kubernetes API without having to worry about regular containers being scheduled on the master.
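If you do want to schedule regular workloads on the master, for example on a single-machine test cluster, the kubeadm documentation suggests removing that taint. With the taint key shown above, that should look something like:

# kubectl taint nodes --all dedicated-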

Verifying the Control Plane Components

We can see that kubeadm created a /etc/kubernetes/ directory so let’s check out what’s there.

# ls -lh /etc/kubernetes/
total 32K
-rw------- 1 root root 9.0K Oct 12 05:18 admin.conf
-rw------- 1 root root 9.0K Oct 12 05:18 kubelet.conf
drwxr-xr-x 2 root root 4.0K Oct 12 05:18 manifests
drwx------ 2 root root 4.0K Oct 12 05:18 pki

The admin.conf and kubelet.conf files are kubeconfig (YAML) files that mostly contain the certificates used for authentication with the API. The pki directory contains the certificate authority certs, API server certs, and tokens.

# ls -lh /etc/kubernetes/pki
total 36K
-rw------- 1 root root 1.7K Oct 12 05:18 apiserver-key.pem
-rw-r--r-- 1 root root 1.2K Oct 12 05:18 apiserver.pem
-rw------- 1 root root  451 Oct 12 05:18 apiserver-pub.pem
-rw------- 1 root root 1.7K Oct 12 05:18 ca-key.pem
-rw-r--r-- 1 root root 1.1K Oct 12 05:18 ca.pem
-rw------- 1 root root  451 Oct 12 05:18 ca-pub.pem
-rw------- 1 root root 1.7K Oct 12 05:18 sa-key.pem
-rw------- 1 root root  451 Oct 12 05:18 sa-pub.pem
-rw------- 1 root root   96 Oct 12 05:18 tokens.csv
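Since admin.conf is a kubeconfig file with administrator credentials embedded, you can point kubectl at it directly, for example:

# kubectl --kubeconfig=/etc/kubernetes/admin.conf get nodes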

The manifests directory is where things get interesting: it contains a JSON pod definition for each of our control plane components.

# ls -lh /etc/kubernetes/manifests/
total 16K
-rw------- 1 root root 1.8K Oct 12 05:18 etcd.json
-rw------- 1 root root 2.1K Oct 12 05:18 kube-apiserver.json
-rw------- 1 root root 1.7K Oct 12 05:18 kube-controller-manager.json
-rw------- 1 root root  970 Oct 12 05:18 kube-scheduler.json

If you noticed earlier, the Kubelet was passed the --pod-manifest-path=/etc/kubernetes/manifests flag, which tells it to monitor the files in the /etc/kubernetes/manifests directory and make sure the components defined there are always running. We can see that they are running by asking the local Docker daemon to list the running containers.

# docker ps --format="table {{.ID}}\t{{.Image}}"
CONTAINER ID        IMAGE
dbaf645c0dd4        gcr.io/google_containers/pause-amd64:3.0
186feb8bbb73        gcr.io/google_containers/kube-proxy-amd64:v1.4.0
38644bc269cc        gcr.io/google_containers/pause-amd64:3.0
142dfe6fcba2        gcr.io/google_containers/kube-discovery-amd64:1.0
1f3969d0d773        gcr.io/google_containers/pause-amd64:3.0
bb9e153bcb84        gcr.io/google_containers/kube-controller-manager-amd64:v1.4.0
c37d54f86ab9        gcr.io/google_containers/kube-apiserver-amd64:v1.4.0
a42224e47f84        gcr.io/google_containers/etcd-amd64:2.2.5
d9d109fc62de        gcr.io/google_containers/kube-scheduler-amd64:v1.4.0
a28445c759be        gcr.io/google_containers/pause-amd64:3.0
72f9565d39fb        gcr.io/google_containers/pause-amd64:3.0
33c70feee8ee        gcr.io/google_containers/pause-amd64:3.0
f5383068a4c5        gcr.io/google_containers/pause-amd64:3.0

Several other containers are running as well: the pause containers hold each pod’s network namespace, and kube-proxy and kube-discovery were started as addons. If we ignore those, we can see that etcd, kube-apiserver, kube-controller-manager, and kube-scheduler are all running.
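This static pod mechanism isn’t limited to the control plane: any pod manifest you drop into /etc/kubernetes/manifests is started by the Kubelet and restarted if it dies, and deleting the file stops it. As a purely hypothetical example, in the same JSON style kubeadm uses:

{
  "kind": "Pod",
  "apiVersion": "v1",
  "metadata": {
    "name": "static-nginx"
  },
  "spec": {
    "containers": [
      {
        "name": "nginx",
        "image": "nginx:1.11",
        "ports": [
          {
            "containerPort": 80
          }
        ]
      }
    ]
  }
}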

How are we able to connect to these containers? If we look at the JSON files in the /etc/kubernetes/manifests directory, we can see that they each set the hostNetwork: true option, which allows the applications to bind to ports on the host just as if they were running outside of a container.

{
  "kind": "Pod",
  "apiVersion": "v1",
  ...
  "spec": {
    "containers": [
      {
        "name": "kube-apiserver",
        "image": "gcr.io/google_containers/kube-apiserver-amd64:v1.4.0",
        "volumeMounts": [
          {
            "name": "certs",
            "mountPath": "/etc/ssl/certs"
          },
          {
            "name": "pki",
            "readOnly": true,
            "mountPath": "/etc/kubernetes/"
          }
        ],
        ...
      }
    ],
    "hostNetwork" true,
    ...
  }
}

So we can connect to the API server’s insecure local port.

# curl http://127.0.0.1:8080/version
{
  "major": "1",
  "minor": "4",
  "gitVersion": "v1.4.0",
  "gitCommit": "a16c0a7f71a6f93c7e0f222d961f4675cd97a46b",
  "gitTreeState": "clean",
  "buildDate": "2016-09-26T18:10:32Z",
  "goVersion": "go1.6.3",
  "compiler": "gc",
  "platform": "linux/amd64"
}

The API server also binds a secure port, 443, which requires a client certificate and authentication. Be careful to use the public IP of your master here.

# curl --cacert /etc/kubernetes/pki/ca.pem https://10.240.0.2/version
Unauthorized
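To make an authenticated request against the secure port, we can reuse the admin client certificate. Assuming admin.conf embeds the credentials in the usual kubeconfig layout (base64-encoded client-certificate-data and client-key-data fields), something like this should work; remember to delete the extracted key afterwards:

# grep client-certificate-data /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d > /tmp/admin.pem
# grep client-key-data /etc/kubernetes/admin.conf | awk '{print $2}' | base64 -d > /tmp/admin-key.pem
# curl --cacert /etc/kubernetes/pki/ca.pem --cert /tmp/admin.pem --key /tmp/admin-key.pem https://10.240.0.2/version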

What kubeadm Didn’t Do

kubeadm is a really great tool for setting up a Kubernetes cluster, but there are a few things it doesn’t do yet which will be the topic of future work. These are spelled out in the Limitations section of the documentation.

One of the most important things that kubeadm doesn’t do yet is set up a multi-node etcd cluster. To make your cluster more resistant to failure, you will need to add more etcd members yourself. This should be easy enough, but it’s important to note.
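Adding members is mostly a matter of running etcd on each machine with the clustering flags and then pointing the API server’s --etcd-servers flag at all of them. A rough sketch for one member, with made-up names and addresses, might be:

# etcd --name etcd0 --initial-advertise-peer-urls http://10.240.0.2:2380 --listen-peer-urls http://10.240.0.2:2380 --listen-client-urls http://10.240.0.2:2379,http://127.0.0.1:2379 --advertise-client-urls http://10.240.0.2:2379 --initial-cluster etcd0=http://10.240.0.2:2380,etcd1=http://10.240.0.3:2380,etcd2=http://10.240.0.4:2380 --initial-cluster-state new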

kubeadm also doesn’t yet install cloud provider integrations so things like load balancers and persistent volumes won’t work.

Hopefully that gave you an idea of what kubeadm is doing. Running the control plane components under the Kubelet is a great pattern: it keeps the components running and makes them easier to upgrade later. When Kubernetes 1.5 comes out, we can expect to see relatively painless upgrades for users of kubeadm.