Join a node as master to an existing cluster

Recently I got my CKA certification, and one thing I kept shuffling around and trying to achieve was simply joining another node as a master to an already existing cluster. Joining as a worker is pretty simple, but as a master things don't go so smoothly...
Anyway, I managed to do it just a day before the exam... Since I was searching all over for this and couldn't find anything, I'll try to explain it here for everyone who needs it, because in practice this is something you will actually need. The exam is currently on v1.15.2; maybe there will be changes in v1.16 or later so this can be done automatically with kubeadm, but the guide in this post was done on v1.15.5.
Basically, if you create an HA setup (multiple masters) from scratch, it's all very easy, because the kubeadm init command gives you one command for joining nodes as workers and another for joining nodes as masters, so you just copy them to the respective nodes. Later on you cannot get the 'join as master' part back, so you have to create everything it needs manually. In this guide we won't go into creating an HA Kubernetes cluster; we'll skip straight to removing a master node and replacing it with a new one. You can find info on the HA setup in [1], but in v1.16 you can pass --control-plane to kubeadm, so it's a little easier than what we will do here. Just keep that in mind.
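For reference, the tail of the kubeadm init output contains two join commands along these lines (the values are placeholders and the exact wording varies a bit between versions):


# join as an additional control-plane (master) node
kubeadm join <lb-endpoint>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash> --control-plane --certificate-key <key>

# join as a worker node
kubeadm join <lb-endpoint>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
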
The current cluster looks like this:


kubectl get nodes -o wide
NAME      STATUS   ROLES    AGE   VERSION   INTERNAL-IP    
master1   Ready    master   39d   v1.15.5   172.16.0.106  
master2   Ready    master   39d   v1.15.5   172.16.0.126
master3   Ready    master   39d   v1.15.5   172.16.0.51
node1     Ready    <none>   39d   v1.15.2   172.16.0.72
node2     Ready    <none>   39d   v1.15.2   172.16.0.111

Don't mind that the worker nodes are on v1.15.2; I just haven't finished updating them, and it doesn't matter for this post anyway. Now we will power off the master2 node and remove it from the cluster:


kubectl delete node master2
node "master2" deleted

Check cluster to confirm:


kubectl get nodes -o wide
NAME      STATUS   ROLES    AGE   VERSION   INTERNAL-IP    
master1   Ready    master   39d   v1.15.5   172.16.0.106  
master3   Ready    master   39d   v1.15.5   172.16.0.51
node1     Ready    <none>   39d   v1.15.2   172.16.0.72
node2     Ready    <none>   39d   v1.15.2   172.16.0.111

Now let's say master2 is beyond repair and you have to set up a new one. Spin up the new node and install Kubernetes and everything else it needs before joining it to your cluster. To keep things looking nice you want to join it with the name master2, but, let's say, with a different IP for whatever reason. In short: you have a failed server/VM and you want to replace it with a new one that keeps the same name and/or IP. Before we can join anything we need a token and a fresh CA certificate hash from one of the remaining master nodes, but first a quick sketch of preparing the new node itself.
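The node preparation is standard kubeadm fare; roughly, on Ubuntu with the versions used in this guide it would look like the snippet below (the repo and package pins are the usual upstream ones from that era, so treat this as an example and adjust it for your distro and container runtime):


# assumes a container runtime (e.g. Docker) is already installed and running
sudo apt-get update && sudo apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
echo "deb https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update && sudo apt-get install -y kubelet=1.15.5-00 kubeadm=1.15.5-00 kubectl=1.15.5-00
sudo apt-mark hold kubelet kubeadm kubectl

With kubeadm, kubelet and a container runtime in place on the new node, back on one of the remaining masters create the token and the CA cert hash: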


sudo kubeadm token create
ac6dta.8j7ppsrtgdh7ihgv
sudo openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
834c9ab40ba2ec774369d98929f57397544433e3a1cba7be3232376a3d9c5c67
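
As a shortcut, kubeadm can also print a ready-made worker join command that already contains a token and the matching CA cert hash, so you can grab the values from there as well (the output below is illustrative):


sudo kubeadm token create --print-join-command
kubeadm join k8smaster:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>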

Adjust the ca.crt path in the openssl command above if needed, but if you installed with defaults this is where it should be. Now upload the control-plane certificates to get the certificate key that lets the node join as a master instead of as a worker:


sudo kubeadm init phase upload-certs --upload-certs
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
1bb0d951a3d16d6e8c33fbaa91816c44c9b3815ef1e643ae1559373e9556625c
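
For context, this phase re-uploads the control-plane certificates into the kubeadm-certs Secret in the kube-system namespace, encrypted with the key printed above (kubeadm only keeps them around temporarily, roughly two hours by default if I remember right). A quick sanity check that the Secret is there:


# the uploaded certs live here until kubeadm cleans them up
kubectl -n kube-system get secret kubeadm-certs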

So now we have everything we need to build the command that joins the new node to our existing cluster. Before running it, don't forget to update the IP address behind the node name, master2 in this case, on haproxy (for the HA setup) and on the other masters if you are using a hosts file, or directly in DNS; otherwise a lookup of master2 will still return the old IP address. An example of that change is sketched below, followed by the join command to run on the new node.
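For a hosts-file based setup that can be as simple as fixing the master2 line on every node and on the load balancer; for haproxy it's the server line in the apiserver backend. Both snippets are hypothetical (the backend name is made up), so adjust them to your own config:


# /etc/hosts on each node
172.16.0.103 master2    # new IP, was 172.16.0.126

# haproxy.cfg on the load balancer serving k8smaster:6443
backend k8s-apiserver
    server master1 172.16.0.106:6443 check
    server master2 172.16.0.103:6443 check   # updated to the new node's IP
    server master3 172.16.0.51:6443 check

Now log on to the new node and run: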


sudo kubeadm join k8smaster:6443 --token ac6dta.8j7ppsrtgdh7ihgv --discovery-token-ca-cert-hash sha256:834c9ab40ba2ec774369d98929f57397544433e3a1cba7be3232376a3d9c5c67 --control-plane --certificate-key 1bb0d951a3d16d6e8c33fbaa91816c44c9b3815ef1e643ae1559373e9556625c
...
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: context deadline exceeded

So it seems it doesn't work. When you remove a node from the cluster the way we did with kubectl delete node master2, the node is not removed from the etcd cluster. The old master2 etcd member is still registered there with its old IP, so kubeadm's etcd health check cannot reach it and gives up. What we need to do is manually remove that member from the etcd cluster, which is fairly easy. Log into one of the existing master nodes, master1 or master3, and use the etcdctl tool [2]. First we check the member list to get the ID of the member we need to remove:


sudo ETCDCTL_API=3 etcdctl member list --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key 
32406de4a3bf4c3a, started, master2, https://172.16.0.126:2380, https://172.16.0.126:2379
359d36595f88a341, started, master3, https://172.16.0.51:2380, https://172.16.0.51:2379
4fb10e9c94e60d2e, started, master1, https://172.16.0.106:2380, https://172.16.0.106:2379

So the ID of master2 is 32406de4a3bf4c3a. Now we remove it with the following command:


sudo ETCDCTL_API=3 etcdctl member remove 32406de4a3bf4c3a --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key 
Member 32406de4a3bf4c3a removed from cluster 2712aac686bf14ed
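
To double-check, re-running the member list should now show only the two remaining masters, something like this:


sudo ETCDCTL_API=3 etcdctl member list --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key
359d36595f88a341, started, master3, https://172.16.0.51:2380, https://172.16.0.51:2379
4fb10e9c94e60d2e, started, master1, https://172.16.0.106:2380, https://172.16.0.106:2379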

Now that the "old" member is removed, we can repeat the join on the new master2 node. First we need to clear the changes made by the previous failed attempt with kubeadm reset, and then run the join command again:


sudo kubeadm reset
...
[reset] Are you sure you want to proceed? [y/N]: y
...
sudo kubeadm join k8smaster:6443 --token ac6dta.8j7ppsrtgdh7ihgv --discovery-token-ca-cert-hash sha256:834c9ab40ba2ec774369d98929f57397544433e3a1cba7be3232376a3d9c5c67 --control-plane --certificate-key 1bb0d951a3d16d6e8c33fbaa91816c44c9b3815ef1e643ae1559373e9556625c
...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node master2 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node master2 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]

This node has joined the cluster and a new control plane instance was created:
...

This time it succeeded, as expected. Now we have the whole cluster back again, with the same node names.


kubectl get nodes -o wide
NAME      STATUS   ROLES    AGE    VERSION   INTERNAL-IP
master1   Ready    master   39d    v1.15.5   172.16.0.106
master2   Ready    master   7m9s   v1.15.5   172.16.0.103 
master3   Ready    master   39d    v1.15.5   172.16.0.51
node1     Ready    <none>   39d    v1.15.2   172.16.0.72
node2     Ready    <none>   39d    v1.15.2   172.16.0.111
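
If you want to be extra sure, the etcd member list from earlier should now show master2 again with its new IP (the new member ID is a placeholder here):


sudo ETCDCTL_API=3 etcdctl member list --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key
<new-member-id>, started, master2, https://172.16.0.103:2380, https://172.16.0.103:2379
359d36595f88a341, started, master3, https://172.16.0.51:2380, https://172.16.0.51:2379
4fb10e9c94e60d2e, started, master1, https://172.16.0.106:2380, https://172.16.0.106:2379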

That's it. We killed one master node, created a new one, and joined it to the existing cluster with the same name as the one we removed.


  1. https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/

  2. https://kubernetes.io/docs/tasks/administer-cluster/configure-upgrade-etcd/#replacing-a-failed-etcd-member