Automating Lifecycle Management of Kubernetes Clusters

By Hiroshi Muraoka (@tapih)

This article introduces Cybozu Kubernetes Engine (CKE), software certified by the CNCF as Kubernetes conformant.

CKE is designed to reduce the operational burden of large Kubernetes (k8s) clusters as much as possible. It can automatically install Kubernetes clusters and upgrade them in place.

Contents:

Features
Declarative Configuration
HA k8s control plane
In-place and fast upgrade of k8s
Default deployment of CoreDNS and node-local DNS cache server
Integration with our in-house server management tool
Summary

Features

The key features of CKE are as follows:

Declarative Configuration
Highly-Available(HA) k8s control plane
In-place and fast upgrade of k8s
Default deployment of CoreDNS and node-local DNS cache server

The rest of CKE's features are described in this document.

Below is a brief overview of CKE. etcd is used as CKE's storage backend and shares data between multiple CKE master nodes. Vault is used to issue certificates for etcd and k8s.

Declarative Configuration

One of the most important concepts that Kubernetes made standard in infrastructure management is declarative configuration. With declarative configuration, operators do not need to think about which commands to run to manage k8s resources. Instead, they think about what the ideal state of the cluster is. All you have to do is describe the ideal state in YAML, and k8s works out what to do to reach it.

CKE is also configured declaratively with YAML. Once CKE receives a YAML configuration that describes an ideal state, it compares the ideal state with the current state and runs the operations needed to bring the cluster to the ideal state.

In addition, because all configuration of the infrastructure is kept as code, operators can easily reproduce environments and track the history of configurations with a VCS.
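
The compare-and-converge idea described above can be illustrated with a minimal sketch. This is not CKE's actual implementation: the function name plan_operations, the dictionary-based state model, and the operation strings are all assumptions made for illustration.

```python
# Hypothetical sketch of a declarative reconcile step (not CKE's real code).
# A "state" here is simply a mapping of component name -> version.


def plan_operations(ideal: dict, current: dict) -> list:
    """Return the operations that would move `current` toward `ideal`."""
    ops = []
    for name, version in sorted(ideal.items()):
        if name not in current:
            # Declared but not running yet: start it.
            ops.append(f"start {name}:{version}")
        elif current[name] != version:
            # Running with a different version: replace it.
            ops.append(f"restart {name}:{current[name]} -> {version}")
    for name in sorted(current.keys() - ideal.keys()):
        # Running but no longer declared: stop it.
        ops.append(f"stop {name}")
    return ops


ideal = {"kube-apiserver": "1.14", "kubelet": "1.14"}
current = {"kubelet": "1.13", "old-agent": "0.1"}
for op in plan_operations(ideal, current):
    print(op)
```

When the current state already matches the ideal state, the plan is empty, so running the comparison repeatedly is harmless. That property is what lets an operator edit only the YAML and leave the "how" to the tool.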

HA k8s control plane

CKE deploys control plane components, such as kube-apiserver and kube-scheduler, onto multiple nodes to make a cluster HA.

However, k8s components that connect to an API server can be configured with only one API server address. To make use of multiple API servers through one address, we developed a simple TCP reverse proxy named rivers.

A rivers proxy is deployed onto each worker node and forwards incoming requests to a randomly selected API server. If the selected API server is not available, the request is forwarded to a different API server. Rivers proxies not only make a cluster HA but also balance the load across API servers.
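
The selection behavior described above (random choice, then fallback to another server) can be sketched as follows. This is only an illustration of the idea, not the rivers source code: pick_upstream, the is_available callback, and the example addresses are hypothetical, and a real proxy would detect availability by attempting a TCP connection.

```python
# Hypothetical sketch of random upstream selection with failover.
import random
from typing import Callable, Iterable, Optional


def pick_upstream(
    upstreams: Iterable[str],
    is_available: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try API servers in random order and return the first available one."""
    rng = rng or random.Random()
    candidates = list(upstreams)
    rng.shuffle(candidates)  # random order spreads load across servers
    for addr in candidates:
        if is_available(addr):
            return addr
    return None  # every API server is unreachable


# Example addresses are made up for this sketch.
servers = ["10.0.0.1:6443", "10.0.0.2:6443", "10.0.0.3:6443"]
down = {"10.0.0.2:6443"}
print(pick_upstream(servers, lambda addr: addr not in down))
```

Shuffling the whole list once gives both properties in one pass: the first candidate is a uniformly random server, and the remaining candidates form the fallback order.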

In-place and fast upgrade of k8s

After booting a k8s cluster successfully, the next challenge is how to upgrade the cluster without stopping it. For example, with kubeadm, clusters are upgraded after workloads are evicted to other nodes with kubectl drain, but it is difficult to evict stateful workloads that use node-local storage.

CKE, on the other hand, upgrades clusters without evicting workloads, using the following steps:

Pull container images of k8s core components, such as kubelet, on nodes
Update a k8s configuration file
Stop old k8s core components
Run new k8s core components

Workloads should still be evicted when upgrading the OS or the container runtime. However, k8s core components such as kubelet can be upgraded without stopping workloads, because the workloads depend only on the container runtime.
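
The four upgrade steps listed above can be modeled with a small simulation. This is a sketch under stated assumptions, not how CKE is implemented: the Node class, its fields, and upgrade_in_place are hypothetical, and the version numbers and workload names are made up. The point it shows is the ordering of the steps and that the workload list is never touched.

```python
# Hypothetical simulation of an in-place upgrade on one node.
from dataclasses import dataclass, field


@dataclass
class Node:
    images: set = field(default_factory=set)         # pulled container images
    config_version: str = ""                         # version of the k8s config file
    components: dict = field(default_factory=dict)   # component name -> running version
    workloads: list = field(default_factory=list)    # user workloads on the node


def upgrade_in_place(node: Node, names: list, new_version: str) -> list:
    """Apply the four steps in order and return a log of what was done."""
    log = []
    for name in names:  # 1. pull images first, so downtime stays short
        node.images.add(f"{name}:{new_version}")
        log.append(f"pull {name}:{new_version}")
    node.config_version = new_version  # 2. update the configuration file
    log.append(f"update config to {new_version}")
    for name in names:  # 3. stop old components
        old = node.components.pop(name, None)
        if old is not None:
            log.append(f"stop {name}:{old}")
    for name in names:  # 4. run new components
        node.components[name] = new_version
        log.append(f"run {name}:{new_version}")
    return log


node = Node(components={"kubelet": "1.13"}, workloads=["db-0", "web-1"])
for line in upgrade_in_place(node, ["kubelet"], "1.14"):
    print(line)
```

Pulling images before stopping anything keeps the window between "stop old" and "run new" as short as possible, which is one plausible reason for the order given in the steps.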

Default deployment of CoreDNS and node-local DNS cache server

An in-cluster DNS server watches the API server and creates DNS records based on the Service resources that exist. Pods then resolve a Service name to an IP address through this DNS server. In addition, a node-local DNS cache server is generally required to avoid system failures caused by Pod replacement or temporary network outages, and to avoid traffic spikes caused by growing inter-Pod traffic.

CKE uses CoreDNS, which is the default in k8s version 1.13 or later, as the in-cluster DNS server, and unbound as the node-local DNS cache server. The cache works for both out-of-cluster and in-cluster traffic.

CoreDNS is deployed as a Deployment and unbound is deployed as a DaemonSet. Pods use the node-local unbound as their DNS server instead of CoreDNS. When resolving a hostname, the unbound server checks whether the hostname is inside the cluster (an FQDN that ends with .cluster.local is inside the cluster). If so, the unbound server queries CoreDNS. Otherwise, it queries an out-of-cluster DNS server.
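
The routing decision described above comes down to a suffix check on the queried name. The sketch below shows that check in isolation. It is not unbound's configuration or code: the function choose_resolver and the labels "coredns" and "external" are hypothetical names used only for illustration.

```python
# Hypothetical sketch of the in-cluster vs. out-of-cluster routing decision.


def choose_resolver(hostname: str, cluster_domain: str = "cluster.local") -> str:
    """Return "coredns" for in-cluster names and "external" for everything else."""
    # Normalize: DNS names are case-insensitive and may carry a trailing dot.
    name = hostname.rstrip(".").lower()
    # Match on a label boundary so "notcluster.local" is not treated as in-cluster.
    if name == cluster_domain or name.endswith("." + cluster_domain):
        return "coredns"
    return "external"


print(choose_resolver("kubernetes.default.svc.cluster.local"))
print(choose_resolver("example.com"))
```

Matching on the "." boundary rather than on the bare suffix avoids sending unrelated domains that merely end in the same characters to CoreDNS.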

Integration with our in-house server management tool

We have to consider the following points when managing a cluster:

Pods should not be scheduled onto nodes whose hardware is broken
Control planes should be distributed evenly in different racks
Newer nodes should join a k8s cluster in preference to nodes that are close to retirement
The number of nodes should increase when hardware resources are running low

These points are not difficult to take into account if your cluster is small. However, when the cluster gets large, for example 1000+ nodes, it becomes very hard to declare an ideal state that satisfies all of them.

To overcome this difficulty, we developed a tool named sabakan, which provisions servers and manages their life cycle. CKE updates the ideal state based on the following data provided by sabakan:

Network connectivity
Hardware breakdown
OS version
Remaining time until retirement
The type of hardware
The number of the rack where a machine is placed
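
To make the idea concrete, here is a minimal sketch of how machine data like the above could drive two of the earlier points: skipping broken hardware and spreading control planes across racks. It is an illustration under assumptions, not the algorithm CKE or sabakan uses. The Machine class, its fields, and pick_control_planes are hypothetical.

```python
# Hypothetical sketch: choose control plane nodes from machine inventory data.
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    name: str
    rack: int
    healthy: bool


def pick_control_planes(machines: list, count: int) -> list:
    """Pick up to `count` healthy machines, taking one per rack in turn."""
    by_rack = defaultdict(list)
    for m in machines:
        if m.healthy:  # never place components on broken hardware
            by_rack[m.rack].append(m)
    chosen = []
    # Round-robin over racks so the selection is spread as evenly as possible.
    while len(chosen) < count and any(by_rack.values()):
        for rack in sorted(by_rack):
            if by_rack[rack] and len(chosen) < count:
                chosen.append(by_rack[rack].pop(0))
    return chosen


machines = [
    Machine("a", rack=0, healthy=True),
    Machine("b", rack=0, healthy=True),
    Machine("c", rack=1, healthy=False),
    Machine("d", rack=1, healthy=True),
    Machine("e", rack=2, healthy=True),
]
print([m.name for m in pick_control_planes(machines, 3)])
```

A real selection would also weigh the other attributes in the list, such as OS version and remaining time until retirement. This sketch leaves them out to keep the rack-spreading logic visible.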

If you want to know how it works in more detail, please look at this document.

Summary

In short, CKE is an out-of-the-box k8s engine that boots an HA cluster and upgrades it in place. Thanks to CKE, the operational costs of k8s clusters can be significantly reduced.

If you are interested in CKE, this document is a good starting point. Thank you for reading this to the end!!

Kintone Engineering Blog

Learn about Kintone's engineering efforts. Kintone is provided by Cybozu Inc., a Tokyo-based public company founded in 1997.