By Hiroshi Muraoka (@tapih)
This article introduces Cybozu Kubernetes Engine (CKE), a CNCF-certified Kubernetes conformant software.
CKE is designed to minimize the operational burden of large Kubernetes (k8s) clusters. It can automatically install Kubernetes clusters and upgrade them in place.
Contents:
- Features
- Declarative Configuration
- HA k8s control plane
- In-place and fast upgrade of k8s
- Default deployment of CoreDNS and node-local DNS cache server
- Integration with our in-house server management tool
- Summary
Features
The key features of CKE are as follows:
- Declarative Configuration
- Highly available (HA) k8s control plane
- In-place and fast upgrade of k8s
- Default deployment of CoreDNS and node-local DNS cache server
The rest of CKE's features are described in this document.
Below is a brief overview of CKE. etcd serves as CKE's storage backend and shares data among multiple CKE master nodes, and Vault issues the certificates for etcd and k8s.
Declarative Configuration
One of the most important concepts that Kubernetes made standard in infrastructure management is declarative configuration. With declarative configuration, operators do not need to care about which commands to run to manage k8s resources; they only need to care about what the ideal state of the cluster is. All you have to do is describe the ideal state to k8s in YAML, and k8s works out what to do to realize it.
CKE is also configured declaratively with YAML. Once CKE receives a YAML configuration describing the ideal state, it compares the ideal state with the current state and runs the operations necessary to bring the cluster to the ideal state.
In addition, because the entire infrastructure configuration is kept as code, operators can easily reproduce environments and track the configuration history with a version control system (VCS).
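The compare-and-act cycle described in this section can be illustrated with a toy reconcile function in Go. This is a minimal sketch under stated assumptions, not CKE's code: the `State` type (component name mapped to version), the `Operation` struct, and the component names and versions in `main` are all hypothetical, and a real cluster state is far richer than a version map.

```go
package main

import (
	"fmt"
	"sort"
)

// State is a hypothetical, simplified cluster state: component name -> version.
type State map[string]string

// Operation is a hypothetical action that moves the current state toward the desired one.
type Operation struct {
	Kind      string // "install", "upgrade", or "remove"
	Component string
	Version   string
}

// Reconcile compares the desired (ideal) state with the current state and
// returns the operations needed to make them match. The plan is sorted by
// component name so that it is deterministic.
func Reconcile(desired, current State) []Operation {
	var ops []Operation
	for name, want := range desired {
		have, ok := current[name]
		switch {
		case !ok:
			ops = append(ops, Operation{"install", name, want})
		case have != want:
			ops = append(ops, Operation{"upgrade", name, want})
		}
	}
	for name := range current {
		if _, ok := desired[name]; !ok {
			ops = append(ops, Operation{"remove", name, ""})
		}
	}
	sort.Slice(ops, func(i, j int) bool { return ops[i].Component < ops[j].Component })
	return ops
}

func main() {
	// Hypothetical states, for illustration only.
	desired := State{"kube-apiserver": "v1.14.1", "kube-scheduler": "v1.14.1"}
	current := State{"kube-apiserver": "v1.13.5", "old-addon": "v0.1"}
	for _, op := range Reconcile(desired, current) {
		fmt.Printf("%s %s %s\n", op.Kind, op.Component, op.Version)
	}
}
```

Running the same function again after the operations have been applied yields an empty plan, which is what makes the declarative approach safe to repeat.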
HA k8s control plane
CKE deploys control plane components, such as kube-apiserver and kube-scheduler, onto multiple nodes to make a cluster HA.
However, k8s components that connect to an API server can be configured with only one API server address. To make use of multiple API servers through a single address, we developed a simple TCP reverse proxy named rivers.
Rivers proxies are deployed on each worker node and forward incoming requests to a randomly selected API server. If the selected API server is unavailable, the request is forwarded to a different API server. Rivers thus not only makes the cluster HA but also balances the load across the API servers.
In-place and fast upgrade of k8s
After booting a k8s cluster successfully, the next challenge is how to upgrade the k8s cluster without stopping it.
For example, kubeadm upgrades clusters after workloads have been evicted to other nodes with kubectl drain, but it is difficult to evict stateful workloads that use node-local storage.
CKE, on the other hand, upgrades clusters without evicting workloads, as follows:
- Pull the container images of k8s core components, such as kubelet, on the nodes
- Update the k8s configuration files
- Stop the old k8s core components
- Run the new k8s core components
Workloads still have to be evicted when upgrading the OS or the container runtime. However, k8s core components such as kubelet can be upgraded without stopping workloads, because the workloads depend only on the container runtime.
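The four upgrade steps can be expressed as an ordered per-node procedure. The following Go sketch is hypothetical: the `Node` interface and its methods stand in for whatever actually executes commands on a machine, `fakeNode` only records calls, and the image name in `main` is made up. What the sketch shows is the ordering: the new image is pulled while the old component is still running, so the window between stopping the old component and starting the new one stays short, and the procedure aborts at the first failure.

```go
package main

import (
	"errors"
	"fmt"
)

// Node is a hypothetical abstraction of the operations needed on one node.
type Node interface {
	PullImage(image string) error
	WriteConfig(component string, config []byte) error
	Stop(component string) error
	Run(component, image string) error
}

// UpgradeComponent upgrades one k8s core component (e.g. kubelet) in place,
// following the four steps listed above. It stops at the first error.
func UpgradeComponent(n Node, component, newImage string, config []byte) error {
	// 1. Pull the new image while the old component keeps running.
	if err := n.PullImage(newImage); err != nil {
		return fmt.Errorf("pull %s: %w", newImage, err)
	}
	// 2. Update the configuration file.
	if err := n.WriteConfig(component, config); err != nil {
		return fmt.Errorf("write config for %s: %w", component, err)
	}
	// 3. Stop the old component. Running Pods are not stopped here,
	//    because the workloads depend only on the container runtime.
	if err := n.Stop(component); err != nil {
		return fmt.Errorf("stop %s: %w", component, err)
	}
	// 4. Start the new component.
	if err := n.Run(component, newImage); err != nil {
		return fmt.Errorf("run %s: %w", component, err)
	}
	return nil
}

// fakeNode records the steps it is asked to perform instead of touching a machine.
type fakeNode struct {
	failOn string   // step name that should fail, e.g. "stop"
	log    []string // steps performed, in order
}

func (f *fakeNode) do(step, entry string) error {
	if step == f.failOn {
		return errors.New(step + " failed")
	}
	f.log = append(f.log, entry)
	return nil
}

func (f *fakeNode) PullImage(image string) error          { return f.do("pull", "pull "+image) }
func (f *fakeNode) WriteConfig(c string, _ []byte) error { return f.do("config", "config "+c) }
func (f *fakeNode) Stop(c string) error                  { return f.do("stop", "stop "+c) }
func (f *fakeNode) Run(c, image string) error            { return f.do("run", "run "+c+" "+image) }

func main() {
	n := &fakeNode{}
	// Hypothetical image name, for illustration only.
	if err := UpgradeComponent(n, "kubelet", "registry.example/kubelet:v1.14.1", []byte("# new config")); err != nil {
		fmt.Println("upgrade failed:", err)
		return
	}
	for _, step := range n.log {
		fmt.Println(step)
	}
}
```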
Default deployment of CoreDNS and node-local DNS cache server
An in-cluster DNS server watches the API server and creates DNS records for the Service resources that are created. Pods then use this DNS server to resolve Service names to IP addresses. In addition, a node-local DNS cache server is generally required to avoid system failures caused by Pod replacement, temporary network outages, or a traffic explosion as inter-Pod traffic increases.
CKE uses CoreDNS, the default DNS server in k8s 1.13 and later, as the in-cluster DNS server, and unbound as the node-local DNS cache server. unbound handles both in-cluster and out-of-cluster queries.
CoreDNS is deployed as a Deployment, and unbound is deployed as a DaemonSet. Pods use the node-local unbound as their DNS server instead of CoreDNS. When resolving a hostname, unbound checks whether the name belongs to the cluster (an FQDN ending with .cluster.local is inside the cluster). If it does, unbound forwards the query to CoreDNS; otherwise, it forwards the query to an out-of-cluster DNS server.
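The routing decision made by unbound can be modeled with a small Go function. This only illustrates the logic; the actual forwarding is done by unbound itself, not by Go code. The two upstream addresses are hypothetical, and the cluster domain is assumed to be the default `cluster.local`.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical upstream addresses, for illustration only.
const (
	coreDNSAddr  = "10.96.0.10:53" // ClusterIP of the CoreDNS Service (assumed)
	externalAddr = "192.0.2.53:53" // out-of-cluster DNS server (assumed)
	clusterZone  = "cluster.local"
)

// upstreamFor returns the DNS server a node-local cache would forward a query
// for name to: CoreDNS for names in the cluster domain, and the
// out-of-cluster DNS server for everything else.
func upstreamFor(name string) string {
	fqdn := strings.ToLower(strings.TrimSuffix(name, "."))
	if fqdn == clusterZone || strings.HasSuffix(fqdn, "."+clusterZone) {
		return coreDNSAddr
	}
	return externalAddr
}

func main() {
	for _, q := range []string{
		"my-svc.my-namespace.svc.cluster.local.",
		"www.example.com.",
	} {
		fmt.Printf("%-40s -> %s\n", q, upstreamFor(q))
	}
}
```

Matching on `"." + clusterZone` rather than on the bare suffix keeps a name such as `notcluster.local` from being sent to CoreDNS. In unbound itself, this kind of split is usually written as `forward-zone` (or `stub-zone`) clauses in unbound.conf.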
Integration with our in-house server management tool
When managing a cluster, we have to take the following points into account:
- Pods should not be scheduled onto nodes whose hardware is broken
- Control plane nodes should be distributed evenly across racks
- Newer nodes should join a k8s cluster in preference to nodes that are about to retire
- The number of nodes should increase when hardware resources are running out
These points are not so difficult to handle when your cluster is small. However, when the cluster grows large, for example to 1000+ nodes, it becomes very hard to declare an ideal state that takes all of them into account.
To overcome this difficulty, we developed sabakan, a tool that provisions servers and manages their life cycle. CKE updates the ideal state based on the following data provided by sabakan:
- Network connectivity
- Hardware breakdown
- OS version
- Remaining time until retirement
- The type of hardware
- The number of the rack where each machine is placed
For more details on how this works, please see this document.
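To illustrate how such machine data could feed into the ideal state, here is a Go sketch that picks control plane nodes. It skips machines that are unreachable or broken, prefers machines with more time left before retirement, and places at most one control plane per rack. The `Machine` fields loosely mirror the list above (OS version and hardware type are omitted), and both the type and the selection rule are hypothetical; this is neither sabakan's API nor CKE's actual algorithm.

```go
package main

import (
	"fmt"
	"sort"
)

// Machine is a hypothetical record combining some of the data listed above.
type Machine struct {
	Name         string
	Reachable    bool // network connectivity
	Broken       bool // hardware breakdown
	DaysToRetire int  // remaining time until retirement
	Rack         int  // rack where the machine is placed
}

// pickControlPlanes selects up to n machines for the control plane.
// It skips unreachable or broken machines, prefers machines that retire
// later, and picks at most one machine per rack, so it may return fewer
// than n machines when there are fewer than n usable racks.
func pickControlPlanes(machines []Machine, n int) []Machine {
	var healthy []Machine
	for _, m := range machines {
		if m.Reachable && !m.Broken {
			healthy = append(healthy, m)
		}
	}
	// Newest machines first; ties broken by name for a deterministic result.
	sort.Slice(healthy, func(i, j int) bool {
		if healthy[i].DaysToRetire != healthy[j].DaysToRetire {
			return healthy[i].DaysToRetire > healthy[j].DaysToRetire
		}
		return healthy[i].Name < healthy[j].Name
	})
	var picked []Machine
	usedRack := map[int]bool{}
	for _, m := range healthy {
		if len(picked) >= n {
			break
		}
		if !usedRack[m.Rack] {
			picked = append(picked, m)
			usedRack[m.Rack] = true
		}
	}
	return picked
}

func main() {
	// Hypothetical inventory, for illustration only.
	machines := []Machine{
		{Name: "rack0-cs1", Reachable: true, DaysToRetire: 1200, Rack: 0},
		{Name: "rack0-cs2", Reachable: true, DaysToRetire: 1500, Rack: 0},
		{Name: "rack1-cs1", Reachable: true, Broken: true, DaysToRetire: 1800, Rack: 1},
		{Name: "rack1-cs2", Reachable: true, DaysToRetire: 400, Rack: 1},
		{Name: "rack2-cs1", Reachable: true, DaysToRetire: 900, Rack: 2},
	}
	for _, m := range pickControlPlanes(machines, 3) {
		fmt.Println(m.Name, "in rack", m.Rack)
	}
}
```

With this inventory, the sketch picks rack0-cs2, rack2-cs1, and rack1-cs2: rack1-cs1 is skipped even though it is the newest machine, because its hardware is broken, and rack0-cs1 loses to the newer rack0-cs2 in the same rack.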
Summary
In short, CKE is an out-of-the-box k8s engine that boots an HA cluster and upgrades it in place. Thanks to CKE, the operational costs of k8s clusters can be significantly reduced.
If you are interested in CKE, this document is a good starting point. Thank you for reading to the end!