Kintone Engineering Blog

Learn about Kintone's engineering efforts. Kintone is provided by Cybozu Inc., a Tokyo-based public company founded in 1997.

Tolerating failures in container image registries

By Akihiro Ikezoe, Hirotaka Yamamoto

Have you ever experienced a situation where your Kubernetes cluster cannot create new Pods because Docker Hub or another container registry is down?

This article will show you several ways to ensure your Kubernetes clusters can always pull images even while an upstream registry is failing. We also describe why we chose to self-host a container image registry as a pull-through cache for an upstream registry.

Benefits of having a local registry

Many container images are hosted on public image registries such as Docker Hub, quay.io, and GitHub Container Registry (GHCR).

If you have a Deployment that depends on one of these public registries, you might not be able to create new Pods while that registry is failing. This could disrupt services running on your Kubernetes cluster.

A local, self-hosted registry can resolve this problem. If you have the same image in the local registry as in the public registry, then when the public registry is down, you can fetch the image from the local registry instead.

Having a local registry has other benefits as well:

  • No rate limits, unlike Docker Hub
  • Faster image pulls
  • Less bandwidth usage

Software for self-hosting registries

Registry

github.com/distribution/distribution provides a simple image registry called registry.


Harbor

Harbor (goharbor.io) is feature-rich registry software.


Dragonfly

Dragonfly (d7y.io) is an open-source P2P-based image and file distribution system.


How to switch registries transparently

A Pod in Kubernetes specifies container images with a registry as follows:

spec:
  containers:
  - name: ubuntu
    image: quay.io/cybozu/ubuntu:20.04

In the above example, quay.io is the registry. If the registry is omitted, docker.io (Docker Hub) is used by default.

Suppose that the registry is down and we need to switch the registry from quay.io to something like local.registry.svc. You can, of course, edit Deployment or StatefulSet to point to the local registry. That said, this can be cumbersome when you have a lot of Deployments or StatefulSets.

There are three ways to avoid editing manifests manually:

  • Registry mirror & pull-through cache registry
  • Man-in-the-middle proxy
  • Mutating admission webhook

Registry mirror & pull-through cache registry

Most container runtimes, including containerd and CRI-O, can be configured with registry mirrors.

The configuration below is an example for containerd, setting up local.registry.svc as a mirror of quay.io.

[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."quay.io"]
    endpoint = ["http://local.registry.svc", "https://quay.io"]

With this configuration, containerd will try local.registry.svc first when it fetches an image hosted on quay.io.

containerd can have multiple mirror registries

If local.registry.svc is down, containerd falls back to quay.io.

falls back to the second mirror registry
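
CRI-O reads its mirror settings from the containers-registries.conf format rather than the containerd configuration above. Below is a minimal sketch, assuming the v2 TOML format and the same hypothetical local.registry.svc mirror; pulls for quay.io try the mirror first and fall back to the upstream.

# /etc/containers/registries.conf (v2 format read by CRI-O)
[[registry]]
prefix = "quay.io"
location = "quay.io"

  # Tried before falling back to quay.io itself.
  [[registry.mirror]]
  location = "local.registry.svc"
  insecure = true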

Now, we have multiple registries available for an image. The remaining problem is how to pre-load the image into the local mirror registry.

Instead of manually pushing the necessary images one by one to the local registry, you can use the "pull-through cache" feature. Pull-through cache is available in both distribution's registry and Harbor.

With this feature enabled, the local registry can transparently fetch the requested image from the upstream registry and cache it. For subsequent requests, the local registry will be able to return the cached image.

transparently fetch from the upstream registry
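
In distribution's registry, pull-through caching is enabled through the proxy section of its configuration file. The following is a minimal sketch, assuming quay.io as the upstream; the paths and the listen address are illustrative.

# config.yml for registry acting as a pull-through cache (illustrative values)
version: 0.1
storage:
  filesystem:
    rootdirectory: /var/lib/registry
http:
  addr: :5000
proxy:
  # Upstream registry whose images are fetched and cached on demand.
  remoteurl: https://quay.io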

Man-in-the-middle proxy

This method uses a local caching proxy as a man-in-the-middle for the upstream registry.

The container runtime, such as containerd, fetches images from the upstream registry through an HTTPS proxy server. The proxy server intercepts the communication and caches the requested images. To make this possible, the proxy's root CA certificate needs to be injected into the container runtime's trust store.

rpardini/docker-registry-proxy is one such implementation.
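
The exact setup depends on your environment. As a rough sketch for containerd, the proxy is typically wired in through environment variables such as the systemd drop-in below (the proxy address is hypothetical); in addition, the proxy's root CA certificate has to be installed into the host's trust store.

# /etc/systemd/system/containerd.service.d/http-proxy.conf (hypothetical proxy address)
[Service]
Environment="HTTP_PROXY=http://registry-proxy.internal:3128"
Environment="HTTPS_PROXY=http://registry-proxy.internal:3128"
# Keep cluster-internal traffic away from the proxy.
Environment="NO_PROXY=localhost,127.0.0.1,.svc,.cluster.local"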

One apparent drawback of this approach is that the man-in-the-middle proxy could become another single point of failure.

If your Kubernetes cluster's container runtime is Docker, this method may be appropriate, since Docker cannot set up mirrors for registries other than Docker Hub.

Mutating admission webhook

This method dynamically rewrites the container image registry in Pod specs with a mutating admission webhook when a Pod is created.

docker-proxy-webhook is one such implementation.
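
For illustration, the mutation such a webhook returns is essentially a JSON Patch that rewrites the image field of each container. The example below assumes the hypothetical local.registry.svc mirror used earlier in this article.

[
  {
    "op": "replace",
    "path": "/spec/containers/0/image",
    "value": "local.registry.svc/cybozu/ubuntu:20.04"
  }
]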

One remaining problem with this method is how to fall back to the upstream registry when the local registry goes down.

Limitations and caveats

There are some limitations and caveats when using a pull-through cache registry as a mirror of an upstream registry.

Docker can have mirrors only for Docker Hub

Although extending this support has been discussed for years, Docker can only use mirrors for Docker Hub. Note that this is a limitation of Docker itself, not of registry or containerd.

In addition, we recommend using a container runtime other than Docker because dockershim has been deprecated since Kubernetes 1.20.

Harbor only supports proxy caching for Docker Hub and Harbor registries

As described in the Harbor documentation (https://goharbor.io/docs/2.1.0/administration/configure-proxy-cache/), Harbor only supports proxy caching for Docker Hub and Harbor registries.

registry, on the other hand, can work with Docker Hub, quay.io, and GitHub Container Registry.

Only one upstream per pull-through cache registry

An instance of registry or Harbor can have only one upstream registry for pull-through cache.

If you have multiple upstream registries, you need to prepare as many local registries as you have upstreams, as shown in the sketch below.
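
With containerd, for example, the mirror configuration would then contain one entry per upstream, each pointing to its own local cache (the local host names below are hypothetical):

[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
  # One local pull-through cache registry per upstream registry.
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."quay.io"]
    endpoint = ["http://quay-mirror.registry.svc", "https://quay.io"]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."ghcr.io"]
    endpoint = ["http://ghcr-mirror.registry.svc", "https://ghcr.io"]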

imagePullSecrets are not passed to pull-through cache registries

To access private image repositories, Kubernetes Pods can specify imagePullSecrets, which store the credentials for the repository.
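
A minimal example with a hypothetical Secret name and image looks like this:

spec:
  imagePullSecrets:
  # Secret of type kubernetes.io/dockerconfigjson holding the registry credential.
  - name: quay-pull-secret
  containers:
  - name: app
    image: quay.io/cybozu/private-app:1.0.0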


Container runtimes, including Docker, containerd, and CRI-O, do not pass imagePullSecrets to local mirrors. Instead, a local pull-through cache registry needs to be configured with a credential for its upstream registry in order to fetch private images.

Note that the cached private images will be available to everyone who can access the local registry.

Note also that both registry and Harbor can have only one credential for the upstream registry. For this reason, the single credential needs to be allowed to access any private image on the upstream registry.
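
In distribution's registry, that single upstream credential goes into the proxy section of the configuration. A sketch with placeholder values (in practice, the credential should be supplied from a secret store) looks like this:

proxy:
  remoteurl: https://quay.io
  # Single credential used for every cached private image (placeholder values).
  username: cache-robot
  password: "<robot account token>"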

Our implementation

Our Kubernetes clusters use containerd as the container runtime and use images hosted on quay.io and GitHub Container Registry.

Given these conditions and the aforementioned limitations, we chose to deploy registry as a pull-through cache and configure containerd to use registry as a mirror.

If you are interested, you can check out our manifests and configurations:

Summary

We discussed the pros and cons of three methods for tolerating failures in container image registries:

  1. Configure the container runtime to use a pull-through cache registry as a mirror of the upstream registry
  2. Deploy a man-in-the-middle proxy and use it as a pull-through cache
  3. Use a mutating admission webhook to dynamically edit Pods to reference local mirrors

Which method is best for you depends on your environment. We hope this article helps you find your way.