Kintone Engineering Blog

Learn about Kintone's engineering efforts. Kintone is provided by Cybozu Inc., a Tokyo-based public company founded in 1997.

Production-grade delivery workflow using Argo CD

By Banji Inoue (@binoue), Akihiro Ikezoe (@zoetro)

Nowadays, GitOps is widely considered the best methodology for continuous delivery. However, the right way of implementing GitOps for production environments is not widely understood.

We briefly introduce some GitOps best practices and then explain how to implement them using Argo CD. Topics include self-management of Argo CD, off-the-shelf configurations, and (soft) multi-tenancy.

What is Argo CD

Spinnaker and Jenkins X are well-known continuous delivery tools for Kubernetes. These tools manage the whole continuous delivery pipeline.

In contrast, Argo CD does not manage the pipeline but rather works as one of the components within it. Therefore, this kind of tool is also called a continuous delivery component.

This March, the Continuous Delivery Foundation was founded to host projects such as Spinnaker and Jenkins.

Intuit, the company at the core of Argo CD development, has started working with Weaveworks, the developers of the competing product Flux, on a continuous delivery tool for GitOps known as Argo Flux. This is an exciting time for continuous delivery tools.

Now, let us start explaining Argo CD.

Argo CD continuous deployments
Introducing Argo CD — Declarative Continuous Delivery for Kubernetes

As shown in the picture above, when a developer pushes application code to the git repository, CI builds the code, builds a container image, and pushes the image to the container registry. The manifests are then pushed to git, and Argo CD applies them to a Kubernetes cluster.

This way of deploying is called GitOps.

Best practices

Recently, an article titled 5 GitOps Practices was published on the Argo CD blog. It lists the following five best practices for GitOps.

  1. Two Repos: One For App Source Code, Another For Manifests
  2. Choose The Right Number Of Deployment Config Repos
  3. Test Your Manifests Before You Commit
  4. Git Manifests Should Not Change Due To External Changes
  5. Plan How You’ll Manage Secrets

In our Neco project, we found that our process follows all the practices defined by Argo CD.

Below, we explain how we followed them.

1. Two Repos: One For App Source Code, Another For Manifests

In the Neco project, manifests are managed by the neco-apps repository and the source code is managed in each application repository, for example neco-containers.

Argo CD can be used with manifest rendering tools such as Helm and Kustomize, among others. neco-apps uses Kustomize because it handles the differences between environments well and makes Off-the-Shelf Configuration, described below, easy to use.

We use Git branches to represent each environment.

Manifests in the master branch are tested nightly and merged into the stage branch. Once we have confirmed that the manifests work in the staging environment for a while, we manually merge the stage branch into the release branch.

2. Choose The Right Number Of Deployment Config Repos

At Cybozu, we have two kinds of teams:

  1. the Neco project team, managing the Kubernetes cluster construction and operation
  2. the application team, developing applications that operate on the Kubernetes cluster

Each team has its own repositories for managing its manifests. In addition, Argo CD has Projects, which are logical groupings of Applications and their deployment configuration. A Project can be configured to restrict the usable repositories, the Kubernetes clusters to deploy to, and the target namespaces. We use this feature to limit the namespaces to which application teams can deploy their apps.

As described above, separating the manifest repositories and using Projects lets each team deploy freely without affecting other teams.
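As a rough illustration of how a Project narrows what a team can deploy, the following Python sketch models an AppProject manifest as a dictionary and checks a deployment request against it. The project name, repository URL, and the `is_allowed` helper are hypothetical, and the check uses exact matching only, whereas Argo CD also accepts wildcard patterns.

```python
# A hypothetical AppProject for one tenant team, written as a plain dict
# that mirrors the YAML manifest. All names and URLs are made up.
tenant_project = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "AppProject",
    "metadata": {"name": "team-a", "namespace": "argocd"},
    "spec": {
        # Repositories that Applications in this Project may deploy from.
        "sourceRepos": ["https://github.com/example/team-a-manifests.git"],
        # Cluster and namespace pairs that Applications may deploy to.
        "destinations": [
            {"server": "https://kubernetes.default.svc", "namespace": "team-a"},
        ],
        # No cluster-scoped resource kinds are listed here.
        "clusterResourceWhitelist": [],
    },
}


def is_allowed(project, repo_url, server, namespace):
    """Simplified check with exact matching only (an assumption of this sketch)."""
    spec = project["spec"]
    if repo_url not in spec["sourceRepos"]:
        return False
    return any(
        d["server"] == server and d["namespace"] == namespace
        for d in spec["destinations"]
    )
```

With this Project, a deployment from the team's repository to the `team-a` namespace passes the check, while the same deployment aimed at `kube-system` does not.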

3. Test Your Manifests Before You Commit

The Neco project has three test stages for manifests:

  1. validate manifests by attempting to render the manifest layers with Kustomize
  2. test basic functionality of the manifests and the deployed software with kind
  3. perform virtualized production environment testing in a data center built on a GCP instance

In the third stage, we test the same basic functionality as in the kind environment, as well as upgrade migrations. For the upgrade test, we build a virtualized environment using the manifests currently applied to the real staging and production environments, then upgrade the manifests to the latest version and confirm that the environment remains healthy.

This upgrade test currently catches many failures, and we find it quite useful. However, the virtual data center test takes a long time, so we only run it before merging to the stage branch.
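The first stage above, rendering every manifest layer with Kustomize, could be sketched in Python roughly as follows. `kustomize build` is the actual rendering command, but the directory discovery, the function names, and the injectable `run` parameter are assumptions made for this example and do not describe how the Neco project's tests are written.

```python
import subprocess
from pathlib import Path


def find_kustomize_dirs(root):
    """Return every directory under root that contains a kustomization.yaml."""
    return sorted(p.parent for p in Path(root).rglob("kustomization.yaml"))


def render(directory, run=subprocess.run):
    """Render one directory with `kustomize build`; return True on success."""
    result = run(
        ["kustomize", "build", str(directory)],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0


def validate_all(root, run=subprocess.run):
    """Return the directories whose manifests failed to render."""
    return [d for d in find_kustomize_dirs(root) if not render(d, run=run)]
```

Passing `run` as a parameter keeps the check easy to exercise without the `kustomize` binary installed. In CI, `validate_all(".")` would return an empty list when every layer renders.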

4. Git Manifests Should Not Change Due To External Changes

In the Neco project, we do not use the official container images typically found on Docker Hub. Instead, we build our own container images, both for our applications and for open-source applications maintained by third parties. This way we know the exact version of the software in use and can make modifications if required.

When we build container images, we add a specific version tag, not the latest tag. By doing this, we can be sure that previously released manifests always refer to the same container images.
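A check for this rule is straightforward to automate. The sketch below is a minimal Python example, assuming image references in the usual `registry/name:tag` form. The `is_pinned` helper and the sample image names are hypothetical.

```python
def is_pinned(image):
    """Return True if the image reference has a digest or a tag other than 'latest'."""
    if "@" in image:
        return True  # digest references are immutable
    # Only look at the last path component, so a registry port such as
    # "registry.example.com:5000" is not mistaken for a tag.
    last = image.rsplit("/", 1)[-1]
    if ":" not in last:
        return False  # no tag at all
    tag = last.rsplit(":", 1)[1]
    return tag not in ("", "latest")


def unpinned_images(images):
    """Return the image references that are not pinned to a specific version."""
    return [image for image in images if not is_pinned(image)]
```

For example, `unpinned_images(["quay.io/example/app:1.2.3", "quay.io/example/app:latest"])` would flag only the second entry.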

5. Plan How You’ll Manage Secrets

The last practice is managing credentials properly. When it comes to GitOps, there is no definitive answer on how to manage credentials, and several approaches are currently in use.

In our case, we classified credential information into two categories:

  1. information that would let attackers intrude into our data center or leak customers' information
  2. other credential information (license keys, etc.)

We do not keep information in the first category in git repositories, so it has to be handled manually. However, there are few such credentials and they rarely change, so managing them manually is not much trouble for the operators.

Information in the second category is stored unencrypted in private git repositories, and Argo CD deploys it automatically.

Beyond the best practices

Here, we would like to introduce our practices not mentioned in the 5 GitOps Practices above.

App of Apps Pattern

An Application resource is the unit in Argo CD that deploys a set of manifests. Because an Application is itself a Kubernetes resource, it can also be managed with Argo CD.

Using an Application resource to manage multiple other Application resources is called the App of Apps Pattern.

We use this pattern in the Neco project.

If you manage Application resources with the App of Apps Pattern, you can add or remove Applications by adding or removing manifests in your git repository, instead of operating Argo CD through the Web UI or the command line.
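To make the pattern concrete, here is a minimal Python sketch that builds Application manifests as dictionaries. The repository URL, paths, names, and the `application` helper are hypothetical, and the `release` revision simply follows the branch naming described earlier. The parent Application points at a directory that holds the manifests of the child Applications.

```python
# Hypothetical repository used only for this example.
REPO = "https://github.com/example/manifests.git"


def application(name, path, revision, namespace, project="default"):
    """Build an Argo CD Application manifest as a dict."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": project,
            "source": {"repoURL": REPO, "path": path, "targetRevision": revision},
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": namespace,
            },
            # With pruning enabled, deleting a manifest from git removes the resource.
            "syncPolicy": {"automated": {"prune": True}},
        },
    }


# The parent deploys whatever Application manifests live under "argocd-config".
parent = application("app-of-apps", "argocd-config", "release", "argocd")

# Children are ordinary Applications whose manifests are stored in that directory.
children = [
    application("monitoring", "monitoring/overlays/prod", "release", "monitoring"),
    application("ingress", "ingress/overlays/prod", "release", "ingress"),
]
```

Adding a third entry to `children` and committing its manifest under `argocd-config` would be the only step needed to start deploying a new Application in this sketch.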

Self Management

Argo CD is itself an application running on Kubernetes, so it can be used to continuously deliver itself.

Self Management lets us update Argo CD like any other application managed by Argo CD.

Monitoring manifest deployments

Even if you test your manifests as described above, a deployment to the Kubernetes cluster may still fail. Argo CD provides Prometheus-style metrics that let you monitor the health and sync status of each Application.

In our Neco project, we get notifications in Slack when Argo CD is down for a certain period and when Application sync is completed.
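As an illustration of what such monitoring looks at, the Python sketch below scans metrics in the Prometheus text format and lists Applications that are not healthy or not synced. It assumes that the `argocd_app_info` metric carries `name`, `health_status`, and `sync_status` labels. The sample lines are invented, and the simple regular expression does not handle escaped quotes in label values. In practice this kind of condition would normally be written as a Prometheus alerting rule rather than as a script.

```python
import re

# Matches label="value" pairs inside the braces of a metric line.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def problem_apps(metrics_text):
    """Return names of Applications that are not both Healthy and Synced."""
    problems = []
    for line in metrics_text.splitlines():
        if not line.startswith("argocd_app_info{"):
            continue
        labels = dict(LABEL_RE.findall(line))
        healthy = labels.get("health_status") == "Healthy"
        synced = labels.get("sync_status") == "Synced"
        if not (healthy and synced):
            problems.append(labels.get("name"))
    return problems


# Invented sample in the Prometheus text exposition format.
SAMPLE = """\
argocd_app_info{name="monitoring",health_status="Healthy",sync_status="Synced"} 1
argocd_app_info{name="ingress",health_status="Degraded",sync_status="Synced"} 1
argocd_app_info{name="team-a",health_status="Healthy",sync_status="OutOfSync"} 1
"""
```

Calling `problem_apps(SAMPLE)` reports the `ingress` and `team-a` Applications, which is the kind of information a Slack notification would carry.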

Off-the-Shelf Configuration

neco-apps contains not only our own manifests but also many official manifests to which we apply some customization. These OSS manifests are distributed in various ways, such as Helm Charts or snippets embedded in documentation.

In many cases, distributed manifests need to be modified before they are added to our repositories. Taking and using manifests created by third parties in this way is called Off-the-Shelf Configuration.

In operation, we noticed that keeping the manifests in our repository in step with the upstream manifests required a lot of manual effort. It is very painful to check for updates to the upstream manifests and integrate those changes into ours while taking care to preserve the other changes we have made.

We can avoid this problem with Kustomize, as it lets us apply our changes as patches to the existing manifests. This way we do not have to modify the distributed OSS manifests before copying them into our repository. In many cases, our changes are applied automatically on top of the upstream manifest layer.

The neco-apps repository contains examples of this approach.

With this approach, when the upstream manifests are updated, we can update ours simply by adding the new upstream manifests to our repository.
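The idea of keeping our changes separate from the upstream manifest can be sketched in Python as follows. This is a simplified recursive dictionary merge written for illustration, not Kustomize's actual patching logic, and the manifests shown are invented and heavily trimmed.

```python
import copy


def apply_patch(base, patch):
    """Return a copy of base with patch merged in.

    Dicts are merged recursively; any other value in patch replaces the base value.
    """
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_patch(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Two invented versions of an upstream manifest, stored unmodified.
upstream_v1 = {
    "metadata": {"name": "app", "labels": {"version": "1.0.0"}},
    "spec": {"replicas": 1},
}
upstream_v2 = {
    "metadata": {"name": "app", "labels": {"version": "1.1.0"}},
    "spec": {"replicas": 1, "revisionHistoryLimit": 5},
}

# Our local change, kept as a separate patch.
our_patch = {"spec": {"replicas": 3}}
```

Because the patch is stored apart from the upstream file, moving from `upstream_v1` to `upstream_v2` only requires replacing the upstream copy. Applying `our_patch` again yields the new upstream fields together with our replica count.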

Multi-tenancy

To let tenants use Argo CD while maintaining a sufficiently high level of security, we recommend the App of Apps Pattern for our tenants.

In this setup, one of our Applications deploys the tenants' Applications, which in turn deploy the tenants' resources. Deployment therefore proceeds from step 1 to step 3 as shown in the picture below.

[Figure: app-of-apps for tenants]

After separating our tenants' Applications and manifests from ours, we added two new Projects for tenants in addition to the default Project, as shown in the picture below, in order to keep our security level high.

[Figure: tenant projects]

The first Project is for the Applications that we manage on behalf of tenants. It prevents Kubernetes resources from being created in the argocd namespace, because that namespace belongs to Argo CD and should not be available to tenants. The second Project is for the Applications that tenants create and deploy. It prevents them from manipulating cluster-wide resources and from deploying manifests to namespaces they do not manage.

However, we encountered two problems:

  1. Argo CD cannot force an Application to create Applications only in a certain Project, so tenants were able to use our default Project, which would grant them full privileges.
  2. Projects do not currently have whitelists for namespaced resources, so it is difficult to let the tenant Project in the figure above create only Application resources.

To solve both issues, we added a validating webhook for Application resources, which checks that each Application uses the proper Project according to rules we define. In addition, for the Application responsible for deploying tenant Applications, we only block the creation of certain important Kubernetes resources, such as Pods and DaemonSets.
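The core decision such a webhook makes can be sketched in Python as below. The rule format (a repository URL prefix mapped to the Projects it may use), the `validate_application` helper, and the sample values are all assumptions for illustration. They do not reproduce the rules or the implementation of our webhook.

```python
# Hypothetical rules: Applications sourced from a repository may only use
# the Projects listed for the matching URL prefix.
RULES = [
    {"repo_prefix": "https://github.com/example/infra-", "projects": ["default"]},
    {"repo_prefix": "https://github.com/example/team-a-", "projects": ["team-a"]},
]


def validate_application(app, rules):
    """Return (allowed, message) for an Application manifest given as a dict."""
    repo = app["spec"]["source"]["repoURL"]
    # Assumption of this sketch: a missing project is treated as "default".
    project = app["spec"].get("project", "default")
    for rule in rules:
        if repo.startswith(rule["repo_prefix"]):
            if project in rule["projects"]:
                return True, "ok"
            return False, f"project {project!r} is not allowed for {repo}"
    return False, f"no rule matches repository {repo}"


# A tenant Application that tries to use the privileged default Project.
tenant_app = {
    "kind": "Application",
    "spec": {
        "project": "default",
        "source": {"repoURL": "https://github.com/example/team-a-manifests.git"},
    },
}
```

Here `validate_application(tenant_app, RULES)` rejects the request, because the tenant's repository is only permitted to use the `team-a` Project. A real admission webhook would wrap this decision in an AdmissionReview response.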

We would like to find better ways of enabling multi-tenancy with Argo CD and contribute them upstream if possible.

Summary

In this article, we introduced our Argo CD based GitOps system in the Neco project. We covered the five best practices for GitOps, along with some valuable lessons we learned along the way, such as using Applications to manage the deployment lifecycle of other Applications. Thank you for reading. We hope you have been able to learn something from our experience and can avoid some of the problems we faced.

If you'd like to find out more, please come and check out our neco-apps repository on GitHub!