By Banji Inoue (@binoue), Akihiro Ikezoe (@zoetro)
Nowadays, GitOps is widely considered the best methodology for continuous delivery. However, the right way of implementing GitOps for production environments is not widely understood.
We briefly introduce some GitOps best practices and then explain how to implement them using Argo CD. Topics include self-management of Argo CD, off-the-shelf configurations, and (soft) multi-tenancy.
What is Argo CD
Spinnaker and Jenkins X are well-known continuous delivery tools for Kubernetes. These tools manage the whole continuous delivery pipeline.
In contrast, Argo CD does not manage the pipeline but rather works as one of the components within it. Therefore this kind of tool is also called a continuous delivery component.
This March, the Continuous Delivery Foundation was founded, targeting projects such as Spinnaker and Jenkins.
Intuit, the company at the core of Argo CD development, has started working with Weaveworks, the developers of the competing product Flux, to develop a continuous delivery tool for GitOps known as Argo Flux. This is an exciting time for continuous delivery tools.
Now, let us start explaining Argo CD.
As shown in the picture above, when a developer pushes application code to their Git repository, CI builds the code into a container image and registers it in the container registry. Then the developer pushes the manifests, and Argo CD applies them to a Kubernetes cluster.
This way of deploying is called GitOps.
Best practices
Recently, an article titled 5 GitOps Practices was published on the Argo CD blog. It lists the following five best practices for GitOps.
- Two Repos: One For App Source Code, Another For Manifests
- Choose The Right Number Of Deployment Config Repos
- Test Your Manifests Before You Commit
- Git Manifests Should Not Change Due To External Changes
- Plan How You will Manage Secrets
In our Neco project, we found that our process follows all the practices defined by Argo CD.
Below, we explain how we followed them.
1. Two Repos: One For App Source Code, Another For Manifests
In the Neco project, manifests are managed by the neco-apps repository and the source code is managed in each application repository, for example neco-containers.
Argo CD can be used with manifest rendering tools such as Helm or Kustomize (among others).
neco-apps uses Kustomize because it can manage the differences between environments and makes it easy to use Off-the-Shelf Configuration, described below.
We use Git branches to represent each environment.
Manifests in the master branch are tested nightly and merged to the stage branch.
Once we have confirmed that the manifests have been working in the staging environment for a while, we manually merge the stage branch into the release branch.
2. Choose The Right Number Of Deployment Config Repos
At Cybozu, we have two kinds of teams:
- the Neco project team, managing the Kubernetes cluster construction and operation
- the application team, developing applications that operate on the Kubernetes cluster
Each team has repositories they use to manage their manifests.
Also, Argo CD has Projects, which are logical groupings of Applications and their deployment configuration.
A Project can be configured to limit the usable repositories, the Kubernetes clusters for deployment, and the target namespaces.
We use this feature to limit the namespaces to which developer teams can deploy their apps.
As mentioned above, separating the manifest repositories and using Projects lets each team deploy freely without affecting other teams.
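The kind of restriction a Project expresses can be sketched as an AppProject manifest. The example below builds one as a Python dictionary and prints it as JSON, which kubectl also accepts. The project name, repository URL, and namespace are hypothetical placeholders and are not taken from neco-apps.

```python
import json


def app_project(name, repos, namespaces, server="https://kubernetes.default.svc"):
    """Build an Argo CD AppProject limited to the given repos and namespaces."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "AppProject",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            # Only manifests from these repositories may be deployed.
            "sourceRepos": list(repos),
            # Only these cluster/namespace pairs may be deployment targets.
            "destinations": [
                {"server": server, "namespace": ns} for ns in namespaces
            ],
        },
    }


# Hypothetical tenant: the names below are placeholders for illustration.
project = app_project(
    "team-a",
    ["https://github.com/example/team-a-manifests.git"],
    ["team-a"],
)
print(json.dumps(project, indent=2))
```

An Application that references this Project but targets another namespace or repository would be rejected by Argo CD.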
3. Test Your Manifests Before You Commit
The Neco project has three test stages for manifests:
- validate manifests by attempting to render the manifest layers with Kustomize
- test basic functionality of the manifests and the deployed software with kind
- perform virtualized production environment testing in a data center built on a GCP instance
In the third test, we test the same basic functionality as in the kind environment, as well as any upgrade migrations.
In our upgrade test, we build a virtualized environment using the manifests currently applied to the real staging and production environments, then upgrade the manifests to the latest version and confirm that the environment remains healthy.
This upgrade test currently catches many failures, and we are finding it quite useful.
However, the virtual data center test takes a lot of time, so we only run it before merging to the stage branch.
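The first of the three stages above could be scripted roughly as follows. This is a minimal sketch, assuming the kustomize binary is on the PATH and that every directory containing a kustomization file should render cleanly. The function names are hypothetical and are not part of the Neco tooling.

```python
import os
import subprocess

KUSTOMIZATION_NAMES = {"kustomization.yaml", "kustomization.yml", "Kustomization"}


def find_kustomization_dirs(root):
    """Return every directory under root that contains a kustomization file."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if KUSTOMIZATION_NAMES & set(filenames):
            found.append(dirpath)
    return sorted(found)


def render(directory):
    """Run `kustomize build` and return (succeeded, stderr)."""
    result = subprocess.run(
        ["kustomize", "build", directory], capture_output=True, text=True
    )
    return result.returncode == 0, result.stderr


def validate_all(root, render_fn=render):
    """Render every kustomization under root; return (directory, error) pairs."""
    failures = []
    for directory in find_kustomization_dirs(root):
        ok, stderr = render_fn(directory)
        if not ok:
            failures.append((directory, stderr))
    return failures
```

A CI job could call validate_all(".") and fail the build if the returned list is not empty. The render_fn parameter exists only so that the discovery logic can be exercised without kustomize installed.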
4. Git Manifests Should Not Change Due To External Changes
In the Neco project, we do not use the official container images typically found on Docker Hub. Instead, we build the container images ourselves, both for our own applications and for OSS applications maintained by third parties. This way we can be sure of the exact version of the software in use, and we can make modifications if required.
When we build containers, we add a specific version tag, not the latest tag. By doing this, we can be sure that previously released manifests always refer to the same container images.
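A check for this rule can be sketched in a few lines. The example below treats an image reference as pinned when it has a digest or a tag other than latest. The image names are hypothetical, and the parsing is a simplification of the full image reference grammar.

```python
def image_tag(image):
    """Return the tag of an image reference, or None if it has no tag."""
    name = image.split("@", 1)[0]  # drop any digest suffix
    # Look only at the last path component so that a registry port
    # (for example localhost:5000/app) is not mistaken for a tag.
    last_component = name.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return None
    return last_component.rsplit(":", 1)[1]


def is_pinned(image):
    """True if the image has a digest or an explicit tag other than latest."""
    if "@" in image:
        return True
    tag = image_tag(image)
    return tag is not None and tag != "latest"


# Hypothetical image references for illustration.
for image in [
    "quay.io/example/app:1.2.3",
    "quay.io/example/app:latest",
    "localhost:5000/app",
]:
    print(image, "pinned" if is_pinned(image) else "NOT pinned")
```

Running such a check over the rendered manifests before merging would catch a manifest that accidentally depends on a moving tag.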
5. Plan How You’ll Manage Secrets
The last practice is managing credentials properly. When it comes to GitOps, there is no final answer about how to manage credentials and there are currently multiple ways to do this.
In our case, we classified credential information into two categories:
- information that would let attackers intrude into our data center or leak customers' information
- other credential information (license keys, etc.)
We don't manage information in the first category inside Git repositories. It therefore has to be handled manually, but the number of such credentials is low and they rarely change, so managing them manually is not much trouble for the operators.
Information in the second category is managed unencrypted within private Git repositories, and its deployment is handled automatically by Argo CD.
Beyond the best practices
Here, we would like to introduce our practices that are not mentioned in the 5 GitOps Practices above.
App of Apps Pattern
An Application resource is a unit in Argo CD that deploys a set of manifests.
As an Application is also a Kubernetes resource, it can be managed with Argo CD.
Using an Application resource to manage multiple other Application resources is called the App of Apps Pattern.
We are using this pattern here.
If you use the App of Apps Pattern to manage Application resources, you can add or remove Application resources by adding or removing manifests in your Git repository instead of operating Argo CD via the Web UI or command line.
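The pattern can be sketched with plain Application manifests. The example below builds them as Python dictionaries: a parent Application points at a directory in the Git repository, and that directory contains the manifests of the child Applications. The repository URL, paths, branch, and application names are hypothetical placeholders and do not reflect the actual neco-apps layout.

```python
import json


def application(name, repo_url, path, revision, dest_namespace, project="default"):
    """Build an Argo CD Application that deploys the manifests found at path."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": name, "namespace": "argocd"},
        "spec": {
            "project": project,
            "source": {
                "repoURL": repo_url,
                "path": path,
                "targetRevision": revision,
            },
            "destination": {
                "server": "https://kubernetes.default.svc",
                "namespace": dest_namespace,
            },
            # Sync automatically and delete resources removed from Git.
            "syncPolicy": {"automated": {"prune": True}},
        },
    }


REPO = "https://github.com/example/manifests.git"  # hypothetical repository

# Parent: deploys the directory that holds the child Application manifests.
# Application resources live in the argocd namespace, so that is the target.
parent = application("argocd-config", REPO, "argocd-config", "release", "argocd")

# Children: these manifests would be committed under argocd-config/ in Git.
children = [
    application(name, REPO, name, "release", name)
    for name in ("monitoring", "ingress")
]

print(json.dumps([parent] + children, indent=2))
```

With this layout, only the parent has to be created by hand once. Adding a third child means committing one more Application manifest to the directory the parent watches.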
Self Management
Argo CD is also one of the applications on Kubernetes, so Argo CD can be used to continuously deliver itself. See this.
Self Management lets us update Argo CD like any other application managed by Argo CD.
Monitoring deploying manifests
Even if you are testing your manifests as mentioned above, you may still encounter a failure when deploying manifests to the Kubernetes cluster.
Argo CD provides Prometheus-style metrics that let you monitor the health and sync status of each Application.
In our Neco project, we get notifications in Slack when Argo CD has been down for a certain period and when an Application sync is completed.
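As an illustration of what can be done with these metrics, the sketch below parses Prometheus text-format output and lists Applications that are not both healthy and synced. It assumes an Argo CD version that exposes an argocd_app_info metric with health_status and sync_status labels. The sample text is made up for the example, and in practice this kind of condition would normally be written as a Prometheus alerting rule rather than as a script.

```python
import re

# Made-up sample in Prometheus text exposition format.
SAMPLE = """\
# TYPE argocd_app_info gauge
argocd_app_info{name="monitoring",namespace="argocd",project="default",health_status="Healthy",sync_status="Synced"} 1
argocd_app_info{name="ingress",namespace="argocd",project="default",health_status="Degraded",sync_status="OutOfSync"} 1
"""

LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_app_info(text):
    """Return one label dictionary per argocd_app_info sample line."""
    apps = []
    for line in text.splitlines():
        if not line.startswith("argocd_app_info{"):
            continue  # skip comments and other metrics
        labels_part = line[line.index("{") + 1 : line.rindex("}")]
        apps.append(dict(LABEL_RE.findall(labels_part)))
    return apps


def needs_attention(apps):
    """Return names of Applications that are not both Healthy and Synced."""
    return [
        app["name"]
        for app in apps
        if app.get("health_status") != "Healthy"
        or app.get("sync_status") != "Synced"
    ]


print(needs_attention(parse_app_info(SAMPLE)))
```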
Off-the-Shelf Configuration
neco-apps contains not only our own manifests but also many official manifests with some customization.
These OSS manifests are distributed in various ways, such as Helm Charts or snippets embedded in documentation.
In many cases, distributed manifests need to be modified and added to our repositories. Taking and using manifests created by third parties in this way is called Off-the-Shelf Configuration.
Through our operation, we noticed that tracking the manifests in our repository back to the upstream manifests required a lot of manual effort. It is very painful to check for updates to the upstream manifests and integrate them into our manifests while taking care to preserve the other changes we had made.
We can avoid this problem with Kustomize, as it lets us apply our changes as patches to the existing manifests.
This way, we can copy the distributed OSS-licensed manifests into our repository without modifying them. In many cases, our changes can be applied automatically as a separate manifest layer.
For example, here
With this approach, when the upstream manifests are updated, we can update ours by just adding the new upstream manifests to our repository.
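The idea of keeping upstream manifests untouched and layering local changes on top can be illustrated with a small recursive merge. This is a deliberately simplified sketch and not Kustomize's actual patching algorithm, which among other things merges lists such as containers by name. The manifest contents are made up for the example.

```python
import copy


def overlay(base, patch):
    """Return a copy of base with patch applied.

    Dictionaries are merged recursively; any other value in the patch
    replaces the corresponding value in the base.
    """
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = overlay(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


# Upstream manifest, stored in the repository exactly as distributed.
upstream = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "example-controller"},
    "spec": {
        "replicas": 1,
        "template": {"metadata": {"labels": {"app": "example"}}},
    },
}

# Our local changes, kept in a separate file.
our_patch = {
    "spec": {
        "replicas": 2,
        "template": {"metadata": {"annotations": {"team": "neco"}}},
    },
}

merged = overlay(upstream, our_patch)
print(merged["spec"])
```

When upstream releases a new version, only the upstream file is replaced. The patch stays as it is and is reapplied on top of the new base.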
Multi-tenancy
To let tenants use Argo CD while maintaining a high enough security level, we recommend the App of Apps Pattern for our tenants.
In this case, we have an Application that deploys our tenants' Applications, which in turn deploy their resources. Our deployment process therefore goes from step 1 to step 3 as shown in the picture below.
After separating our tenants' Applications and manifests, we added two new Projects for the tenants in addition to the default Project, as shown in the picture below, in order to keep our security level high.
The first Project is for the Applications that we manage on behalf of tenants. It prevents Kubernetes resources from being created in the argocd namespace, as that namespace belongs to Argo CD and should not be available for tenant use.
The second Project is for the Applications that tenants create and deploy. It prevents them from manipulating cluster-wide resources and from deploying manifests to namespaces that they do not manage.
However, we encountered two problems:
- As Argo CD is unable to force Applications to create Applications with a certain Project, tenants were able to use our default Project, which would grant them full privileges.
- As Projects do not currently have whitelists for namespace-wide resources, it is difficult to let the tenant Project in the figure above create only Application resources.
To solve these two issues, we added a validation webhook for Application resources, which validates that each Application is using the proper Project based on our defined rules.
You can see that here.
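The core check performed by such a webhook can be sketched as a pure function. The rule format below, a mapping from repository URL prefix to the Projects that Applications from that repository may use, is a hypothetical simplification for illustration and is not the rule set used in Neco. A real admission webhook would wrap this in an HTTPS handler that receives and returns AdmissionReview objects.

```python
def validate_application(app, allowed_projects_by_repo):
    """Return (allowed, reason) for an Argo CD Application manifest."""
    spec = app.get("spec", {})
    repo = spec.get("source", {}).get("repoURL", "")
    # An empty project name means the default Project in Argo CD.
    project = spec.get("project") or "default"
    for repo_prefix, projects in allowed_projects_by_repo.items():
        if repo.startswith(repo_prefix):
            if project in projects:
                return True, ""
            return False, f"project {project!r} is not allowed for repository {repo}"
    return False, f"repository {repo} is not covered by any rule"


# Hypothetical rules: tenant repositories may only use the tenant Project.
RULES = {
    "https://github.com/example-tenant/": {"tenant"},
    "https://github.com/example-admin/": {"default", "tenant-apps"},
}

tenant_app = {
    "kind": "Application",
    "spec": {
        "project": "default",  # a tenant trying to use the privileged Project
        "source": {"repoURL": "https://github.com/example-tenant/apps.git"},
    },
}

print(validate_application(tenant_app, RULES))
```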
Also, for the Application responsible for deploying tenant Applications, we only restrict the creation of some important Kubernetes resources such as Pods and DaemonSets.
We would like to find better ways of enabling multi-tenancy with Argo CD and, if possible, contribute them upstream.
Summary
In this article, we introduced our Argo CD based GitOps system in neco.
We covered the top 5 best practices for GitOps, along with some valuable lessons we learned along the way, such as using Applications to manage the deployment lifecycle of other Applications.
Thank you for reading. We hope you have been able to learn something from our experience and can avoid some of the problems we faced.
If you'd like to find out more, please come and check out our neco-apps repository on GitHub!