The No-Nonsense Guide to ArgoCD
ArgoCD isn't perfect. Here's a no-nonsense guide to its common problems, from sync timeouts to Helm hell, and how to actually fix them.
ArgoCD is the undisputed king of GitOps on Kubernetes. Everyone uses it. But the official documentation only tells you how to drive it on a freshly paved road. Out in the real world, where the pavement ends, you'll find potholes, weird engine noises, and a whole lot of community forum posts from people stuck in the mud.
This isn't a theoretical guide. This is a field guide based on what people are actually complaining about on Reddit, GitHub issues, and Stack Overflow. It's about the problems that don't make it into the marketing material.
The Sync & Performance Nightmare
The most common sign something’s wrong in ArgoCD land is the dreaded sync timeout or an application controller pegging the CPU. I've seen teams immediately throw more memory and CPU at the problem. It rarely works. That's a rookie move.
The issue is almost never the raw resources. It’s usually a symptom of a deeper problem:
- Inefficient Manifest Generation: Your `argocd-repo-server` is choking because your Helm chart is a monster or your Kustomize setup is too complex.
- API Server Throttling: The `argocd-application-controller` is hammering the Kubernetes API server too hard, causing it to throttle requests.
- Bad Caching: You're not using caching effectively, forcing ArgoCD to re-calculate everything on every reconciliation loop.
Before you scale up, you need to tune the controller. These are the knobs you should be looking at first.
| Parameter | Component | Default | Why You Should Care |
|---|---|---|---|
| `--status-processors` | application-controller | 20 | Controls how many apps can be reconciled at once. Too low, and things get slow. |
| `--operation-processors` | application-controller | 10 | Controls how many sync operations can run at once. Increase if syncs are queueing up. |
| `ARGOCD_K8S_CLIENT_QPS` | application-controller | 50 | Rate limit for talking to the K8s API. If you see throttling, bump this, but watch your API server. |
| `timeout.reconciliation` | argocd-cm | 180s | How often Argo checks Git. If you have thousands of apps, you don't need it checking every 3 minutes. |
| `--parallelismlimit` | repo-server | 1 | Concurrent manifest generations. If your repo server is OOM'ing, this is a likely culprit. |
Start here. Tweak these values, watch your metrics, and only then consider giving it more raw power.
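Most of these knobs can be set declaratively instead of editing Deployment args by hand. A sketch, assuming a standard install in the `argocd` namespace (the example values are illustrative, not recommendations; the affected components need a restart to pick up changes):

```yaml
# Controller and repo-server flags live in argocd-cmd-params-cm
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  # application-controller: --status-processors / --operation-processors
  controller.status.processors: "50"
  controller.operation.processors: "25"
  # repo-server: --parallelismlimit, caps concurrent manifest generations
  reposerver.parallelism.limit: "2"
---
# The reconciliation interval lives in argocd-cm
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Check Git every 10 minutes instead of the 3-minute default
  timeout.reconciliation: 600s
```

The K8s client QPS is the odd one out: it's set via the `ARGOCD_K8S_CLIENT_QPS` environment variable on the application-controller rather than through a ConfigMap key.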
Helm Hooks Are a Trap
Here's what trips up more teams than anything: ArgoCD doesn't run `helm install` or `helm upgrade`. It runs `helm template`.
And that changes everything.
It means ArgoCD has no concept of an "install" vs. an "upgrade." It's just a "sync." The nasty side effect is that your Helm `pre-install` and `pre-upgrade` hooks both run. Every. Single. Time. If your hooks aren't idempotent, meaning they can run over and over without causing problems, you're in for a world of pain.
The fix is to design your hooks to be harmless on repeated runs. For a one-off job, that means telling ArgoCD to clean up the hook resource after it succeeds.
```yaml
# In your hook's manifest (e.g., a Job)
apiVersion: batch/v1
kind: Job
metadata:
  name: my-presync-db-migration
  annotations:
    # This is the ArgoCD hook annotation
    argocd.argoproj.io/hook: PreSync
    # This tells ArgoCD to delete the Job object once the hook succeeds
    argocd.argoproj.io/hook-delete-policy: HookSucceeded
spec:
  template:
    spec:
      containers:
        - name: db-migrator
          image: my-company/db-migrator:1.2.0
          # ... rest of your job spec
      restartPolicy: Never
  backoffLimit: 1
```
Also, forget about using Helm's `lookup` function. Since `helm template` runs without cluster access, `lookup` silently returns an empty result. You'll have to refactor your charts to pass that data in via values.
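One way to do that refactor from the ArgoCD side is to supply the data explicitly in the Application spec, via `spec.source.helm`. A sketch with hypothetical names (the chart, repo, and `existingCaBundleConfigMap` parameter are made up for illustration):

```yaml
# Application manifest (fragment): instead of the chart calling `lookup`
# to discover a ConfigMap at render time, the caller passes it in.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
spec:
  source:
    repoURL: https://github.com/my-org/charts  # hypothetical repo
    targetRevision: main
    chart: my-app
    helm:
      parameters:
        # Replaces a `lookup` of the cluster CA ConfigMap in the template
        - name: existingCaBundleConfigMap
          value: kube-root-ca.crt
```

Inside the chart, the template then reads `.Values.existingCaBundleConfigMap` instead of querying the cluster, which works identically under `helm template` and `helm install`.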
The App-of-Apps Spaghetti
The "App-of-Apps" pattern is the standard way to manage complex environments. You have a root app that deploys... other apps. It's a great idea until you have dependencies. What if your monitoring app needs the CRDs from your Prometheus operator app to be deployed first?
By default, ArgoCD syncs them all at once. Chaos ensues.
The solution is Sync Waves. It’s a simple annotation that lets you add an order to the chaos. Resources in lower-numbered waves are synced and must become healthy before ArgoCD moves on to the next wave.
```yaml
# In your Prometheus Operator Application manifest
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: prometheus-operator
  annotations:
    # Wave 0: Deploy the operator CRDs first
    argocd.argoproj.io/sync-wave: "0"
  # ...
---
# In your Monitoring Application manifest
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kube-prometheus-stack
  annotations:
    # Wave 1: Deploy after the operator is ready
    argocd.argoproj.io/sync-wave: "1"
  # ...
```
It's simple, but essential for making the App-of-Apps pattern usable.
The Terraform-to-ArgoCD Chasm
Here's where GitOps gets really awkward. You provision your infrastructure—say, an AWS RDS database—with Terraform. Terraform outputs the database endpoint, username, and a secret ARN. Now, how do you get that information into the Kubernetes application that ArgoCD is deploying?
This is a classic handoff problem, and the community has come up with some clever, if clunky, workarounds:
- Parameter Store: Your CI pipeline runs Terraform, which writes the outputs to AWS Parameter Store. Then, your ArgoCD application uses the External Secrets Operator to read from Parameter Store and create a Kubernetes secret.
- Commit-Back-to-Git: Terraform uses a GitHub provider to commit a `values.yaml` file containing the outputs directly back into your GitOps repo.
Both of these work. But they feel like duct tape. You're creating this awkward seam in your process, either by relying on an external store as a middleman or by having your infrastructure tool pollute your application configuration history.
This is a place where a more integrated platform approach makes a lot more sense. Tools like Scalr, for instance, don't see infrastructure provisioning and application deployment as two separate worlds that need to be bridged. They manage the entire workflow. When a Terraform module creates an RDS instance, its outputs become first-class citizens, programmatically available to the next stage in the pipeline that deploys the application via ArgoCD. There's no clumsy handoff because it's all part of a single, cohesive environment definition. It solves the problem at an architectural level instead of patching over it with clever scripts.
Secrets: Just Stop Using the Vault Plugin
For years, people used the ArgoCD Vault Plugin (AVP) to inject secrets during manifest generation. If you're still doing this, stop. The ArgoCD maintainers themselves now officially recommend against it.
Why? It's a security anti-pattern. Using AVP means your `argocd-repo-server` needs a credential to your Vault instance. This widens your attack surface. Worse, the rendered manifests, now with plaintext secrets, get stored in ArgoCD's Redis cache.
The modern, secure way is to use an operator-based pattern.
- The Operator: You install something like the External Secrets Operator (ESO) in your cluster.
- The Process: You commit an `ExternalSecret` manifest to Git. This manifest tells ESO where to find the secret in Vault (or AWS/GCP/Azure secret managers). ESO then fetches the secret and creates a native Kubernetes `Secret` object inside the cluster.
- ArgoCD's Role: Your application, managed by ArgoCD, simply mounts the native Kubernetes `Secret` like it always would.
In this model, ArgoCD is completely ignorant of Vault. It doesn't need credentials. It doesn't handle plaintext secrets. It just manages the `ExternalSecret` custom resource, and the specialized operator handles the sensitive work. It's a much cleaner separation of concerns.
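A minimal sketch of what that looks like in Git. The store name, secret name, and Vault path are hypothetical, and it assumes ESO is installed with a `ClusterSecretStore` already configured to authenticate to Vault:

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-app-db-credentials   # hypothetical
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: vault-backend          # assumed to exist, configured separately
  target:
    # The native Kubernetes Secret ESO will create and keep in sync
    name: my-app-db-credentials
  data:
    - secretKey: password
      remoteRef:
        key: secret/data/my-app/db   # hypothetical Vault path
        property: password
```

ArgoCD syncs only this manifest; the resulting `Secret` is created by ESO inside the cluster and never passes through Git, the repo-server, or the Redis cache.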
Final Thoughts
ArgoCD is a fantastic tool, but it's not magic. Mastering it means learning its sharp edges and understanding the "why" behind the community's hard-won best practices. It's about knowing which knobs to turn for performance, how to design idempotent hooks, and when to use architectural patterns like operators to solve problems cleanly. Don't just follow the happy path in the docs; learn from the struggles in the field.
Key Sources Used:
- Argo CD Official Documentation (argo-cd.readthedocs.io)
- r/ArgoCD & r/kubernetes on Reddit for community-reported issues
- Akuity Blog: How to Integrate Terraform with Argo CD for GitOps Workflows
- Akuity Blog: The 3 Most Common Argo CD Architectures Explained
- Argo CD GitHub Discussions and Issues