
I recently went through the full process of setting up GKE Gateway API on a production-grade cluster — Fleet registration, controller enablement, Helm deployment, and a handful of painful debugging sessions. This is the cleaned-up runbook so you don't have to learn these lessons the hard way.
| Field | Value |
|---|---|
| Project | my-gcp-project |
| Cluster | my-gke-cluster |
| Region | us-east1 |
| Namespace | my-namespace |
Traffic flow:
Client → Cloud Load Balancer → GKE Gateway (L7) → HTTPRoute → Kubernetes Service → Pod
gcloud config set project my-gcp-project
gcloud container clusters get-credentials my-gke-cluster \
--region us-east1
gcloud services enable container.googleapis.com
gcloud services enable gkehub.googleapis.com
gcloud services enable serviceusage.googleapis.com
gcloud services enable multiclusteringress.googleapis.com
gcloud services enable multiclusterservicediscovery.googleapis.com
Grant your service account the roles needed for Fleet and Ingress management:
SA=serviceAccount:my-service-account@my-gcp-project.iam.gserviceaccount.com
PROJECT=my-gcp-project
gcloud projects add-iam-policy-binding $PROJECT \
--member="$SA" --role="roles/container.admin"
gcloud projects add-iam-policy-binding $PROJECT \
--member="$SA" --role="roles/gkehub.admin"
gcloud projects add-iam-policy-binding $PROJECT \
--member="$SA" --role="roles/serviceusage.serviceUsageAdmin"
The Workload Identity pool must match your service project, not the host project.
gcloud container clusters update my-gke-cluster \
--region us-east1 \
--workload-pool=my-service-project.svc.id.goog
gcloud container fleet memberships register my-gke-cluster \
--gke-cluster=us-east1/my-gke-cluster \
--enable-workload-identity
# Verify
gcloud container fleet memberships list
gcloud container fleet ingress enable
# Verify — expected: state: ACTIVE, membershipStates: OK
gcloud container fleet ingress describe
This is a separate step from Fleet Ingress. Without it,
kubectl get gatewayclass
returns nothing and the controller will never attach.
gcloud container clusters update my-gke-cluster \
--region us-east1 \
--enable-gateway-api
The GKE-managed Gateway controller requires a release channel to attach to the cluster:
gcloud container clusters update my-gke-cluster \
--region us-east1 \
--release-channel regular
kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/latest/download/standard-install.yaml
# Verify
kubectl get crds | grep gateway
After the controller attaches, these should appear:
kubectl get gatewayclass
# NAME CONTROLLER
# gke-l7-rilb networking.gke.io/gateway
# gke-l7-global-external-managed networking.gke.io/gateway
# gke-l7-gxlb networking.gke.io/gateway
# gke-l7-regional-external-managed networking.gke.io/gateway
helm upgrade --install my-gateway-chart ./gateway \
-n my-namespace --create-namespace
kubectl get gateway -n my-namespace
kubectl get httproute -n my-namespace
kubectl get svc -n my-namespace
kubectl describe gateway -n my-namespace
This is where things got interesting.
Symptom: kubectl get gatewayclass returns no resources.
Root cause: Fleet Ingress and Gateway API at cluster level are two separate
enablement steps. You need both.
Fix: Run --enable-gateway-api on the cluster (step 2 above).
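To confirm the flag took effect, you can check the cluster's Gateway API channel (a quick sanity check; the field path below is from the GKE cluster resource, so adjust if your gcloud output differs):
gcloud container clusters describe my-gke-cluster \
  --region us-east1 \
  --format="value(networkConfig.gatewayApiConfig.channel)"
# Expected: CHANNEL_STANDARD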
Symptom: ArgoCD shows Gateway as Progressing indefinitely.
Root cause: No release channel was set. The managed controller won't attach
without one.
Fix: Set the release channel, then delete the Gateway resource and let ArgoCD
re-sync:
kubectl delete gateway my-https-gateway -n my-namespace
# ArgoCD will recreate it against the now-attached controller
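To confirm the channel change actually landed before the re-sync:
gcloud container clusters describe my-gke-cluster \
  --region us-east1 \
  --format="value(releaseChannel.channel)"
# Expected: REGULAR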
Error:
CertificateMap "my-cert-map-cert-map" must not be configured in a region other than global
Two things went wrong here simultaneously.
Problem A — Helm double-suffix bug
The Helm template was concatenating a suffix onto a value that already had it:
# values.yaml
certMap: my-cert-map
# template (broken)
networking.gke.io/certmap: {{ .Values.gateway.certMap }}-cert-map
# renders as: my-cert-map-cert-map ❌
# template (correct)
networking.gke.io/certmap: {{ .Values.gateway.certMap }}
# renders as: my-cert-map ✔
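A cheap guard against this class of bug: render the chart locally and grep for the annotation before applying (same release name and chart path as in the deploy step):
helm template my-gateway-chart ./gateway -n my-namespace \
  | grep "networking.gke.io/certmap"
# Should print exactly: networking.gke.io/certmap: my-cert-map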
Problem B — Wrong GatewayClass
gke-l7-rilb is regional internal. CertificateMaps are global-only — they are
incompatible:
# WRONG
gatewayClassName: gke-l7-rilb # regional internal, no certMap
# CORRECT
gatewayClassName: gke-l7-global-external-managed # global, certMap supported
🚨 Rule to remember: CertificateMaps are GLOBAL ONLY. Never pair them
with a regional GatewayClass.
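For reference, a minimal sketch of the corrected Gateway, assuming the CertificateMap my-cert-map already exists in Certificate Manager (the tls block is omitted because the certmap annotation supplies the certificates):
apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: my-https-gateway
  namespace: my-namespace
  annotations:
    networking.gke.io/certmap: my-cert-map
spec:
  gatewayClassName: gke-l7-global-external-managed
  listeners:
  - name: https
    protocol: HTTPS
    port: 443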
Error:
BackendNotFound: services my-namespace/<name> not found
Root cause: The backendRefs in the HTTPRoute used Helm value names, not the
actual Kubernetes Service names.
# WRONG — these were Helm value names, not Service names
backendRefs:
- name: frontend-service # ❌
- name: search-service # ❌
# CORRECT — exact Kubernetes Service names
backendRefs:
- name: frontend # ✔
- name: search # ✔
- name: dashboard # ✔
- name: analytics-web # ✔
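Before writing backendRefs, list the real Service names in the namespace and copy them exactly:
kubectl get svc -n my-namespace -o name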
kubectl get gateway -n my-namespace
# Look at the ADDRESS column
curl -v http://<GATEWAY-IP>
# 404 fault filter abort
This is expected. The Gateway matched the request but no HTTPRoute matched
the Host header. It's not an error — it confirms the Gateway is working.
curl -H "Host: myapp.dev.example.io" http://<GATEWAY-IP>
# Should return your app's HTML
Create an A record:
myapp.dev.example.io → <GATEWAY-IP>
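If the zone is hosted in Cloud DNS, the record can be created from the CLI (my-dns-zone is a hypothetical managed-zone name; substitute your own):
gcloud dns record-sets create myapp.dev.example.io. \
  --zone=my-dns-zone --type=A --ttl=300 \
  --rrdatas=<GATEWAY-IP>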
curl -vk https://myapp.dev.example.io
kubectl port-forward svc/analytics-web 8080:3000 -n my-namespace
curl http://localhost:8080
Orphaned HTTPRoutes from previous deploys will cause ArgoCD to show Degraded
even when the app is healthy. Clean them up:
kubectl delete httproute old-frontend-route -n my-namespace
kubectl delete httproute old-search-route -n my-namespace
Or enable pruning in your ArgoCD Application. Note that PruneLast=true only controls when pruning runs during a sync; automated pruning itself is enabled with prune: true:
syncPolicy:
  automated:
    prune: true
  syncOptions:
  - PruneLast=true
# Gateway status
kubectl describe gateway -n my-namespace
kubectl get gateway -n my-namespace -w
# HTTPRoutes
kubectl describe httproute -n my-namespace
# Controller (no pods = managed mode, that's normal)
kubectl get pods -A | grep -i gateway
kubectl get gatewayclass
kubectl get crd | grep gateway
# Cluster config
gcloud container clusters describe my-gke-cluster --region us-east1
# Events
kubectl get events -n my-namespace --sort-by=.lastTimestamp
| Component | Status |
|---|---|
| GatewayClass | ✅ Accepted |
| Gateway | ✅ Programmed |
| HTTPRoute | ✅ Healthy |
| Services | ✅ Resolved |
| Load Balancer | ✅ Active |
| Pods | ✅ Running |
| ArgoCD | ✅ Synced |
Key takeaways:
- Fleet Ingress alone is not enough; run --enable-gateway-api separately on the cluster.
- The managed Gateway controller needs a release channel: regular or stable.
- CertificateMaps are global-only; never pair them with gke-l7-rilb or any regional GatewayClass.
- Run helm template and inspect the rendered output before applying.
- backendRefs need exact Service names. Your Helm value names and your Kubernetes Service names are not the same thing.
- Wildcard hostname — reduces per-service HTTPRoute config significantly (see the fuller sketch after the snippet below):
spec:
  hostnames:
  - "*.dev.example.io"
CI validation — Add a pre-deploy check that verifies every backendRef name exists as a live Service in the target namespace. This eliminates the BackendNotFound class of errors before they hit the cluster.
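A sketch of such a check, assuming yq v4 and kubectl are available in the CI image and the routes come from the same chart:
# Render the chart, extract every backendRef name from the HTTPRoutes,
# and fail if any name has no matching Service in the namespace.
NAMESPACE=my-namespace
FAILED=0
for name in $(helm template my-gateway-chart ./gateway -n "$NAMESPACE" \
    | yq ea 'select(.kind == "HTTPRoute") | .spec.rules[].backendRefs[].name'); do
  if ! kubectl get svc "$name" -n "$NAMESPACE" >/dev/null 2>&1; then
    echo "MISSING: Service $name not found in $NAMESPACE"
    FAILED=1
  fi
done
exit "$FAILED"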
Helm schema guards — Use values.schema.json to validate that your certMap value doesn't already end with the suffix your template appends. Catches double-suffix bugs at helm lint time.
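A sketch of the guard, assuming the value lives at gateway.certMap as in the values.yaml above:
# Schema fragment that rejects certMap values already ending in -cert-map
cat > gateway/values.schema.json <<'EOF'
{
  "$schema": "https://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "gateway": {
      "type": "object",
      "properties": {
        "certMap": {
          "type": "string",
          "not": { "pattern": "-cert-map$" }
        }
      }
    }
  }
}
EOF
helm lint ./gateway   # now fails if gateway.certMap ends in -cert-map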
Unified naming — Make your Helm release name, Kubernetes Service name, and HTTPRoute backendRef all derive from the same value. One source of truth, zero drift.
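A sketch of the idea in Helm terms, assuming a single appName value (names hypothetical):
# values.yaml
appName: frontend
# templates/service.yaml
metadata:
  name: {{ .Values.appName }}
# templates/httproute.yaml
backendRefs:
- name: {{ .Values.appName }}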
Hope this saves someone a few hours. The GKE docs cover each of these pieces individually but the interactions between them — especially Fleet Ingress vs cluster-level Gateway API enablement, and the CertificateMap GatewayClass constraint — aren't obvious until you hit them.