GKE Gateway API: Full Setup & Troubleshooting Runbook

By Durrell Gemuh

I recently went through the full process of setting up GKE Gateway API on a production-grade cluster — Fleet registration, controller enablement, Helm deployment, and a handful of painful debugging sessions. This is the cleaned-up runbook so you don't have to learn these lessons the hard way.


Cluster Context

Field       Value
Project     my-gcp-project
Cluster     my-gke-cluster
Region      us-east1
Namespace   my-namespace

Traffic flow:
Client → Cloud Load Balancer → GKE Gateway (L7) → HTTPRoute → Kubernetes Service → Pod


1. Prerequisites & Initial Setup

Set project & cluster context

gcloud config set project my-gcp-project

gcloud container clusters get-credentials my-gke-cluster \
  --region us-east1

Enable required GCP APIs

gcloud services enable container.googleapis.com
gcloud services enable gkehub.googleapis.com
gcloud services enable serviceusage.googleapis.com
gcloud services enable multiclusteringress.googleapis.com
gcloud services enable multiclusterservicediscovery.googleapis.com

IAM permissions

Grant your service account the roles needed for Fleet and Ingress management:

SA=serviceAccount:my-service-account@my-gcp-project.iam.gserviceaccount.com
PROJECT=my-gcp-project

gcloud projects add-iam-policy-binding $PROJECT \
  --member="$SA" --role="roles/container.admin"

gcloud projects add-iam-policy-binding $PROJECT \
  --member="$SA" --role="roles/gkehub.admin"

gcloud projects add-iam-policy-binding $PROJECT \
  --member="$SA" --role="roles/serviceusage.serviceUsageAdmin"
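Since the three bindings differ only in the role, a loop keeps them in one place. This is a minimal sketch in bash; grant_roles and its DRY_RUN switch are hypothetical helpers, not part of gcloud, and only wrap the same add-iam-policy-binding call used above:

```bash
#!/usr/bin/env bash
# Grant each listed role to one member on one project.
# With DRY_RUN set to any non-empty value, print the commands instead.
grant_roles() {
  local project="$1" member="$2" role
  shift 2
  local -a cmd
  for role in "$@"; do
    cmd=(gcloud projects add-iam-policy-binding "$project"
         --member="$member" --role="$role")
    if [ -n "${DRY_RUN:-}" ]; then
      echo "${cmd[*]}"
    else
      "${cmd[@]}"
    fi
  done
}

# Example (the three bindings above; drop DRY_RUN=1 to actually apply them):
#   DRY_RUN=1 grant_roles "$PROJECT" "$SA" \
#     roles/container.admin roles/gkehub.admin roles/serviceusage.serviceUsageAdmin
```

Previewing with DRY_RUN=1 first is a cheap way to catch a typo in the member string before it lands in the project's IAM policy.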

2. Fleet Registration & Gateway Enablement

Enable Workload Identity

The Workload Identity pool must be named after the project that owns the
cluster (PROJECT_ID.svc.id.goog). On a Shared VPC setup that is the service
project, not the host project.

gcloud container clusters update my-gke-cluster \
  --region us-east1 \
  --workload-pool=my-gcp-project.svc.id.goog
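To confirm the update took, you can read the pool back from the cluster and compare it with the project that owns the cluster. A bash sketch; expected_workload_pool and check_workload_pool are hypothetical helper names, and the example reads the workloadIdentityConfig.workloadPool field of the cluster description:

```bash
#!/usr/bin/env bash
# Workload Identity pools are named <PROJECT_ID>.svc.id.goog.
expected_workload_pool() {
  printf '%s.svc.id.goog\n' "$1"
}

# Compare the pool a cluster reports against the project that owns it.
# Prints OK on a match; prints the mismatch to stderr and returns 1 otherwise.
check_workload_pool() {
  local cluster_project="$1" actual_pool="$2" want
  want="$(expected_workload_pool "$cluster_project")"
  if [ "$actual_pool" = "$want" ]; then
    echo "OK: workload pool is ${actual_pool}"
  else
    echo "MISMATCH: got '${actual_pool}', expected '${want}'" >&2
    return 1
  fi
}

# Example:
#   check_workload_pool my-gcp-project "$(gcloud container clusters describe \
#     my-gke-cluster --region us-east1 \
#     --format='value(workloadIdentityConfig.workloadPool)')"
```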

Register the cluster to Fleet

gcloud container fleet memberships register my-gke-cluster \
  --gke-cluster=us-east1/my-gke-cluster \
  --enable-workload-identity

# Verify
gcloud container fleet memberships list

Enable Fleet Ingress (Gateway controller)

gcloud container fleet ingress enable

# Verify — expected: state: ACTIVE, membershipStates: OK
gcloud container fleet ingress describe

Enable Gateway API at cluster level

This is a separate step from Fleet Ingress. Without it, kubectl get gatewayclass
returns nothing and the controller will never attach.

gcloud container clusters update my-gke-cluster \
  --region us-east1 \
  --enable-gateway-api

Enable a release channel

The GKE-managed Gateway controller requires a release channel to attach to the cluster:

gcloud container clusters update my-gke-cluster \
  --region us-east1 \
  --release-channel regular
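To check whether a cluster is already enrolled, read releaseChannel.channel from the cluster description and test the value. A bash sketch; has_release_channel is a hypothetical helper, and treating an empty value or UNSPECIFIED as "not enrolled" is an assumption about how an unenrolled cluster is reported:

```bash
#!/usr/bin/env bash
# Succeeds if the value names an actual release channel (case-insensitive).
# An empty string or UNSPECIFIED is treated as "not enrolled" (assumption).
has_release_channel() {
  local channel
  channel="$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')"
  case "$channel" in
    RAPID|REGULAR|STABLE|EXTENDED) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   has_release_channel "$(gcloud container clusters describe my-gke-cluster \
#     --region us-east1 --format='value(releaseChannel.channel)')" \
#     && echo enrolled || echo "no release channel"
```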

Install Gateway API CRDs

kubectl apply -f https://github.com/kubernetes-sigs/gateway-api/releases/latest/download/standard-install.yaml

# Verify
kubectl get crds | grep gateway
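grep gateway only shows that something gateway-related is installed. A stricter check looks for the three resource types this runbook relies on (GatewayClass, Gateway, HTTPRoute). A bash sketch; missing_gateway_crds is a hypothetical helper that reads kubectl get crds -o name output on stdin:

```bash
#!/usr/bin/env bash
# Reads CRD names on stdin, one per line, in the form printed by
#   kubectl get crds -o name
# e.g. customresourcedefinition.apiextensions.k8s.io/gateways.gateway.networking.k8s.io
# Prints each required Gateway API CRD that is absent; returns 1 if any are.
missing_gateway_crds() {
  local have required status=0
  have="$(cat)"
  for required in gatewayclasses gateways httproutes; do
    if ! grep -q "/${required}\.gateway\.networking\.k8s\.io\$" <<<"$have"; then
      echo "missing: ${required}.gateway.networking.k8s.io"
      status=1
    fi
  done
  return "$status"
}

# Example:
#   kubectl get crds -o name | missing_gateway_crds
```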

Verify GatewayClasses

After the controller attaches, these should appear:

kubectl get gatewayclass

# NAME                                  CONTROLLER
# gke-l7-rilb                           networking.gke.io/gateway
# gke-l7-global-external-managed        networking.gke.io/gateway
# gke-l7-gxlb                           networking.gke.io/gateway
# gke-l7-regional-external-managed      networking.gke.io/gateway
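Because the classes only appear once the controller attaches, it can help to poll for the one you actually plan to use instead of re-running the command by hand. A bash sketch; has_gatewayclass is a hypothetical helper that reads kubectl get gatewayclass -o name output on stdin, and the 30-second interval is arbitrary:

```bash
#!/usr/bin/env bash
# Reads GatewayClass names on stdin, one per line, either bare ("gke-l7-rilb")
# or as printed by `kubectl get gatewayclass -o name`
# ("gatewayclass.gateway.networking.k8s.io/gke-l7-rilb").
# Succeeds if the requested class is present.
has_gatewayclass() {
  local want="$1" line
  while IFS= read -r line; do
    if [ "${line##*/}" = "$want" ]; then
      return 0
    fi
  done
  return 1
}

# Example: poll every 30 seconds until the class appears.
#   until kubectl get gatewayclass -o name | has_gatewayclass gke-l7-global-external-managed; do
#     sleep 30
#   done
```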

3. Helm Chart Deployment

helm upgrade --install my-gateway-chart ./gateway \
  -n my-namespace --create-namespace

Validate

kubectl get gateway    -n my-namespace
kubectl get httproute  -n my-namespace
kubectl get svc        -n my-namespace
kubectl describe gateway -n my-namespace
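Rather than re-running kubectl get gateway until an address shows up, kubectl wait can block on the Gateway's Programmed condition. A bash sketch; wait_for_gateway and its DRY_RUN switch are hypothetical, the Gateway name is the one from Issue 2 below, and the 10-minute default timeout is an assumption (load balancer provisioning can take several minutes):

```bash
#!/usr/bin/env bash
# Block until the Gateway reports condition Programmed=True, or fail when the
# timeout expires. With DRY_RUN set, print the command instead of running it.
wait_for_gateway() {
  local name="$1" ns="$2" timeout="${3:-10m}"
  local -a cmd
  cmd=(kubectl wait "gateway/${name}" -n "$ns"
       --for=condition=Programmed --timeout="$timeout")
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example:
#   wait_for_gateway my-https-gateway my-namespace 15m
```

A non-zero exit after the timeout is a clear signal to go to kubectl describe gateway rather than keep waiting.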

4. Issues & Fixes

This is where things got interesting.

Issue 1 — GatewayClass missing

Symptom: kubectl get gatewayclass returns no resources.

Root cause: Fleet Ingress and Gateway API at cluster level are two separate
enablement steps. You need both.

Fix: Run --enable-gateway-api on the cluster (section 2 above).


Issue 2 — Gateway stuck "Waiting for controller"

Symptom: ArgoCD shows Gateway as Progressing indefinitely.

Root cause: No release channel was set. The managed controller won't attach
without one.

Fix: Set the release channel, then delete the Gateway resource and let ArgoCD
re-sync:

kubectl delete gateway my-https-gateway -n my-namespace
# ArgoCD will recreate it against the now-attached controller

Issue 3 — CertificateMap region mismatch

Error:

CertificateMap "my-cert-map-cert-map" must not be configured in a region other than global

Two things went wrong here simultaneously.

Problem A — Helm double-suffix bug

The Helm template was concatenating a suffix onto a value that already had it:

# values.yaml (nested under gateway:, matching .Values.gateway.certMap)
gateway:
  certMap: my-cert-map

# template (broken): appends a suffix the value already has
networking.gke.io/certmap: {{ .Values.gateway.certMap }}-cert-map
# renders as: my-cert-map-cert-map ❌

# template (correct)
networking.gke.io/certmap: {{ .Values.gateway.certMap | quote }}
# renders as: "my-cert-map" ✔
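Catching this before deploy only requires looking at the rendered manifests. A bash sketch; check_certmap_suffix is a hypothetical helper that reads helm template output on stdin and flags any certmap annotation ending in the suffix twice (the default suffix -cert-map is taken from this example):

```bash
#!/usr/bin/env bash
# Reads rendered manifests (helm template output) on stdin and reports every
# networking.gke.io/certmap annotation whose value ends with the suffix twice.
check_certmap_suffix() {
  local suffix="${1:--cert-map}" line value status=0
  while IFS= read -r line; do
    case "$line" in
      *networking.gke.io/certmap:*)
        value="${line#*networking.gke.io/certmap:}"
        # Drop whitespace and YAML quotes around the value.
        value="$(printf '%s' "$value" | tr -d "[:space:]\"'")"
        if [[ "$value" == *"${suffix}${suffix}" ]]; then
          echo "doubled suffix: ${value}"
          status=1
        fi
        ;;
    esac
  done
  return "$status"
}

# Example:
#   helm template my-gateway-chart ./gateway -n my-namespace | check_certmap_suffix
```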

Problem B — Wrong GatewayClass

gke-l7-rilb is regional internal. CertificateMaps are global-only — they are
incompatible:

# WRONG
gatewayClassName: gke-l7-rilb            # regional internal, no certMap

# CORRECT
gatewayClassName: gke-l7-global-external-managed  # global, certMap supported

🚨 Rule to remember: CertificateMaps are GLOBAL ONLY. Never pair them
with a regional GatewayClass.
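The same rule can be enforced before deploy by pairing the class name with the certmap value from your chart values. A bash sketch; check_gateway_certmap is a hypothetical helper, and its allowlist is an assumption: it contains only the two global classes from the GatewayClass listing in section 2 and deliberately fails closed on anything else (confirm certificate-map support per class against the GKE docs for your version):

```bash
#!/usr/bin/env bash
# Fails when a certificate map is paired with a GatewayClass outside the
# allowlist of global classes (allowlist is an assumption, see above).
check_gateway_certmap() {
  local class="$1" certmap="$2"
  if [ -z "$certmap" ]; then
    return 0  # no certmap annotation, nothing to check
  fi
  case "$class" in
    gke-l7-global-external-managed|gke-l7-gxlb)
      return 0 ;;
    *)
      echo "certmap '${certmap}' is not allowed with GatewayClass '${class}' (not a known global class)" >&2
      return 1 ;;
  esac
}

# Example, with the values from Problem B:
#   check_gateway_certmap gke-l7-rilb my-cert-map                    # fails
#   check_gateway_certmap gke-l7-global-external-managed my-cert-map # passes
```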


Issue 4 — HTTPRoute BackendNotFound

Error:

BackendNotFound: services my-namespace/<name> not found

Root cause: The backendRefs in the HTTPRoute used Helm value names, not the
actual Kubernetes Service names.

# WRONG: these were Helm value names, not Service names
backendRefs:
  - name: frontend-service   # ❌
  - name: search-service     # ❌

# CORRECT: exact Kubernetes Service names
# (excerpt only: each Service backendRef also needs a port, and several
#  backendRefs in one rule split traffic between them)
backendRefs:
  - name: frontend           # ✔
  - name: search             # ✔
  - name: dashboard          # ✔
  - name: analytics-web      # ✔

5. Validation & Testing

Get the Gateway IP

kubectl get gateway -n my-namespace
# Look at the ADDRESS column

Why a bare curl returns 404

curl -v http://<GATEWAY-IP>
# 404 fault filter abort

This is expected. The load balancer received the request, but no HTTPRoute
hostname matched the Host header (here just the bare IP), so it answered 404.
It's not an error: it confirms the Gateway is up and serving.

Test with the correct Host header

curl -H "Host: myapp.dev.example.io" http://<GATEWAY-IP>
# Should return your app's HTML

DNS

Create an A record:

myapp.dev.example.io  →  <GATEWAY-IP>
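Once the record exists, you can check that it actually resolves to the Gateway before testing HTTPS. A bash sketch; dns_points_to is a hypothetical helper that reads resolved records on stdin (for example from dig +short), and the example assumes the Gateway is the my-https-gateway from Issue 2:

```bash
#!/usr/bin/env bash
# Reads resolved records on stdin, one per line (e.g. from `dig +short`), and
# succeeds if the expected Gateway IP is among them. CNAME lines never equal
# an IP, so they are simply skipped.
dns_points_to() {
  local want="$1" addr
  while IFS= read -r addr; do
    if [ "$addr" = "$want" ]; then
      return 0
    fi
  done
  return 1
}

# Example:
#   dig +short myapp.dev.example.io | dns_points_to "$(kubectl get gateway \
#     my-https-gateway -n my-namespace -o jsonpath='{.status.addresses[0].value}')"
```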

HTTPS

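curl -vk https://myapp.dev.example.io
# -k skips certificate verification. Once this works, rerun without -k to
# confirm the certificate from the CertificateMap validates for this hostname.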
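If the DNS record hasn't propagated yet, curl's --resolve option can pin the hostname to the Gateway IP for a single request, so both the Host header and the TLS SNI carry the real hostname. A bash sketch; curl_via_gateway and its DRY_RUN switch are hypothetical helpers:

```bash
#!/usr/bin/env bash
# Send a request for HOST straight to the Gateway IP. --resolve pins HOST to IP
# inside curl only, so the Host header and TLS SNI both carry the real name.
# With DRY_RUN set, print the command instead of running it.
curl_via_gateway() {
  local host="$1" ip="$2" scheme="${3:-https}" port=443
  if [ "$scheme" = "http" ]; then
    port=80
  fi
  local -a cmd
  cmd=(curl -v --resolve "${host}:${port}:${ip}" "${scheme}://${host}/")
  if [ -n "${DRY_RUN:-}" ]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}

# Example:
#   curl_via_gateway myapp.dev.example.io <GATEWAY-IP>        # HTTPS
#   curl_via_gateway myapp.dev.example.io <GATEWAY-IP> http   # plain HTTP
```

Unlike a bare -H "Host: ..." header, this also exercises certificate selection, because the TLS handshake sees the real hostname.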

Test a Service directly (bypassing the Gateway)

kubectl port-forward svc/analytics-web 8080:3000 -n my-namespace
curl http://localhost:8080

6. ArgoCD Cleanup

Orphaned HTTPRoutes from previous deploys will cause ArgoCD to show Degraded
even when the app is healthy. Clean them up:

kubectl delete httproute old-frontend-route  -n my-namespace
kubectl delete httproute old-search-route    -n my-namespace

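Or let ArgoCD prune them. Note that PruneLast=true alone does not turn pruning
on: it only defers pruning until the other resources have synced. With
automated sync, pruning itself is enabled by automated.prune:

syncPolicy:
  automated:
    prune: true
  syncOptions:
    - PruneLast=true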

7. Quick Debug Reference

# Gateway status
kubectl describe gateway   -n my-namespace
kubectl get gateway -n my-namespace -w

# HTTPRoutes
kubectl describe httproute -n my-namespace

# Controller (no pods = managed mode, that's normal)
kubectl get pods -A | grep -i gateway
kubectl get gatewayclass
kubectl get crd | grep gateway

# Cluster config
gcloud container clusters describe my-gke-cluster --region us-east1

# Events
kubectl get events -n my-namespace --sort-by=.lastTimestamp

8. Final State

Component       Status
GatewayClass    ✅ Accepted
Gateway         ✅ Programmed
HTTPRoute       ✅ Healthy
Services        ✅ Resolved
Load Balancer   ✅ Active
Pods            ✅ Running
ArgoCD          ✅ Synced

9. Key Lessons

  1. Fleet Ingress ≠ Gateway controller ready. You need --enable-gateway-api separately on the cluster.
  2. No release channel = no controller attachment. Add the cluster to regular or stable.
  3. CertificateMaps are global only. Never use them with gke-l7-rilb or any regional GatewayClass.
  4. Helm string concatenation silently breaks GCP resource names. Always run helm template and inspect the rendered output before applying.
  5. HTTPRoute backendRefs need exact Service names. Your Helm value names and your Kubernetes Service names are not the same thing.
  6. 404 fault filter abort = hostname mismatch, not a broken Gateway. Check your Host header.
  7. No gateway pods in GKE managed mode is normal. The controller is cloud-managed.
  8. Dataplane V2 can't be enabled on an existing cluster. You'd need to recreate it — and it's not required for Gateway API anyway.

10. Improvements Worth Making

Wildcard hostname — Reduces per-service HTTPRoute config significantly:

spec:
  hostnames:
    - "*.dev.example.io"

CI validation — Add a pre-deploy check that verifies every backendRef name exists as a live Service in the target namespace. This eliminates the BackendNotFound class of errors before they hit the cluster.
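A minimal version of that check in bash; missing_backends is a hypothetical helper that takes two whitespace-separated lists and prints every backendRef name with no matching Service. The example feeds it live HTTPRoutes via jsonpath; in a true pre-deploy check the first list would instead come from the rendered chart:

```bash
#!/usr/bin/env bash
# Usage: missing_backends "<backendRef names>" "<Service names>"
# Both arguments are whitespace-separated lists (Kubernetes names contain no
# spaces). Prints each backendRef with no matching Service; returns 1 if any.
missing_backends() {
  local refs="$1" services="$2" ref svc found status=0
  for ref in $refs; do
    found=0
    for svc in $services; do
      if [ "$ref" = "$svc" ]; then
        found=1
        break
      fi
    done
    if [ "$found" -eq 0 ]; then
      echo "$ref"
      status=1
    fi
  done
  return "$status"
}

# Example:
#   missing_backends \
#     "$(kubectl get httproute -n my-namespace -o jsonpath='{.items[*].spec.rules[*].backendRefs[*].name}')" \
#     "$(kubectl get svc -n my-namespace -o jsonpath='{.items[*].metadata.name}')"
```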

Helm schema guards — Use values.schema.json to validate that your certMap value doesn't already end with the suffix your template appends. Catches double-suffix bugs at helm lint time.

Unified naming — Make your Helm release name, Kubernetes Service name, and HTTPRoute backendRef all derive from the same value. One source of truth, zero drift.


Hope this saves someone a few hours. The GKE docs cover each of these pieces individually but the interactions between them — especially Fleet Ingress vs cluster-level Gateway API enablement, and the CertificateMap GatewayClass constraint — aren't obvious until you hit them.