Lesson 25 of 28
Module 7 · Concepts — RBAC, cloud IAM, and Workload Identity Federation
Every cluster you've touched in this course has been talking to an identity; you just haven't had to name it. Your laptop's kubectl authenticates as the admin user in the kind cluster's kubeconfig context. The GitHub Actions workflow runs as a short-lived GITHUB_TOKEN. Prometheus scrapes via a ClusterRole-bound ServiceAccount the Helm chart silently created. This module pulls those identities into the foreground, because on real clusters one of the largest sources of avoidable incidents is an overprivileged identity doing more than it should.
The primitives
Kubernetes RBAC is four nouns:
- ServiceAccount: the identity a pod runs as inside the cluster. Lives in a namespace. Every namespace has a default SA; every pod gets it unless you specify otherwise.
- Role / ClusterRole: a list of verbs (get, list, watch, create, update, patch, delete, deletecollection) allowed on resources (pods, deployments, secrets, etc.). Role is namespace-scoped; ClusterRole is cluster-scoped, or reusable across namespaces via RoleBinding.
- RoleBinding / ClusterRoleBinding: binds a Role or ClusterRole to a subject (a ServiceAccount, a User, or a Group). Without the binding, the Role does nothing. RoleBinding is namespace-scoped; ClusterRoleBinding is cluster-scoped.
- Subject: the thing being granted access. In production most subjects are ServiceAccounts; users are for humans with kubectl.
The mental model: subject + verb + resource = allowed? When kubectl auth can-i list pods --as=system:serviceaccount:demo:my-app returns yes, that is the answer to the equation for that specific triple.
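The equation can be made concrete with a toy evaluator. This is a minimal sketch of the mental model only, not how the real Kubernetes authorizer is implemented: the Rule and Binding classes and the can_i function are hypothetical names, and the model ignores API groups, resourceNames, and cluster-scoped bindings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    verbs: frozenset      # e.g. {"get", "list", "watch"}
    resources: frozenset  # e.g. {"pods"}


@dataclass(frozen=True)
class Binding:
    subject: str    # e.g. "system:serviceaccount:demo:my-app"
    namespace: str  # namespace in which the binding grants access
    rules: tuple    # rules of the Role this binding points at


def can_i(bindings, subject, verb, resource, namespace):
    """Return True if some binding for `subject` in `namespace` allows `verb` on `resource`."""
    for binding in bindings:
        if binding.subject != subject or binding.namespace != namespace:
            continue
        for rule in binding.rules:
            if verb in rule.verbs and resource in rule.resources:
                return True
    # RBAC is additive and deny-by-default: no matching rule means no access.
    return False


pod_reader = Rule(verbs=frozenset({"get", "list", "watch"}),
                  resources=frozenset({"pods"}))
sa = "system:serviceaccount:demo:my-app"
bindings = [Binding(subject=sa, namespace="demo", rules=(pod_reader,))]

print(can_i(bindings, sa, "list", "pods", "demo"))     # True
print(can_i(bindings, sa, "delete", "pods", "demo"))   # False
print(can_i(bindings, sa, "list", "secrets", "demo"))  # False
```

Note that there is no way to express "deny" in this model, which mirrors RBAC itself: permissions only ever add up, so the way to restrict an identity is to grant it less.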
Least privilege is load-bearing
The single rule that separates platform teams that sleep well from platform teams that don't: every workload's ServiceAccount should have the minimum verbs on the minimum resources required for the workload's job — and nothing else.
The anti-patterns you'll encounter, in order of commonness:
- cluster-admin on an app's ServiceAccount "because it was faster to debug." A compromised pod now owns the cluster. Non-recoverable without rebuilding.
- The default ServiceAccount bound to a wide Role "because the helm chart didn't explicitly set serviceAccountName." Every new workload in that namespace inherits the permission. Silent privilege escalation by accident.
- A ClusterRole granted at cluster scope when a namespace-scoped Role would work. A pod in a low-trust namespace can now read secrets from every namespace in the cluster.
- get secrets on a namespace-wide Role when only one specific secret is needed. Use resourceNames: [specific-secret-name] in the Role to narrow further.
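The first two anti-patterns are easy to spot mechanically. The sketch below is a hypothetical helper (find_risky_bindings is not part of any tool), assuming its input has the shape of the items returned by kubectl get rolebindings,clusterrolebindings -A -o json. The sample bindings are invented for illustration.

```python
def find_risky_bindings(bindings):
    """Flag ServiceAccounts bound to cluster-admin and any binding on a default SA."""
    findings = []
    for binding in bindings:
        name = binding["metadata"]["name"]
        role = binding["roleRef"]["name"]
        # "subjects" may be missing or null on a binding, so fall back to an empty list.
        for subject in binding.get("subjects") or []:
            if subject.get("kind") != "ServiceAccount":
                continue
            sa = f'{subject.get("namespace", "?")}/{subject["name"]}'
            if role == "cluster-admin":
                findings.append(f"{name}: cluster-admin granted to ServiceAccount {sa}")
            if subject["name"] == "default":
                findings.append(f"{name}: role {role} bound to default ServiceAccount {sa}")
    return findings


# Invented sample data, shaped like RoleBinding / ClusterRoleBinding objects.
sample = [
    {"kind": "ClusterRoleBinding", "metadata": {"name": "debug-shortcut"},
     "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
     "subjects": [{"kind": "ServiceAccount", "name": "my-app", "namespace": "demo"}]},
    {"kind": "RoleBinding", "metadata": {"name": "chart-default", "namespace": "demo"},
     "roleRef": {"kind": "Role", "name": "wide-role"},
     "subjects": [{"kind": "ServiceAccount", "name": "default", "namespace": "demo"}]},
    {"kind": "RoleBinding", "metadata": {"name": "narrow", "namespace": "demo"},
     "roleRef": {"kind": "Role", "name": "pod-reader"},
     "subjects": [{"kind": "ServiceAccount", "name": "my-app", "namespace": "demo"}]},
]

findings = find_risky_bindings(sample)
for line in findings:
    print(line)
```

A check like this only catches the two patterns that are visible from the binding alone. Deciding whether a ClusterRole is wider than necessary, or whether a secrets rule should carry resourceNames, still needs someone who knows what the workload does.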
The cheap habit that avoids most of these: for every new workload, create its ServiceAccount by hand, bind it to a Role you wrote (not an off-the-shelf one), and test with kubectl auth can-i before deploying.
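As an illustration of that habit, the sketch below generates the three objects for one workload: a dedicated ServiceAccount, a Role that allows get on a single named secret, and the RoleBinding joining them. The scaffold function, the namespace demo, the app name my-app, and the secret name my-app-db-password are all hypothetical. The output is JSON, which kubectl accepts wherever it accepts YAML.

```python
import json


def scaffold(namespace, app, secret_name):
    """Build a ServiceAccount, a single-secret Role, and the RoleBinding between them."""
    role_name = f"{app}-secret-reader"
    service_account = {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {"name": app, "namespace": namespace},
    }
    role = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": role_name, "namespace": namespace},
        "rules": [{
            "apiGroups": [""],               # "" is the core API group, where secrets live
            "resources": ["secrets"],
            "resourceNames": [secret_name],  # narrow to the one secret the app needs
            "verbs": ["get"],
        }],
    }
    binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": role_name, "namespace": namespace},
        "subjects": [{"kind": "ServiceAccount", "name": app, "namespace": namespace}],
        "roleRef": {"apiGroup": "rbac.authorization.k8s.io", "kind": "Role", "name": role_name},
    }
    return {"apiVersion": "v1", "kind": "List", "items": [service_account, role, binding]}


manifest = scaffold("demo", "my-app", "my-app-db-password")
print(json.dumps(manifest, indent=2))
```

Saved to a file, the output could be applied with kubectl apply -f and then checked with kubectl auth can-i get secret/my-app-db-password -n demo --as=system:serviceaccount:demo:my-app, followed by the same command for a different secret name to confirm the answer is no.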
Cloud IAM — the cross-boundary problem
RBAC governs access to the Kubernetes API. The moment your workload needs to read a GCS bucket, publish to a Pub/Sub topic, call an AWS Lambda, or write to DynamoDB, RBAC is silent and cloud IAM takes over. These are two different identity systems, and you need to bridge them.
The anti-pattern most teams start with: a long-lived service account key stored as a Kubernetes Secret, mounted into the pod, read by the application. Problems:
- The key never rotates (or rotates once a year by hand and everyone dreads it).
- Anyone who can read the Secret has cloud credentials outside the cluster.
- The key is effectively a cluster-egress credential — it works from the attacker's laptop, not just the pod.
- A compromise means cleaning up or rebuilding the cloud project, not just the cluster.
Workload Identity Federation — the modern pattern
The goal: make a specific Kubernetes ServiceAccount assumable as a specific cloud IAM role, without any long-lived secret anywhere.
The mechanism (same on all three clouds, different product names):
- The cluster's API server is an OIDC issuer. Each ServiceAccount token it mints includes claims: issuer, subject (system:serviceaccount:<ns>:<name>), expiry.
- In the cloud's IAM, you create a Workload Identity Pool (or equivalent) that trusts the cluster's OIDC issuer as an identity source.
- You grant the cloud IAM role on a principal expressed as <pool>/subject/system:serviceaccount:<ns>:<name>. That's the map: "this specific SA in this specific cluster can assume this specific cloud role."
- The pod's SDK (Google Cloud SDK, AWS SDK, Azure SDK) detects the ambient OIDC token, exchanges it at the cloud's STS for a short-lived cloud credential, and uses that. No static key is ever provisioned or stored.
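To see what the cloud side actually keys on, it helps to look inside a token. The sketch below builds an unsigned sample token locally and decodes its payload. The issuer URL is a made-up placeholder, the helper names are hypothetical, and decode_claims deliberately skips signature verification, so it is suitable for inspection only, never for making trust decisions.

```python
import base64
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def decode_claims(jwt: str) -> dict:
    """Decode a JWT payload WITHOUT verifying its signature (inspection only)."""
    payload = jwt.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore the stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def principal(pool: str, claims: dict) -> str:
    """Render the principal in the <pool>/subject/<sub> form used in the text."""
    return f"{pool}/subject/{claims['sub']}"


# Unsigned sample token; a real one is signed by the cluster's issuer.
header = b64url(json.dumps({"alg": "none"}).encode())
body = b64url(json.dumps({
    "iss": "https://oidc.example.invalid/cluster-1",  # hypothetical issuer URL
    "sub": "system:serviceaccount:demo:my-app",
    "exp": int(time.time()) + 3600,
}).encode())
sample_token = f"{header}.{body}."

claims = decode_claims(sample_token)
print(claims["iss"])
print(claims["sub"])
print(principal("<pool>", claims))
```

The subject claim is the whole mapping key: rename the ServiceAccount or move it to another namespace and the principal no longer matches, which is a common cause of federation failures.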
Product names:
- GKE: Workload Identity. You annotate the Kubernetes ServiceAccount (KSA) with iam.gke.io/gcp-service-account=<GSA>@<project>.iam.gserviceaccount.com and grant the KSA roles/iam.workloadIdentityUser on the Google service account (GSA). The GSA then holds the actual cloud permissions.
- EKS: IAM Roles for Service Accounts (IRSA). You annotate the KSA with eks.amazonaws.com/role-arn=arn:aws:iam::...:role/... and the AWS SDK picks it up automatically.
- AKS: Azure Workload Identity. Same pattern: KSA annotation + federated credential on the managed identity.
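The GKE and EKS annotations can be sketched as manifest builders. The function names, project ID, GSA name, and role ARN below are hypothetical placeholders. The member string returned by gke_workload_identity_member follows the serviceAccount:PROJECT.svc.id.goog[NAMESPACE/KSA] form as I understand it from the GKE documentation; it is not stated in this lesson, so check it against the current docs before relying on it.

```python
import json


def gke_service_account(namespace, name, gsa, project):
    """KSA annotated to impersonate a Google service account (GSA)."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                "iam.gke.io/gcp-service-account": f"{gsa}@{project}.iam.gserviceaccount.com",
            },
        },
    }


def gke_workload_identity_member(project, namespace, name):
    """IAM member that receives roles/iam.workloadIdentityUser on the GSA (assumed format)."""
    return f"serviceAccount:{project}.svc.id.goog[{namespace}/{name}]"


def eks_service_account(namespace, name, role_arn):
    """KSA annotated with the IAM role it may assume through IRSA."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


# All values below are placeholders for illustration.
gke = gke_service_account("demo", "my-app", "my-app-gsa", "example-project")
member = gke_workload_identity_member("example-project", "demo", "my-app")
eks = eks_service_account("demo", "my-app", "arn:aws:iam::111122223333:role/my-app-reader")

print(json.dumps(gke, indent=2))
print(member)
print(json.dumps(eks, indent=2))
```

In both cases the annotation is only half of the link. The cloud side must also trust the KSA (the IAM binding on GKE, the role's trust policy on EKS), and a mismatch between the two halves is where most debugging time goes.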
Same pattern for CI identity
Module 3 used secrets.GITHUB_TOKEN — a short-lived token scoped to a single workflow run. To deploy to a cloud cluster, use OIDC federation from GitHub to the cloud:
- The cloud IAM (workload identity pool) trusts GitHub's OIDC issuer https://token.actions.githubusercontent.com.
- You map repo:<your-org>/<your-repo>:ref:refs/heads/main to a specific IAM role.
- The workflow adds permissions: id-token: write. google-github-actions/auth (or the AWS/Azure equivalent) requests the OIDC token, exchanges it for cloud credentials, and exposes them to later steps.
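The trust decision on the cloud side boils down to comparing two claims. The sketch below is a toy model of that comparison, not any cloud provider's implementation: expected_subject and trust_allows are hypothetical names, example-org/demo-app is a placeholder repository, and a real verifier would validate the token signature and expiry before looking at claims at all.

```python
GITHUB_ISSUER = "https://token.actions.githubusercontent.com"


def expected_subject(org, repo, branch="main"):
    """Subject claim GitHub issues for a workflow run on a branch of org/repo."""
    return f"repo:{org}/{repo}:ref:refs/heads/{branch}"


def trust_allows(claims, org, repo, branch="main"):
    """Toy trust condition: the issuer is GitHub and the subject is exactly the mapped ref."""
    return (claims.get("iss") == GITHUB_ISSUER
            and claims.get("sub") == expected_subject(org, repo, branch))


# Claims as they might appear in tokens from two different workflow runs.
main_run = {"iss": GITHUB_ISSUER, "sub": "repo:example-org/demo-app:ref:refs/heads/main"}
feature_run = {"iss": GITHUB_ISSUER, "sub": "repo:example-org/demo-app:ref:refs/heads/feature-x"}
other_issuer = {"iss": "https://issuer.example.invalid", "sub": main_run["sub"]}

print(trust_allows(main_run, "example-org", "demo-app"))      # True
print(trust_allows(feature_run, "example-org", "demo-app"))   # False
print(trust_allows(other_issuer, "example-org", "demo-app"))  # False
```

Matching the subject exactly, rather than by prefix or wildcard, is the design choice that keeps a feature branch or a fork from assuming the deploy role.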
The combination (Workload Identity for in-cluster workloads, OIDC federation for CI) removes the need for long-lived cloud credentials in both your clusters and your pipelines. When it works, it's invisible. When it doesn't, the error messages are notoriously cryptic, which is why you're going to practice it.
What this module does not cover
Pod Security Standards (what privileges a pod can grant itself beyond the SA's RBAC), admission webhooks (policies that can reject a pod even after RBAC has authorized the request), and secrets encryption at rest are out of scope. This module is the narrow, load-bearing slice: ServiceAccounts, Roles, bindings, and the federation bridge to cloud IAM. Get those right first; the rest builds on them.
In the task lesson you'll create a dedicated ServiceAccount + narrow Role for a pod, prove with kubectl auth can-i that it has the permissions you want and only those, and sketch the Workload Identity Federation binding for a GKE deployment (without actually standing up GKE — the pattern is what matters). Then the skill lesson codifies it as rbac-iam-scaffold, with a boundary statement that explicitly names what cloud-specific bits it won't automate.