Lesson 14 of 28
Module 4 · Task — Provision a cluster with Terraform (via Claude)
The task
Drive Claude to write Terraform config that provisions a kind cluster, outputs a working kubeconfig, and lets you run kubectl get nodes against it using only files Terraform produced. Destroy it cleanly. Read every .tf file before you terraform apply.
Acceptance test:
terraform apply -auto-approve succeeds, then kubectl --kubeconfig ./kubeconfig get nodes lists at least one node with status Ready, then terraform destroy -auto-approve removes the cluster (kind get clusters no longer lists it).
Setup
- Module 1 completed: docker, kubectl, and kind on your PATH.
- Terraform (or OpenTofu) installed: terraform version (or tofu version) prints a version.
- No existing kind cluster named tf-demo.
Drive it through Claude
- New directory + provider declaration. Send Claude:
  "Create a new directory tf-kind and in it a file versions.tf declaring Terraform ≥ 1.5, the tehcyx/kind provider (version ~> 0.4), and the hashicorp/local provider (version ~> 2.5). Add an empty provider "kind" {} block."
  Read versions.tf. Ask Claude: what does the ~> version operator mean, and why would we use it instead of pinning to an exact version? You want to understand this before the configuration is a year old and someone wonders why their plan suddenly changed after a provider upgrade.
- Variables. Send:
  "Create variables.tf with two variables: cluster_name (string, default tf-demo) and node_image (string, default kindest/node:v1.30.0). Both with description fields."
- The cluster resource. Send:
  "Create main.tf with a resource "kind_cluster" "main" named from var.cluster_name, node image from var.node_image, wait_for_ready = true, and a kind_config block with one control-plane node that forwards host port 9898 to container port 30080."
  Read main.tf. Ask Claude: what does wait_for_ready actually do, and what would happen if I set it to false? Make sure you understand the blocking behavior before applying.
- Outputs + kubeconfig file. Send:
  "Create outputs.tf with two outputs (cluster_name, endpoint) and a resource "local_file" "kubeconfig" that writes kind_cluster.main.kubeconfig to ./kubeconfig with file permissions 0600."
- Init, plan, apply. Send:
  "Run terraform init, then terraform plan, then show me the plan output. Don't apply yet."
  Read the plan. You should see Plan: 2 to add, 0 to change, 0 to destroy. That is one kind_cluster and one local_file. Reading the plan before apply is a habit; build it now. Then tell Claude to terraform apply -auto-approve.
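When you read the files Claude wrote, it helps to have a rough picture of what the prompts above should produce. The following is a minimal sketch, not a definitive implementation. It assumes the attribute names documented for the tehcyx/kind provider (kind_config, node, extra_port_mappings, kubeconfig, endpoint) and the hashicorp/local provider (content, filename, file_permission). The description strings and the api_version value are illustrative choices, not something the lesson specifies. Check everything against the provider documentation for the version that terraform init actually selects.

```hcl
# versions.tf
terraform {
  required_version = ">= 1.5"

  required_providers {
    kind = {
      source  = "tehcyx/kind"
      version = "~> 0.4"
    }
    local = {
      source  = "hashicorp/local"
      version = "~> 2.5"
    }
  }
}

provider "kind" {}

# variables.tf
variable "cluster_name" {
  description = "Name of the kind cluster to create." # illustrative wording
  type        = string
  default     = "tf-demo"
}

variable "node_image" {
  description = "Container image used for the kind node." # illustrative wording
  type        = string
  default     = "kindest/node:v1.30.0"
}

# main.tf
resource "kind_cluster" "main" {
  name           = var.cluster_name
  node_image     = var.node_image
  wait_for_ready = true

  kind_config {
    kind        = "Cluster"
    api_version = "kind.x-k8s.io/v1alpha4" # assumed kind config API version

    node {
      role = "control-plane"

      # Forward host port 9898 to port 30080 inside the node container.
      extra_port_mappings {
        host_port      = 9898
        container_port = 30080
      }
    }
  }
}

# outputs.tf
output "cluster_name" {
  value = kind_cluster.main.name
}

output "endpoint" {
  value = kind_cluster.main.endpoint
}

# Write the kubeconfig exported by the cluster resource next to the .tf files.
resource "local_file" "kubeconfig" {
  content         = kind_cluster.main.kubeconfig
  filename        = "${path.module}/kubeconfig"
  file_permission = "0600"
}
```

The file-name comments mark where each fragment would live. Terraform loads every .tf file in the directory, so the split is a convention, not a requirement. The sketch uses path.module in place of a bare ./ so the kubeconfig lands beside the configuration; when you run Terraform from inside tf-kind the two are equivalent. If Claude's output differs from this, treat the difference as a question to ask, not as an error in either version.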
A note on identity — what Terraform-for-real needs
kind runs locally using your Docker daemon, so the only credential involved is "can this process talk to /var/run/docker.sock". The moment you swap kind for GKE, EKS, or AKS, identity becomes first-class:
- GKE: you need a GCP service account with roles container.admin and compute.networkAdmin (among others), and Terraform uses Application Default Credentials (gcloud auth application-default login) or a key file.
- EKS: an IAM user or assumed role with a broad set of eks:*, ec2:*, iam:* permissions. The iam permissions are the scariest: a misconfigured Terraform module can grant itself more privilege than intended.
- AKS: an Azure AD service principal with Contributor on the target resource group.
The right pattern for any of these is OIDC federation from CI (same as Module 3) plus a minimum-scoped role for the specific Terraform run. Long-lived keys in terraform.tfvars are the anti-pattern you'll encounter in real work and need to push back against. Module 7 covers this end-to-end.
Break it on purpose
State and reality can drift. See it happen.
- With the cluster up, delete it out-of-band: kind delete cluster --name tf-demo
- Predict: what will terraform plan say next? Will it (a) notice the cluster is gone and plan to recreate, (b) error out, or (c) silently continue believing the cluster exists? Write down your guess.
- Run terraform plan. Read the output.
- Run terraform apply -auto-approve. Observe.
- Now try the reverse: with the cluster up, edit variables.tf to change cluster_name to tf-demo-v2. Run terraform plan. What does Terraform want to do now, and what does that teach you about resource identity vs configuration?
The class of failure: state drift. State is Terraform's belief about the world. The world can change without telling it. Anyone who has run terraform apply in anger on a real team has hit this, usually at 11pm during an incident. The skill you write in the next lesson will need to know how to recover from it.
Acceptance test
kubectl --kubeconfig ./kubeconfig get nodes
Expected:
NAME STATUS ROLES AGE VERSION
tf-demo-control-plane Ready control-plane 42s v1.30.0
Then tear down:
terraform destroy -auto-approve
kind get clusters
tf-demo should not be in the list.
What to keep for the next lesson
Keep the tf-kind/ directory, the Claude transcript, and your Break it on purpose notes on state drift. In the next lesson you'll codify .claude/skills/terraform-kind-provision/ and explicitly teach it what to do when state believes a cluster exists but reality disagrees — otherwise a fresh Claude session will dutifully re-apply against stale assumptions.