Lesson 14 of 28

Module 4 · Task — Provision a cluster with Terraform (via Claude)

The task

Drive Claude to write Terraform config that provisions a kind cluster, outputs a working kubeconfig, and lets you run kubectl get nodes against it using only files Terraform produced. Destroy it cleanly. Read every .tf file before you terraform apply.

Acceptance test: terraform apply -auto-approve succeeds, then kubectl --kubeconfig ./kubeconfig get nodes lists at least one node with status Ready, then terraform destroy -auto-approve removes the cluster (kind get clusters no longer lists it).

Setup

Module 1 completed — docker, kubectl, kind on your PATH.
Terraform (or OpenTofu) installed — terraform version (or tofu version) prints a version.
No existing kind cluster named tf-demo.

Drive it through Claude

New directory + provider declaration. Send Claude:

"Create a new directory tf-kind and in it a file versions.tf declaring Terraform ≥ 1.5, the tehcyx/kind provider (version ~> 0.4), and the hashicorp/local provider (version ~> 2.5). Add an empty provider \"kind\" {} block."

Read versions.tf. Ask Claude: what does the ~> version operator mean, and why would we use it instead of pinning to an exact version? You want to understand this before the project survives a year, someone runs terraform init -upgrade, and the plan suddenly changes because a newer provider release was selected.
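
For comparison while you read, here is a minimal sketch of what versions.tf might look like for that prompt. It is not the only valid layout, and Claude's output may differ in ordering or comments. The comments on the ~> lines describe Terraform's documented constraint behavior.

```hcl
# versions.tf (sketch; compare against the file Claude actually wrote)
terraform {
  required_version = ">= 1.5"

  required_providers {
    kind = {
      source  = "tehcyx/kind"
      version = "~> 0.4" # >= 0.4 and < 1.0: only the last written component may grow
    }
    local = {
      source  = "hashicorp/local"
      version = "~> 2.5" # >= 2.5 and < 3.0
    }
  }
}

# Empty on purpose: the prompt asks for a provider block with no settings.
provider "kind" {}
```

Note that writing ~> 0.4.0 instead would narrow the range to 0.4.x, which is a useful detail to raise when you ask Claude about the operator.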
Variables. Send:

"Create variables.tf with two variables: cluster_name (string, default tf-demo) and node_image (string, default kindest/node:v1.30.0). Both with description fields."
The cluster resource. Send:

"Create main.tf with a resource \"kind_cluster\" \"main\" named from var.cluster_name, node image from var.node_image, wait_for_ready = true, and a kind_config block with one control-plane node that forwards host port 9898 to container port 30080."

Read main.tf. Ask Claude: what does wait_for_ready actually do, and what would happen if I set it to false? Make sure you understand the blocking behavior before applying.
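
As a reference point for that read-through, the sketch below shows one plausible shape for variables.tf and main.tf together. The description strings are made up for illustration. The kind and api_version fields inside kind_config are not in the prompt; they are included on the assumption that the tehcyx/kind provider schema requires them, so check the provider documentation for the version you pinned.

```hcl
# variables.tf (sketch)
variable "cluster_name" {
  type        = string
  default     = "tf-demo"
  description = "Name of the kind cluster to create." # illustrative wording
}

variable "node_image" {
  type        = string
  default     = "kindest/node:v1.30.0"
  description = "Node image kind uses for every node." # illustrative wording
}

# main.tf (sketch)
resource "kind_cluster" "main" {
  name           = var.cluster_name
  node_image     = var.node_image
  wait_for_ready = true # ask Claude what this blocks on before you apply

  kind_config {
    # Assumed to be required by the provider schema; verify for your version.
    kind        = "Cluster"
    api_version = "kind.x-k8s.io/v1alpha4"

    node {
      role = "control-plane"

      # Forward host port 9898 to container port 30080 on this node.
      extra_port_mappings {
        host_port      = 9898
        container_port = 30080
      }
    }
  }
}
```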
Outputs + kubeconfig file. Send:

"Create outputs.tf with two outputs (cluster_name, endpoint) and a resource \"local_file\" \"kubeconfig\" that writes kind_cluster.main.kubeconfig to ./kubeconfig with file permissions 0600."
Init, plan, apply. Send:

"Run terraform init, then terraform plan, then show me the plan output. Don't apply yet."

Read the plan. You should see Plan: 2 to add, 0 to change, 0 to destroy. — one kind_cluster, one local_file. Reading the plan before apply is a habit; build it now. Then tell Claude to terraform apply -auto-approve.
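
The local_file in that count comes from outputs.tf. A hedged sketch of that file follows, assuming the provider exposes name, endpoint, and kubeconfig attributes on kind_cluster as the prompt implies. The description strings are illustrative, and the file Claude wrote may use a plain "./kubeconfig" path instead of path.module.

```hcl
# outputs.tf (sketch; compare against the file Claude actually wrote)
output "cluster_name" {
  value       = kind_cluster.main.name
  description = "Name of the kind cluster Terraform created." # illustrative wording
}

output "endpoint" {
  value       = kind_cluster.main.endpoint
  description = "API server endpoint of the cluster." # illustrative wording
}

# Writes the kubeconfig next to the .tf files. When Terraform runs from the
# tf-kind directory, path.module is ".", so this resolves to ./kubeconfig.
resource "local_file" "kubeconfig" {
  content         = kind_cluster.main.kubeconfig
  filename        = "${path.module}/kubeconfig"
  file_permission = "0600" # owner read/write only, since the file holds credentials
}
```

Outputs do not count toward the "to add" total, which is why the plan reports two resources even though this file also declares two outputs.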

A note on identity — what Terraform-for-real needs

kind runs locally using your Docker daemon, so the only credential involved is "can this process talk to /var/run/docker.sock". The moment you swap kind for GKE, EKS, or AKS, identity becomes first-class:

GKE: you need a GCP service account with the roles roles/container.admin and roles/compute.networkAdmin (among others), and Terraform authenticates with Application Default Credentials (gcloud auth application-default login) or a key file.
EKS: an IAM user or assumed role with a broad set of eks:*, ec2:*, iam:* permissions. The iam permissions are the scariest — a misconfigured Terraform module can grant itself more privilege than intended.
AKS: a Microsoft Entra ID (formerly Azure AD) service principal with the Contributor role on the target resource group.

The right pattern for any of these is OIDC federation from CI (as in Module 3) plus a minimum-scoped role for the specific Terraform run. Long-lived keys in terraform.tfvars are the anti-pattern you'll encounter and need to push back against in real work. Module 7 covers this end-to-end.
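
To make the contrast with keys in terraform.tfvars concrete, here is a rough sketch of an AWS provider block that carries no static credentials. Everything named in it is hypothetical: the variable provision_role_arn, the region, and the session name are placeholders, and it assumes the CI job has already obtained short-lived credentials in its environment. It is an illustration of the shape, not the setup Module 7 will build.

```hcl
# Sketch only: none of these names come from this lesson.
variable "provision_role_arn" {
  type        = string
  description = "ARN of a narrowly scoped role for this Terraform run (hypothetical)."
}

provider "aws" {
  region = "us-east-1" # placeholder region

  # No access_key or secret_key here. Base credentials come from the
  # environment (for example, short-lived ones issued to the CI job),
  # and the provider then assumes the scoped role for this run.
  assume_role {
    role_arn     = var.provision_role_arn
    session_name = "terraform-eks-provision" # hypothetical session name
  }
}
```

The design point is that the only thing stored in the repository is a role ARN, which is an identifier rather than a secret.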

Break it on purpose

State and reality can drift. See it happen.

With the cluster up, delete it out-of-band:
```
kind delete cluster --name tf-demo
```
Predict: what will terraform plan say next? Will it (a) notice the cluster is gone and plan to recreate, (b) error out, or (c) silently continue believing the cluster exists? Write down your guess.
Run terraform plan. Read the output.
Run terraform apply -auto-approve. Observe.
Now try the reverse: with the cluster up, edit variables.tf to change the cluster_name default to tf-demo-v2, then run terraform plan. What does Terraform want to do now, and what does that teach you about resource identity versus configuration?

The class of failure: state drift. State is Terraform's belief about the world, and the world can change without telling it. Anyone who has run terraform apply in anger on a real team has hit this, usually at 11pm during an incident. The skill you write in the next lesson will need to know how to recover from it.

Acceptance test

kubectl --kubeconfig ./kubeconfig get nodes

Expected:

NAME                    STATUS   ROLES           AGE   VERSION
tf-demo-control-plane   Ready    control-plane   42s   v1.30.0

Then tear down:

terraform destroy -auto-approve
kind get clusters

tf-demo should not be in the list.

What to keep for the next lesson

Keep the tf-kind/ directory, the Claude transcript, and your Break it on purpose notes on state drift. In the next lesson you'll codify .claude/skills/terraform-kind-provision/ and explicitly teach it what to do when state believes a cluster exists but reality disagrees — otherwise a fresh Claude session will dutifully re-apply against stale assumptions.