Lesson 15 of 28
Module 4 · Skill — Codify `terraform-kind-provision`
Why this becomes a skill
You wrote Terraform config that provisions a kind cluster, exposed a kubeconfig as an output file, and, crucially, watched what happens when state and reality drift apart. Drift is one of the most dangerous classes of Terraform failure in real work, and you just saw it first-hand. A skill that encodes "do this cleanly and recover when state lies" is worth far more than one that only handles the happy path.
The skill inherits your understanding of why Terraform plans look the way they do, not just what to type. That's the difference between a skill that works and a skill that breaks the first time someone runs it against a partially-provisioned cluster.
Codify
Send Claude:
"Codify the session we just had as a skill at `.claude/skills/terraform-kind-provision/SKILL.md`. Scope: provision a single local `kind` cluster with Terraform, output a kubeconfig file. Parameterise: cluster name, node image, kind control-plane port mappings (list of {container_port, host_port} pairs), output directory for kubeconfig. Steps: scaffold `versions.tf` with `tehcyx/kind` and `hashicorp/local` providers, `variables.tf`, `main.tf`, `outputs.tf`, run `terraform init` → `plan` → `apply -auto-approve`, then `kubectl --kubeconfig ./kubeconfig get nodes` as acceptance. Keep under 140 lines."
Read the SKILL.md. If the provider version constraints are too loose (no constraint, which resolves to the latest release) or too tight (`= 0.4.1`), ask Claude to explain the choice and adjust. Pinning is a trade-off: loose pins break over time, tight pins break during routine maintenance.
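One middle ground between those extremes is a pessimistic constraint (`~>`), which accepts newer releases only within a bounded range. The sketch below scaffolds a `versions.tf` from the shell. The version numbers are illustrative assumptions, not values taken from the session, so check the registry for current releases before reusing them.

```shell
#!/usr/bin/env bash
# Sketch: scaffold versions.tf with bounded provider constraints.
# The version numbers are placeholders chosen for illustration only.
cat > versions.tf <<'EOF'
terraform {
  required_providers {
    kind = {
      source  = "tehcyx/kind"
      version = "~> 0.4.0" # >= 0.4.0, < 0.5.0
    }
    local = {
      source  = "hashicorp/local"
      version = "~> 2.5" # >= 2.5, < 3.0
    }
  }
}
EOF
```

The three-part form (`~> 0.4.0`) only floats the patch number, which suits a pre-1.0 provider; the two-part form (`~> 2.5`) also accepts new minor releases.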
Refine
"Add a state-sanity step after `terraform apply`: parse `terraform state list`, verify both `kind_cluster.main` and `local_file.kubeconfig` exist, and `kubectl --kubeconfig <outdir>/kubeconfig get nodes` returns at least one Ready node. Fail loudly with a diagnostic if any check fails."
"Remember the Break it on purpose probe: we deleted the cluster out-of-band and Terraform's state kept claiming it existed. Add a `Known failures` section with: (a) state drift when a cluster is deleted externally; surface the fix: `terraform apply` will re-create (fine) but if the user wants to remove stale state, `terraform state rm kind_cluster.main`. (b) name collision: `kind get clusters` already lists the name; fix: either rename via variable or `kind delete cluster --name <old>` first. (c) provider version mismatch on state re-import; fix: `terraform init -upgrade` + carefully re-plan."
"Add a pre-apply `terraform plan -out=tfplan` step and show the output to the user before applying. This enforces the 'read the plan before apply' habit the module taught."
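The drift case in (a) can be detected mechanically by comparing what Terraform's state claims with what kind reports. A minimal sketch, assuming the resource address `kind_cluster.main` from the prompt above; the helper name `check_drift` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: flag the case where state still lists the cluster but kind does not.
# check_drift is a hypothetical helper name; kind_cluster.main is the
# resource address used in this lesson.
check_drift() {
  local name="$1"
  local in_state=0 in_kind=0

  # `terraform state list` prints one resource address per line.
  if terraform state list 2>/dev/null | grep -Fxq -- "kind_cluster.main"; then
    in_state=1
  fi
  # `kind get clusters` prints one cluster name per line.
  if kind get clusters 2>/dev/null | grep -Fxq -- "$name"; then
    in_kind=1
  fi

  if [ "$in_state" -eq 1 ] && [ "$in_kind" -eq 0 ]; then
    echo "drift: state lists kind_cluster.main but kind has no cluster '$name'" >&2
    echo "  re-create it:       terraform apply" >&2
    echo "  or drop the record: terraform state rm kind_cluster.main" >&2
    return 1
  fi
  return 0
}
```

Calling `check_drift validate-happy` from the Terraform working directory returns 1 and prints both recovery options when the cluster was deleted out-of-band, and returns 0 otherwise.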
Validate in a fresh context — happy path AND adversarial
4a — Happy path
New Claude session:
"Read `.claude/skills/terraform-kind-provision/SKILL.md` and provision a cluster named `validate-happy` with node image `kindest/node:v1.30.0`, one port mapping (container_port 30888, host_port 8888). Output kubeconfig to `./kubeconfig-happy`. Confirm `kubectl --kubeconfig ./kubeconfig-happy get nodes` shows one Ready node."
If the skill one-shots it: pass. Tear down: `terraform destroy -auto-approve`.
4b — Adversarial
- Existing cluster collision: before running the skill, manually run `kind create cluster --name validate-happy`. Ask the skill to provision a cluster with the same name. Does it pre-check with `kind get clusters` and fail with "cluster already exists; delete it or rename", or does it proceed and produce a confusing Terraform error partway through?
- State drift: after a successful apply, manually run `kind delete cluster --name validate-happy`. Ask the skill to verify the cluster is healthy. Does it run the state-sanity step (section 3 refinement) and notice the node list is empty, or does it trust state blindly?
- Missing provider: run the skill without network access (pull the ethernet cable or set HTTP proxy to `localhost:1`). `terraform init` will fail to download the `tehcyx/kind` provider. Does the skill surface this with a clear "provider download failed; check network / mirror configuration" message, or does it emit Terraform's raw multi-hundred-line error?
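For the missing-provider probe, one way to avoid dumping Terraform's full output is to capture it to a log and print only the error headlines. The sketch below is a heuristic, not Terraform's own behaviour: the helper name `init_or_explain` is hypothetical, and matching the word "provider" on an `Error:` line is an assumption about how Terraform words these failures.

```shell
#!/usr/bin/env bash
# Sketch: run `terraform init`, keep the full output in a log file,
# and surface a short diagnostic instead of the raw error text.
# init_or_explain is a hypothetical helper name.
init_or_explain() {
  local workspace="$1" log
  log="$(mktemp)"

  if terraform "-chdir=$workspace" init -input=false >"$log" 2>&1; then
    rm -f "$log"
    return 0
  fi

  echo "terraform init failed; full log kept at $log" >&2
  # Print only the error headlines, not the surrounding detail.
  grep -E 'Error:' "$log" >&2 || true
  # Heuristic (assumption): provider install failures mention
  # "provider" on the Error line itself.
  if grep -Eq 'Error:.*provider' "$log"; then
    echo "provider download failed; check network / mirror configuration" >&2
  fi
  return 1
}
```

Keeping the log on disk means the short message never hides information: anyone who needs the raw output can still open the file named in the first line of the diagnostic.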
Promote deterministic commands to scripts/
The init → plan → apply sequence is deterministic. The sanity check is deterministic. Both belong in a script:
"Extract to `.claude/skills/terraform-kind-provision/scripts/provision.sh` (args: cluster_name, node_image, outdir). Run: `terraform -chdir=<workspace> init`, `terraform -chdir=<workspace> plan -var cluster_name=... -var node_image=... -out=tfplan`, show the plan, require user confirmation unless `--yes` is passed, `terraform -chdir=<workspace> apply tfplan`, then the state-sanity block. `set -euo pipefail`. Exit non-zero with a distinct code per failure class (1 = plan failed, 2 = apply failed, 3 = sanity check failed)."
Distinct exit codes per class are small cost / big payoff for a skill that gets called from CI.
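A rough sketch of how such a script might map each stage to its exit code follows. Several details are assumptions rather than part of the session: the `WORKSPACE` environment variable, reporting init failures as 1 (the prompt assigns them no code), returning 64 for bad usage, and returning 0 when the user declines to apply. How `outdir` reaches Terraform is not shown in the prompt, so here it is only used to locate the kubeconfig.

```shell
#!/usr/bin/env bash
# Sketch of scripts/provision.sh.
# Exit codes: 1 = plan failed, 2 = apply failed, 3 = sanity check failed.
# Assumptions: WORKSPACE locates the Terraform files; init failures
# are folded into 1; 64 signals bad usage.
set -euo pipefail

WORKSPACE="${WORKSPACE:-.}"

tf() { terraform "-chdir=$WORKSPACE" "$@"; }

state_sanity() {
  local outdir="$1" listing addr
  listing="$(tf state list)" || return 1
  for addr in kind_cluster.main local_file.kubeconfig; do
    if ! grep -Fxq -- "$addr" <<<"$listing"; then
      echo "sanity: $addr missing from state" >&2
      return 1
    fi
  done
  # Column 2 of `kubectl get nodes --no-headers` is the node status.
  if ! kubectl --kubeconfig "$outdir/kubeconfig" get nodes --no-headers \
      | awk '$2 ~ /^Ready(,|$)/ { ok = 1 } END { exit !ok }'; then
    echo "sanity: no Ready node reported by kubectl" >&2
    return 1
  fi
}

provision() {
  local yes=0 answer=""
  if [ "${1:-}" = "--yes" ]; then yes=1; shift; fi
  if [ "$#" -ne 3 ]; then
    echo "usage: provision.sh [--yes] <cluster_name> <node_image> <outdir>" >&2
    return 64
  fi
  local cluster_name="$1" node_image="$2" outdir="$3"

  tf init -input=false || return 1
  tf plan -input=false -out=tfplan \
    -var "cluster_name=$cluster_name" \
    -var "node_image=$node_image" || return 1
  tf show tfplan || return 1 # render the saved plan for review

  if [ "$yes" -ne 1 ]; then
    read -r -p "Apply this plan? [y/N] " answer || true
    if [ "$answer" != "y" ]; then
      echo "plan not applied" >&2
      return 0
    fi
  fi

  tf apply -input=false tfplan || return 2
  state_sanity "$outdir" || return 3
}

# As the script's last line: provision "$@"
```

Because each stage returns its own code, a CI job can branch on `$?` without parsing Terraform's output. Applying the saved `tfplan` file, rather than re-planning, also guarantees that what runs is exactly what was shown for review.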
Know the boundary
At the top of .claude/skills/terraform-kind-provision/SKILL.md:
- This skill handles: local `kind`-based Kubernetes provisioning via Terraform, kubeconfig extraction to a local file, pre-apply plan display, post-apply state sanity, and the state-drift recovery pattern.
- This skill does NOT handle: cloud provisioning (GKE/EKS/AKS; these need cloud credentials, IAM roles, VPC networking, and should use a different skill; see module 7 for IAM), remote state backends (S3/GCS with DynamoDB/GCS locking), multi-environment workspaces, module composition, `import` of existing resources, destroy-protect guardrails, or production-grade credential handling (no credential file is written to disk except the kubeconfig).
You're done when
A fresh Claude session provisions a cluster from one prompt, your adversarial probes (name collision, state drift, missing provider) each produce clear actionable messages, and the distinct exit codes from provision.sh correctly signal the failure class.