Cloud adapter — Anthropic Claude API — Kubernetes Operators, Istio, Incident Response & AI

You have the Client interface from lesson 08 and a working local adapter from lesson 09. This lesson implements the second real adapter — same contract, completely different backend. If the contract was designed well, the adapter lands in the same shape as the Ollama one. The error-mapping table grows, the auth model is real, and the adversarial probe shifts from "daemon dead" to "key bogus."

Why a cloud adapter alongside local

The local adapter is the right default for cost, data control, and free integration tests. The cloud adapter earns its own place in the wiring for four specific reasons:

Larger context windows. Claude Sonnet handles 200K tokens; Opus extends to 1M. Local 2B/4B/9B models top out at 8K–32K. When the reconciler needs to summarise a large log bundle or reason over a whole namespace's state, the cloud adapter is the only one that fits the prompt.
Better instruction-following. A small local model will, more often than you'd like, drift away from your structured-output requirements. Sonnet stays on-task even with long, multi-clause system prompts. For production reconciliation decisions where a wrong answer has real consequences, that reliability is the feature you're paying for.
Wall-clock speed. Sonnet at a typical reconcile prompt size returns in 1–3 seconds. Gemma 2B on a laptop takes 5–15 seconds for the same prompt. Inside a reconcile-loop deadline budget, that's the difference between "fits comfortably" and "barely fits."
Tool use, if you grow into it later. This adapter doesn't expose tool calling — that's outside the contract — but the Anthropic SDK supports it, so a future contract upgrade (or a sibling interface) has a clean implementation path.

Cost callout. Running this lesson's tests against the real API costs roughly $0.01 of credits at current Sonnet rates. Not enough to think about; not zero. If you don't want to spend it, you can complete the lesson against a mock — but you'll miss the bad-key probe, which is the editorial point. The contract is useful exactly because real failures get classified correctly, and a mock can't simulate the actual 401 response shape.

Get an API key and install the SDK

# console.anthropic.com → Settings → API Keys → Create Key
export ANTHROPIC_API_KEY="sk-ant-..."

cd ~/operator-llm
go get github.com/anthropics/anthropic-sdk-go

The SDK is the official Go client maintained by Anthropic. It's a recent project and its API surface has been evolving — when you go get it, pin the version that lands in go.mod so the adapter is reproducible:

go mod tidy
grep anthropic-sdk-go go.mod   # note the version

Future course updates will track newer SDK versions; for now, whatever version you pinned is what the lesson refers to. If the SDK ships a breaking change, the adapter changes — the Client interface doesn't. That's the architectural point.

Security note that you'll need in lesson 12. Never hardcode the API key. The reconciler will read it from an environment variable that's populated from a Kubernetes Secret mounted into the operator's pod. The same env-var-reading pattern in this lesson's New(...) constructor is what the operator's wiring layer will use — no special-case code, just the same os.Getenv("ANTHROPIC_API_KEY") you write here. Defence in depth: the Secret should be sealed/external-secrets-backed in production, but that's an operator-deployment concern, not an adapter one.
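A minimal sketch of the env-var pattern described above, assuming a hypothetical helper named apiKeyFrom and a hypothetical sentinel error errNoAPIKey (neither is part of the lesson's adapter). The lookup function is passed in so a test can substitute a fake environment; production code would pass os.Getenv.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strings"
)

// errNoAPIKey is a hypothetical sentinel used only by this sketch.
var errNoAPIKey = errors.New("ANTHROPIC_API_KEY is not set")

// apiKeyFrom reads the key through lookup (os.Getenv in production).
// Taking the lookup as a parameter keeps the function testable.
func apiKeyFrom(lookup func(string) string) (string, error) {
	key := strings.TrimSpace(lookup("ANTHROPIC_API_KEY"))
	if key == "" {
		return "", errNoAPIKey
	}
	return key, nil
}

func main() {
	key, err := apiKeyFrom(os.Getenv)
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	// Never log the key itself; its length is enough for a sanity check.
	fmt.Printf("key loaded (%d chars)\n", len(key))
}
```

Failing fast on an empty key is a design choice of this sketch: it surfaces a misconfigured Secret at startup rather than as a 401 on the first reconcile.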

The Messages API basics

The single call you care about is Messages.New:

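import anthropic "github.com/anthropics/anthropic-sdk-go"

client := anthropic.NewClient() // reads ANTHROPIC_API_KEY by default

// anthropic.F(...) is the field wrapper used by pre-1.0 SDK releases; 1.x
// releases take plain values instead. Match whatever version go.mod pins.
resp, err := client.Messages.New(ctx, anthropic.MessageNewParams{
    Model:     anthropic.F(anthropic.ModelClaudeSonnet4_6),
    MaxTokens: anthropic.F[int64](1024),
    // System is a list of text blocks, not a bare string.
    System: anthropic.F([]anthropic.TextBlockParam{
        anthropic.NewTextBlock("You are a concise SRE assistant."),
    }),
    Messages: anthropic.F([]anthropic.MessageParam{
        anthropic.NewUserMessage(anthropic.NewTextBlock("Say only the word OK.")),
    }),
})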

Model recommendation. Three Claude tiers worth knowing:

Model	When to pick it
`claude-haiku-4-5-20251001`	The cheapest path. Fine for low-stakes classification or summarisation inside the reconcile loop.
`claude-sonnet-4-6`	The default operator workhorse. Best cost/quality balance for incident-response reasoning.
`claude-opus-4-7`	Highest reasoning quality for the rare hard call. Reserve for paths where a wrong answer is materially costly.

The model choice belongs in Adapter.Model (default claude-sonnet-4-6) and gets overridden per-call by the operator's wiring if you need tier routing per CR or per namespace.
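One way to sketch the model selection, assuming a hypothetical helper named resolveModel. The precedence (explicit value, then the ANTHROPIC_MODEL environment variable, then the default) is an assumption based on the adapter task later in this lesson, which names ANTHROPIC_MODEL as the fallback when Model is empty.

```go
package main

import (
	"fmt"
	"os"
)

// defaultModel is the lesson's default tier.
const defaultModel = "claude-sonnet-4-6"

// resolveModel applies the assumed precedence: explicit option first,
// then the ANTHROPIC_MODEL environment variable, then the default.
func resolveModel(explicit string, lookup func(string) string) string {
	if explicit != "" {
		return explicit
	}
	if fromEnv := lookup("ANTHROPIC_MODEL"); fromEnv != "" {
		return fromEnv
	}
	return defaultModel
}

func main() {
	// The first result depends on the caller's environment.
	fmt.Println(resolveModel("", os.Getenv))
	// An explicit value always wins, which is what per-CR tier routing needs.
	fmt.Println(resolveModel("claude-haiku-4-5-20251001", os.Getenv))
}
```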

Token usage comes back on resp.Usage (InputTokens, OutputTokens). Latency is your own time.Since(start) — Anthropic's response doesn't carry a server-side timing field, and even if it did, you want client-observed wall-clock for the operator's metrics.
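The timing and token mapping can be sketched without the SDK. The Response and usage types below are stand-ins whose field names are assumptions (the real ones come from lesson 08 and from resp.Usage), and timedCall is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"time"
)

// Response is a stand-in for the lesson-08 response type; the field
// names are assumptions made for this sketch.
type Response struct {
	Text         string
	InputTokens  int64
	OutputTokens int64
	Latency      time.Duration
	Provider     string
}

// usage mirrors the two counters the lesson reads from resp.Usage.
type usage struct{ InputTokens, OutputTokens int64 }

// timedCall runs call, measures client-observed wall-clock time, and
// copies the token counters into the typed response.
func timedCall(call func() (string, usage, error)) (Response, error) {
	start := time.Now()
	text, u, err := call()
	elapsed := time.Since(start)
	if err != nil {
		return Response{}, err
	}
	return Response{
		Text:         text,
		InputTokens:  u.InputTokens,
		OutputTokens: u.OutputTokens,
		Latency:      elapsed,
		Provider:     "anthropic",
	}, nil
}

func main() {
	resp, err := timedCall(func() (string, usage, error) {
		time.Sleep(20 * time.Millisecond) // stands in for the network round trip
		return "OK", usage{InputTokens: 12, OutputTokens: 1}, nil
	})
	fmt.Println(resp.Text, resp.InputTokens, resp.OutputTokens, resp.Latency, err)
}
```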

Claude-guided task — implement the adapter

Open Claude Code in the operator-llm repo. Same discipline as before: read every file before running anything; ask Claude to justify each choice before accepting it.

What to ask Claude for, in order:

Create pkg/llm/anthropic/adapter.go with an Adapter struct holding Client *anthropic.Client, Model string (default claude-sonnet-4-6), and MaxTokensCap int64 (default 4096). The constructor New(opts Options) *Adapter reads ANTHROPIC_API_KEY from the environment when no explicit client is passed. When Model is empty it falls back to the ANTHROPIC_MODEL env var, and to the default if that is unset too.
Implement Ask(ctx, req) (Response, error). Translate req.SystemHint → System, req.Prompt → a single user message, req.MaxTokens (capped to MaxTokensCap) → MaxTokens. Time the call by taking start := time.Now() before it and time.Since(start) after it. Map resp.Usage into the typed response with Provider: "anthropic".

Map errors into ProviderError using this table:

Anthropic signal	→ `ProviderError`
HTTP 401 (`authentication_error`)	`Code: "auth", Retryable: false`
HTTP 403 (`permission_error`)	`Code: "auth", Retryable: false`
HTTP 429 (`rate_limit_error`)	`Code: "rate_limit", Retryable: true`
HTTP 500 / 502 / 503	`Code: "unavailable", Retryable: true`
HTTP 529 (`overloaded_error`)	`Code: "unavailable", Retryable: true`
HTTP 400 (`invalid_request_error`)	`Code: "bad_request", Retryable: false`
HTTP 404 (`not_found_error` — model name typo)	`Code: "not_found", Retryable: false`
`ctx.Err() == context.DeadlineExceeded`	`Code: "timeout", Retryable: true`
`ctx.Err() == context.Canceled`	return `context.Canceled` directly

The SDK's *anthropic.Error exposes the HTTP status and the API error type — use both for classification. Don't infer from the message string; the status code is the contract.
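The table above can be sketched as a pure function over the HTTP status, which keeps the mapping testable without the SDK. The ProviderError type here is a stand-in for the lesson-08 type (the Status field is an assumption), and the "unknown" code for statuses outside the table is also an assumption of this sketch. In the real adapter the status would come from the SDK's error value.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

// ProviderError is a stand-in for the lesson-08 type. Status is an
// extra field assumed here for readable messages.
type ProviderError struct {
	Code      string
	Retryable bool
	Status    int
}

func (e *ProviderError) Error() string {
	return fmt.Sprintf("provider error: %s (status %d, retryable=%v)", e.Code, e.Status, e.Retryable)
}

// classifyStatus maps an HTTP status to a ProviderError per the table.
func classifyStatus(status int) *ProviderError {
	switch status {
	case 401, 403:
		return &ProviderError{Code: "auth", Status: status}
	case 429:
		return &ProviderError{Code: "rate_limit", Retryable: true, Status: status}
	case 500, 502, 503, 529:
		return &ProviderError{Code: "unavailable", Retryable: true, Status: status}
	case 400:
		return &ProviderError{Code: "bad_request", Status: status}
	case 404:
		return &ProviderError{Code: "not_found", Status: status}
	default:
		// Assumption: statuses outside the table are not retried.
		return &ProviderError{Code: "unknown", Status: status}
	}
}

// classify checks the context first, then falls back to the status.
func classify(ctx context.Context, status int) error {
	switch {
	case errors.Is(ctx.Err(), context.Canceled):
		return context.Canceled // caller's decision: pass through untouched
	case errors.Is(ctx.Err(), context.DeadlineExceeded):
		return &ProviderError{Code: "timeout", Retryable: true}
	}
	return classifyStatus(status)
}

func main() {
	fmt.Println(classify(context.Background(), 401))
	fmt.Println(classify(context.Background(), 529))

	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	fmt.Println(classify(ctx, 500)) // the context wins over the status
}
```

Checking the context before the status matters: a cancelled request can still surface an HTTP-looking failure, and the caller's cancellation should not be reported as a provider fault.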

Wire the live test in pkg/llm/anthropic/adapter_test.go. Skip if ANTHROPIC_API_KEY isn't set so CI doesn't need a key:

func TestAdapter_Live(t *testing.T) {
    if os.Getenv("ANTHROPIC_API_KEY") == "" {
        t.Skip("ANTHROPIC_API_KEY not set")
    }
    a := anthropic_adapter.New(anthropic_adapter.Options{})
    resp, err := a.Ask(context.Background(), llm.Request{
        Prompt:    "Reply with exactly: OK",
        MaxTokens: 8,
    })
    if err != nil { t.Fatal(err) }
    if resp.Text == "" { t.Fatal("empty response") }
    if resp.InputTokens == 0 || resp.OutputTokens == 0 {
        t.Fatal("missing token counts")
    }
}

Two questions worth asking Claude before you accept its draft:

"Why cap MaxTokens in the adapter when the SDK already enforces per-model limits?" The right answer mentions runaway cost — a buggy caller passing MaxTokens: 100000 shouldn't be allowed to burn through your budget. The adapter is the right place to enforce that because every caller routes through it.
"The classifier returns context.Canceled directly, but context.DeadlineExceeded becomes a ProviderError{Code: "timeout"}. Why the asymmetry?" Same reason as in the Ollama adapter: cancellation is the caller's decision (don't retry); deadline-exceeded is a transient failure (probably retry).

Adversarial probe — bad API key (and what the retry policy must NOT do)

This is the lesson's editorial point: the contract is most valuable when it stops the caller from doing the wrong thing on failure.

func TestAdapter_BadKey(t *testing.T) {
    t.Setenv("ANTHROPIC_API_KEY", "sk-ant-bogus-key-do-not-retry")
    a := anthropic_adapter.New(anthropic_adapter.Options{})

    ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
    defer cancel()
    _, err := a.Ask(ctx, llm.Request{Prompt: "anything", MaxTokens: 8})

    var pe *llm.ProviderError
    if !errors.As(err, &pe) {
        t.Fatalf("want *ProviderError, got %T: %v", err, err)
    }
    if pe.Code != "auth" || pe.Retryable {
        t.Fatalf("want auth+non-retryable, got %s retryable=%v", pe.Code, pe.Retryable)
    }

    // The retry policy from lesson 08 MUST honour Retryable: false.
    attempts := 0
    retrying := retry.New(retry.Policy{
        Inner: instrumentedAdapter{a, &attempts},
        Max:   5,
    })
    _, _ = retrying.Ask(ctx, llm.Request{Prompt: "anything", MaxTokens: 8})
    if attempts != 1 {
        t.Fatalf("retry policy retried on non-retryable auth error: %d attempts", attempts)
    }
}

Run it. Two assertions matter:

The first call returns a classified ProviderError{Code: "auth", Retryable: false}. If it returns a raw SDK error, the classifier is incomplete.
The retry policy makes exactly one attempt. If it retries on auth failure, you're burning rate-limit budget on a request that will never succeed. Each retry hurts twice: it spends a request for nothing, and it crowds out legitimate retries against the same key on adjacent calls.
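The probe relies on an instrumentedAdapter helper and on a retry policy that honours Retryable. A self-contained sketch of both follows, under stated assumptions: the Request, Response, Client and ProviderError types are stand-ins for the lesson-08 contract, failing is a hypothetical stub that replaces the real adapter, and askWithRetry is a simplified stand-in for the lesson's retry package with backoff omitted.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type Request struct {
	Prompt    string
	MaxTokens int64
}

type Response struct{ Text string }

type Client interface {
	Ask(ctx context.Context, req Request) (Response, error)
}

type ProviderError struct {
	Code      string
	Retryable bool
}

func (e *ProviderError) Error() string { return "provider error: " + e.Code }

// instrumentedAdapter counts every call that reaches the inner client.
type instrumentedAdapter struct {
	inner    Client
	attempts *int
}

func (i instrumentedAdapter) Ask(ctx context.Context, req Request) (Response, error) {
	*i.attempts++
	return i.inner.Ask(ctx, req)
}

// failing always returns the same error; it stands in for the real adapter.
type failing struct{ err error }

func (f failing) Ask(context.Context, Request) (Response, error) { return Response{}, f.err }

// askWithRetry makes up to maxAttempts calls and stops early on success,
// on a non-retryable ProviderError, or on any unclassified error.
func askWithRetry(ctx context.Context, c Client, req Request, maxAttempts int) (Response, error) {
	var lastErr error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		resp, err := c.Ask(ctx, req)
		if err == nil {
			return resp, nil
		}
		lastErr = err
		var pe *ProviderError
		if !errors.As(err, &pe) || !pe.Retryable {
			break
		}
	}
	return Response{}, lastErr
}

// countAttempts reports how many calls reached the inner client.
func countAttempts(err error, maxAttempts int) int {
	n := 0
	c := instrumentedAdapter{failing{err}, &n}
	_, _ = askWithRetry(context.Background(), c, Request{Prompt: "anything"}, maxAttempts)
	return n
}

func main() {
	fmt.Println(countAttempts(&ProviderError{Code: "auth"}, 5))                        // 1
	fmt.Println(countAttempts(&ProviderError{Code: "rate_limit", Retryable: true}, 5)) // 5
}
```

Counting at the wrapper rather than inside the policy keeps the assertion independent of how the policy is implemented.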

Secondary probe: truncation, not error. Set MaxTokens: 5 and ask the model a question that needs a longer answer. The call must succeed: it returns a truncated Response.Text that ends mid-thought, with no error. Distinguishing "model finished early because we capped it" from "model errored" is exactly the contract's job; an adapter that returns an error here would force every caller to special-case MaxTokens reasoning, which defeats the point of the abstraction.

func TestAdapter_Truncated(t *testing.T) {
    if os.Getenv("ANTHROPIC_API_KEY") == "" {
        t.Skip("ANTHROPIC_API_KEY not set")
    }
    a := anthropic_adapter.New(anthropic_adapter.Options{})
    resp, err := a.Ask(context.Background(), llm.Request{
        Prompt:    "Count from 1 to 100, comma-separated.",
        MaxTokens: 5,
    })
    if err != nil { t.Fatalf("truncation is not an error: %v", err) }
    if resp.Text == "" { t.Fatal("want truncated text, got empty") }
}

Codify as a skill

.claude/skills/anthropic-claude-adapter/SKILL.md. Have Claude draft it, then critique and tighten.

Capture:

The Adapter's config fields and env-var fallbacks (ANTHROPIC_API_KEY, ANTHROPIC_MODEL).
The model-tier guidance (Haiku / Sonnet / Opus) — when to pick each.
The full error-mapping table, verbatim. It is the adapter's contract.
The retry-policy interaction: classifier produces Retryable, policy honours it. The bad-key probe is the proof; reproduce its structure in the skill.
The MaxTokensCap and why it lives in the adapter, not the caller.

End with the mandatory boundary statement:

This skill handles: single-shot completions against the Anthropic Messages API for the Claude model family (Haiku/Sonnet/Opus), with classified errors, retry-policy guidance, MaxTokens capping, and ctx-respecting cancellation.

This skill does NOT handle: streaming responses, tool use, multi-turn conversation, file/image inputs, prompt caching, the Batch API, or the Files API. Each of those is a separate concern with its own contract — adding any of them to this adapter is the wrong move.

Validate fresh: new Claude Code session, hand it the skill, ask it to "add a check that errors if Request.MaxTokens > 4096 on Sonnet." Two things to watch for:

Does it edit the existing MaxTokensCap field, or does it add a parallel MaxTokensLimit that quietly duplicates the cap? The skill should make the existing field obvious.
Does it propose silently clamping (existing behaviour) or erroring? The skill should be explicit that clamping is the default behaviour — adding a "strict" mode is a deliberate API expansion, not an obvious extension.
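The silent-clamping behaviour discussed above can be sketched as a small pure function. The name clampMaxTokens is hypothetical, and treating a non-positive request as "use the cap" is an assumption of this sketch rather than something the lesson specifies.

```go
package main

import "fmt"

// defaultMaxTokensCap is the lesson's default cap.
const defaultMaxTokensCap int64 = 4096

// clampMaxTokens returns the value to send upstream. It never errors:
// oversized requests are silently reduced to the cap. Treating a
// non-positive request as "use the cap" is an assumption of this sketch.
func clampMaxTokens(requested, limit int64) int64 {
	if limit <= 0 {
		limit = defaultMaxTokensCap
	}
	if requested <= 0 || requested > limit {
		return limit
	}
	return requested
}

func main() {
	fmt.Println(clampMaxTokens(8, defaultMaxTokensCap))      // 8
	fmt.Println(clampMaxTokens(100000, defaultMaxTokensCap)) // 4096
}
```

A strict mode would return an error from the second case instead of 4096, which is why it counts as an API expansion: every caller would gain a new failure path.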

Then run the adversarial validation: ask the same fresh session to "add streaming support so the operator can show progress." The session, guided by the skill, must refuse: streaming needs a different return shape, which means a different contract, which means a separate adapter. A skill that quietly extends scope to "look helpful" is exactly the trap the boundary statement exists to prevent.

Promote deterministic commands to `scripts/`

scripts/anthropic-smoke.sh: curl https://api.anthropic.com/v1/messages -H "x-api-key: $ANTHROPIC_API_KEY" -H "anthropic-version: 2023-06-01" -H "content-type: application/json" -d '{...}' for a 5-token reply. Confirms the key is valid before you spend test time chasing local bugs.
scripts/test.sh: already exists from lesson 08; now extend it to run go test ./pkg/llm/... with an env-controlled flag that gates the cost-incurring live tests.
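On the Go side, the env-controlled flag could be read by a small gate like the one below. This is a sketch only: LLM_LIVE_TESTS is a hypothetical variable name chosen for illustration, and liveTestsEnabled is not part of the lesson's code.

```go
package main

import (
	"fmt"
	"os"
)

// liveTestsEnabled reports whether cost-incurring tests should run.
// LLM_LIVE_TESTS is a hypothetical flag name chosen for this sketch;
// a key must also be present, otherwise the live call cannot succeed.
func liveTestsEnabled(lookup func(string) string) bool {
	return lookup("LLM_LIVE_TESTS") == "1" && lookup("ANTHROPIC_API_KEY") != ""
}

func main() {
	if !liveTestsEnabled(os.Getenv) {
		fmt.Println("live tests skipped")
		return
	}
	fmt.Println("live tests enabled")
}
```

Inside a test, the same check would sit at the top of the function and call t.Skip when it returns false, so a key that happens to be exported in the shell does not spend credits by accident.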

Acceptance test

# Free: no valid key needed (BadKey still calls the real API, so it needs network access)
go test ./pkg/llm/anthropic/... -run BadKey
go test ./pkg/llm/anthropic/... -run Truncated   # skipped unless key is set
# Costs ~$0.01
ANTHROPIC_API_KEY=$REAL_KEY go test ./pkg/llm/anthropic/...

All green. The bad-key probe asserts ProviderError{Code: "auth", Retryable: false} and exactly one retry-policy attempt. The live test asserts non-empty Response.Text plus non-zero input/output token counts.

Closing — the architectural payoff

You now have one interface, one Fake, and two real adapters. Wire any of the three into a Go service and the calling code is identical — c.Ask(ctx, req), classify the error, retry-or-not on Retryable, account for tokens. The operator's reconcile loop (which we'll build next module) won't even know which adapter it's talking to, and that's the point.
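The adapter-agnostic calling pattern can be sketched with a deterministic fake. All types below are stand-ins for the lesson-08 contract with assumed field names, and askOnce, demo and outcome are hypothetical names introduced for illustration.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

type Request struct {
	Prompt    string
	MaxTokens int64
}

type Response struct {
	Text         string
	InputTokens  int64
	OutputTokens int64
	Provider     string
}

type ProviderError struct {
	Code      string
	Retryable bool
}

func (e *ProviderError) Error() string { return "provider error: " + e.Code }

type Client interface {
	Ask(ctx context.Context, req Request) (Response, error)
}

// Fake is a deterministic stub: it returns whatever it was built with.
type Fake struct {
	Resp Response
	Err  error
}

func (f Fake) Ask(context.Context, Request) (Response, error) { return f.Resp, f.Err }

// outcome summarises what the caller decided after one Ask.
type outcome struct {
	Text        string
	TokensUsed  int64
	ShouldRetry bool
}

// askOnce is adapter-agnostic: it only touches the Client interface.
func askOnce(ctx context.Context, c Client, prompt string) (outcome, error) {
	resp, err := c.Ask(ctx, Request{Prompt: prompt, MaxTokens: 64})
	if err != nil {
		var pe *ProviderError
		retry := errors.As(err, &pe) && pe.Retryable
		return outcome{ShouldRetry: retry}, err
	}
	return outcome{Text: resp.Text, TokensUsed: resp.InputTokens + resp.OutputTokens}, nil
}

// demo runs askOnce with a background context and a fixed prompt.
func demo(c Client) (outcome, error) {
	return askOnce(context.Background(), c, "Reply with exactly: OK")
}

func main() {
	ok := Fake{Resp: Response{Text: "OK", InputTokens: 10, OutputTokens: 1, Provider: "fake"}}
	busy := Fake{Err: &ProviderError{Code: "rate_limit", Retryable: true}}
	for _, c := range []Client{ok, busy} {
		out, err := demo(c)
		fmt.Println(out.Text, out.TokensUsed, out.ShouldRetry, err)
	}
}
```

Swapping Fake for either real adapter would leave askOnce unchanged, since it depends only on the interface and the classified error.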

What you've actually built in three lessons is the kind of internal package mature teams maintain in production — interface, deterministic stub, two implementations, classified errors, ctx-respecting cancellation, token accounting. And you've built three skill files that codify how to extend it without breaking the contract: the kind of artefact teams should maintain but mostly don't, because writing skills feels like overhead until your next teammate (or your future self) has to extend the code without the context you have now.

The next module wires this library into a Kubernetes operator's reconcile loop and uses it to triage incidents on the Bookinfo lab you stood up in lessons 04–07. The interface stays exactly as you designed it — that's how you know the design was right.
