🏛️ Govern
Best practices checklist
For product and engineering teams.
The one-page takeaway. Each item maps back to the pillars and frameworks covered in earlier lessons.
Before you build
- Agent Canvas filled and reviewed — All eight cells, with an explicit "not for" list.
- REMIT worksheet completed — Named owner, envelope in code, monitoring plan, identity, trust level.
- System Prompt Builder run — Six ingredients in order; no single section longer than ~200 words.
- NIST GenAI Profile mapped — Which of the twelve risks apply, with explicit mitigations for each.
- EU AI Act classification — Risk tier documented (prohibited / high-risk / limited / minimal / GPAI), plus your role as provider or deployer.
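The "envelope in code" item above means the agent's permissions live in one reviewable object rather than scattered across prompts. A minimal sketch, with hypothetical field and tool names, deny-by-default:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentEnvelope:
    """Sketch of an envelope in code: permitted tools and limits in one place."""
    owner: str                      # named owner from the REMIT worksheet
    allowed_tools: frozenset[str]   # anything not listed is denied
    max_spend_usd: float            # hard per-session spend cap
    not_for: frozenset[str]         # the explicit "not for" list from the canvas

    def permits(self, tool: str, spent_usd: float) -> bool:
        """Deny-by-default check, run before every tool call."""
        return tool in self.allowed_tools and spent_usd < self.max_spend_usd

envelope = AgentEnvelope(
    owner="jane@example.com",
    allowed_tools=frozenset({"search_kb", "create_ticket"}),
    max_spend_usd=5.0,
    not_for=frozenset({"legal advice", "refund approval"}),
)

print(envelope.permits("create_ticket", spent_usd=1.2))  # True
print(envelope.permits("send_email", spent_usd=1.2))     # False: not in envelope
```

Because the envelope is plain code, it can be reviewed in a pull request and diffed between authority levels, which is harder to do with prompt-only guardrails.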
Before you launch
- Golden dataset of 20+ cases — Covering happy path, edge, adversarial, ambiguous, and handoff.
- Five tests passed — Happy, Edge, Adversarial, Ambiguous, Handoff.
- Red-team suite run — OWASP LLM Top 10 + your domain-specific attacks.
- Circuit breakers wired — Spend caps, tool-call caps, canary failures trigger automatic halt.
- Human oversight model chosen — Based on the risk × complexity matrix, and documented.
- Monitoring dashboard live — Action logs, tool-call regressions, quality regressions, cost, and latency.
- Kill-switch path — Exists outside the agent's runtime, and is known to on-call.
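The circuit-breaker item above can be sketched in a few lines. This is a minimal illustration, not a production pattern: it trips on a spend cap, a tool-call cap, or a canary failure, and stays open until a human resets it (the kill-switch path and paging hooks are assumed to live elsewhere).

```python
class CircuitBreaker:
    """Trips on spend cap, tool-call cap, or canary failure; stays tripped."""

    def __init__(self, max_spend_usd: float, max_tool_calls: int):
        self.max_spend_usd = max_spend_usd
        self.max_tool_calls = max_tool_calls
        self.spend_usd = 0.0
        self.tool_calls = 0
        self.tripped = False

    def record(self, cost_usd: float = 0.0, canary_failed: bool = False) -> None:
        """Called after each tool call; trips the breaker if any cap is hit."""
        self.spend_usd += cost_usd
        self.tool_calls += 1
        if (self.spend_usd >= self.max_spend_usd
                or self.tool_calls >= self.max_tool_calls
                or canary_failed):
            self.tripped = True  # automatic halt; page the on-call here

    def allow(self) -> bool:
        return not self.tripped

breaker = CircuitBreaker(max_spend_usd=5.0, max_tool_calls=10)
breaker.record(cost_usd=1.0)
print(breaker.allow())   # True: under both caps
breaker.record(cost_usd=4.5)
print(breaker.allow())   # False: spend cap exceeded, agent halted
```

The key design choice is that the breaker latches: once tripped, no further agent action is allowed until the named owner investigates and resets it.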
After you launch
- Per-deploy regression run — Full golden dataset executed on every deploy; alert on any regression.
- Daily drift checks — Input distribution, output distribution, tool-call mix.
- Weekly human review — Sampled traces read, patterns noted, rubric updated.
- Authority review cadence — Monthly for new agents; quarterly for mature ones. Evidence-based promotion or demotion.
- Incident response plan — If a circuit breaker fires: who gets paged, what they do, when the board finds out.
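One way to make the daily drift check on tool-call mix concrete is total variation distance between yesterday's baseline distribution and today's. A minimal sketch with made-up tool names and an illustrative alert threshold:

```python
from collections import Counter

def tool_call_drift(baseline: list[str], today: list[str]) -> float:
    """Total variation distance between two tool-call distributions (0 to 1)."""
    b, t = Counter(baseline), Counter(today)
    nb, nt = sum(b.values()), sum(t.values())
    return 0.5 * sum(abs(b[k] / nb - t[k] / nt) for k in set(b) | set(t))

baseline = ["search_kb"] * 80 + ["create_ticket"] * 20
today = ["search_kb"] * 50 + ["create_ticket"] * 30 + ["send_email"] * 20

drift = tool_call_drift(baseline, today)
print(f"drift = {drift:.2f}")  # 0.30: well above a threshold like 0.15, so alert
```

The same comparison works for input and output distributions once they are bucketed; the threshold itself should come from observing a few weeks of normal variation, not from a guess.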