Glossary
The vocabulary of the course. Terms auto-link from lessons on first occurrence; hover for a definition.
A
- Agent(AI agent)
- An LLM that directs its own process and tool use, with some level of autonomy. Distinct from a one-shot LLM call or a hard-coded workflow.
- See also: Augmented LLM, Workflow, Tool Use
- Agent Canvas(Agent Design Canvas)
- An 8-cell worksheet for framing an agent before you build: Purpose, Triggers, Tools, Knowledge, Authority, Guardrails, Human Checkpoints, Success Metrics.
- See also: REMIT
- Agentic AI
- AI systems that take actions in the world on behalf of a user — not merely answer questions. Implies tools, persistent memory, and some autonomy.
- See also: Agent, Tool Use
- ASL(AI Safety Level)
- Anthropic's graduated capability thresholds in the Responsible Scaling Policy. ASL-1 to ASL-4+ trigger increasingly strict deployment and security standards.
- See also: RSP, Capability Threshold
- Approval Fatigue
- When a human reviewer rubber-stamps agent actions because there are too many to review meaningfully. Oversight in form, not substance.
- See also: HITL, REMIT
- Augmented LLM
- A single LLM call enhanced with retrieval or one-shot tool use. No loop, no autonomy.
- See also: Agent, RAG
B
- Blast Radius
- The scope of harm if an agent acts wrongly. Financial impact × data sensitivity × reversibility. Higher blast radius demands lower default autonomy.
- See also: REMIT, Envelope
C
- Capability Threshold
- In Anthropic's RSP, the point at which a new model's capabilities trigger a higher ASL tier with stricter safeguards.
- See also: RSP, ASL
- Circuit Breaker
- An automatic hard stop — e.g., if an agent exceeds a spend limit, makes too many tool calls, or triggers an anomaly detector.
- See also: Monitoring, REMIT
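A minimal sketch of the idea — class name, limits, and costs are all illustrative, not a real API:

```python
# Hypothetical circuit breaker: trips on spend or tool-call limits.
class CircuitBreaker:
    def __init__(self, max_spend=50.0, max_tool_calls=25):
        self.max_spend = max_spend
        self.max_tool_calls = max_tool_calls
        self.spend = 0.0
        self.tool_calls = 0
        self.tripped = False

    def record(self, cost=0.0):
        """Record one tool call; trip if any hard limit is exceeded."""
        self.spend += cost
        self.tool_calls += 1
        if self.spend > self.max_spend or self.tool_calls > self.max_tool_calls:
            self.tripped = True
        return not self.tripped  # False means: halt the agent now

breaker = CircuitBreaker(max_spend=10.0, max_tool_calls=100)
for _ in range(3):
    breaker.record(cost=4.0)  # third call pushes spend to 12.0 > 10.0
print(breaker.tripped)  # True
```

The point is that the stop is automatic: the agent's own reasoning never gets a vote.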
- Context Window
- The maximum number of tokens a model can consider in a single generation. Larger windows reduce the need for RAG but increase cost per call.
- See also: RAG
D
- Drift Detection
- Monitoring for changes in an agent's inputs, outputs, or behaviour over time that indicate it has deviated from expected performance.
- See also: Monitoring, REMIT
E
- Envelope
- The hard boundaries of an agent's authority — encoded in code, not policy. Covers tools, actions, spend, and scope. The E in REMIT.
- See also: REMIT, Guardrail
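"Encoded in code, not policy" can be as simple as a check that runs before any tool does — tool names, caps, and scopes below are hypothetical:

```python
# Illustrative envelope: hard boundaries checked in code before any action runs.
ENVELOPE = {
    "allowed_tools": {"search_docs", "draft_email"},  # tools the agent may call
    "max_spend_per_day": 25.0,                        # hypothetical daily cap (USD)
    "scope": {"customer_support"},                    # domains the agent may act in
}

def within_envelope(tool, spend_today, domain):
    """True only if the action is inside every hard boundary."""
    return (tool in ENVELOPE["allowed_tools"]
            and spend_today <= ENVELOPE["max_spend_per_day"]
            and domain in ENVELOPE["scope"])

print(within_envelope("draft_email", 3.0, "customer_support"))      # True
print(within_envelope("delete_records", 3.0, "customer_support"))   # False
```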
- Escalation Path
- A documented route from agent failure to a human decision-maker. Must exist before deployment.
- See also: HITL, REMIT
- EU AI Act
- The European Union's horizontal regulation on AI. Risk-tiered (prohibited, high, limited, minimal). GPAI obligations live since 2025-08-02; Commission enforcement powers from 2026-08-02; legacy models by 2027-08-02.
- See also: GPAI, NIST AI RMF
F
- Fine-Tuning(Fine-tune)
- Further training a pretrained model on a narrower dataset to specialise its behaviour, style, or domain knowledge. Cheaper than training from scratch, but still changes weights — distinct from prompt engineering, RAG, or few-shot prompting, which leave the model untouched.
- See also: LLM, RAG, Model Provenance
- Fluid Compute
- Vercel's current default compute model — a full Node.js runtime that reuses function instances across requests, cutting cold starts and cost versus traditional one-request-per-instance serverless.
G
- Golden Dataset
- A small, hand-curated, high-quality evaluation set representing your most important user interactions. Expected tools, outputs, and trajectories are pre-specified.
- See also: Trajectory Evaluation, Silver to Gold
- GPAI(General-Purpose AI)
- Under the EU AI Act, AI models of broad capability. Providers have specific transparency, copyright, and safety obligations.
- See also: EU AI Act
- Groundedness
- How well an agent's answers are supported by the sources it cites or the data it was given. Ungrounded answers are hallucinations dressed up as fact.
- See also: RAG
- Guardrail
- A constraint on agent behaviour. Hard (code-enforced) guardrails beat soft (prompt-based) ones — 34% of safety incidents come from soft-only enforcement.
- See also: Envelope, REMIT
H
- HITL(Human-in-the-Loop)
- The human approves each agent action before it executes. One of three oversight models — see HOTL and HOOTL.
- See also: HOTL, HOOTL, Authority Level
- HOTL(Human-on-the-Loop)
- The agent acts autonomously; the human monitors in real time and can intervene. Used for time-sensitive but reviewable actions.
- See also: HITL, HOOTL
- HOOTL(Human-out-of-the-Loop)
- The agent acts autonomously; humans only audit via dashboards and logs after the fact. Highest-autonomy oversight model.
- See also: HITL, HOTL
I
- Identity
- Verified provenance for an agent — model, version, provider, capabilities, authorisation, and lineage. The I in REMIT. Non-human identities now outnumber humans 45-to-1 in financial services.
- See also: REMIT, Provenance
J
- Job-To-Be-Done(JTBD)
- A framing for agent purpose: "As a [user], I need to [job], so that [outcome]." Force yourself to say what the agent is explicitly NOT for.
- See also: Agent Canvas
- JSON(JavaScript Object Notation)
- A lightweight, text-based data format of nested objects, arrays, strings, numbers, booleans, and null. The default wire format for tool calls and structured LLM output — strict and machine-parseable, but noisier for humans than YAML.
- See also: YAML, Tool Use
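A hypothetical tool call on the wire shows why JSON suits machines — strict structure, unambiguous parsing:

```python
import json

# Hypothetical tool call as it might arrive over the wire.
raw = '{"tool": "get_invoice", "arguments": {"invoice_id": "INV-1042", "include_lines": true}}'

call = json.loads(raw)  # parse the wire format into Python objects
print(call["tool"])                        # get_invoice
print(call["arguments"]["include_lines"])  # True (JSON true -> Python True)
```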
K
- Kill Switch
- A control that halts an agent's execution from outside the agent's runtime. Must live outside the agent — children don't stop when the parent dies.
- See also: Circuit Breaker, REMIT
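A sketch of the "lives outside the agent" property — here the switch is a file the harness polls each step; in production it might be a feature flag or a control-plane API:

```python
import os
import tempfile

# Hypothetical kill switch: an external flag checked by the harness, not the agent.
switch_path = os.path.join(tempfile.gettempdir(), "agent_kill_switch")

def kill_switch_engaged():
    return os.path.exists(switch_path)  # anything outside the runtime can create this

open(switch_path, "w").close()  # an operator engages the switch

steps_run = 0
for _ in range(100):
    if kill_switch_engaged():   # checked before every step of the agent loop
        break
    steps_run += 1

os.remove(switch_path)
print(steps_run)  # 0 — the agent halted before its first step
```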
L
- LLM(Large Language Model)
- A neural network trained on large text corpora to predict the next token. The stochastic reasoning engine at the heart of every agent — useful, non-deterministic, and bounded by its context window.
- See also: Context Window, Agent, Augmented LLM
- LLM-as-Judge
- Using a large language model to score outputs from another model (or from the same model) against a rubric. Effective when calibrated against human graders; beware position, length, and verbosity bias.
- See also: Rubric
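The mechanics reduce to: build a rubric prompt, call a judge model, parse the score. Below, the model call is a stub standing in for a real provider API, and the `SCORE:` reply format is an assumption:

```python
# Sketch of LLM-as-judge with a stubbed model call.
RUBRIC = "Score 1-5 for groundedness: are all claims supported by the provided sources?"

def call_judge_model(prompt):
    # Stub for a real LLM call; assume the judge replies 'SCORE: <n>'.
    return "SCORE: 4"

def judge(answer, sources):
    prompt = (f"{RUBRIC}\n\nSOURCES:\n{sources}\n\nANSWER:\n{answer}\n\n"
              "Reply as 'SCORE: <1-5>'.")
    reply = call_judge_model(prompt)
    return int(reply.split("SCORE:")[1].strip())  # parse the numeric grade

score = judge("Refunds take 5 days.",
              "Policy: refunds are processed within 5 business days.")
print(score)  # 4
```

Calibration means running the same cases past human graders and checking the judge agrees.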
- Long-Tail Opportunity
- The large set of low-individual-value but high-aggregate-value use cases for agents — tasks too niche for dedicated software but too numerous to ignore.
M
- Markdown(MD, MDX)
- A lightweight text format using punctuation for structure (headings, lists, links, code blocks). Preferred for LLM input and output because it's token-efficient, readable, and easy to generate. MDX extends it with embedded components and is used by tools like Claude to show rich visuals in text; we use it in these courses to include the interactive elements.
- See also: Skill, System Prompt
- Model Context Protocol(MCP)
- An open standard (Anthropic, 2024) that lets AI models discover and call external tools and data sources uniformly. Think USB-C for AI — one protocol, many connections.
- See also: Tool Use, Skill
- Memory
- What an agent remembers. Short-term (current conversation), working (active task state), long-term (persistent across sessions). Each tier raises different governance questions.
- See also: RAG
- Model Provenance
- Traceability of a model's origin, training data, fine-tunes, and supply chain. Highlighted in the NIST AI RMF GenAI Profile.
- See also: NIST AI RMF, Identity
- Monitoring
- Continuous observability of agent behaviour, not quarterly audits. The M in REMIT. Covers action logs, reasoning traces, drift detection, alerts, and approval-fatigue watch.
- See also: REMIT, Drift Detection
N
- NIST AI RMF
- US National Institute of Standards and Technology's Artificial Intelligence Risk Management Framework (1.0, 2023). Voluntary, four functions: Govern, Map, Measure, Manage.
- See also: NIST-AI-600-1
- NIST-AI-600-1
- The Generative AI Profile of the NIST AI RMF (July 2024). Lists 12 GenAI-specific risks and 200+ actions organisations can take.
- See also: NIST AI RMF
O
- OWASP LLM Top 10
- OWASP's taxonomy of the top risks in LLM-powered applications. LLM01 is Prompt Injection, typically the most-cited.
- See also: Prompt Injection, Red-Teaming
P
- Prompt Injection
- An attack where untrusted input manipulates an LLM's instructions. Direct (user types malicious prompt) or indirect (malicious content in a document the agent reads). OWASP LLM01.
- See also: Red-Teaming, OWASP LLM Top 10
- Provenance
- Knowing where something came from — for models, data, citations, or tool results. A precondition for auditability.
- See also: Identity, Monitoring
- Purpose
- What the agent is hired to do, and — equally important — what it is explicitly NOT for. Scope creep kills agent projects. Cell 1 of the Agent Canvas.
- See also: Agent Canvas, Job-To-Be-Done
R
- RAG(Retrieval-Augmented Generation)
- Fetching relevant documents at query time and injecting them into the prompt so the model can ground its answer in current, private, or large-corpus data.
- See also: Groundedness, Context Window
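The loop is retrieve, then inject. A minimal sketch, with keyword overlap standing in for a real vector search and the documents invented for illustration:

```python
# Minimal RAG sketch: score documents by keyword overlap, inject the best into the prompt.
DOCS = [
    "Refund policy: refunds are processed within 5 business days.",
    "Shipping: orders dispatch within 24 hours of payment.",
]

def retrieve(query, docs, k=1):
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(docs, key=overlap, reverse=True)[:k]

query = "How long do refunds take?"
context = retrieve(query, DOCS)[0]
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(context)  # the refund-policy document
```

Real systems swap the overlap function for embedding similarity, but the prompt-assembly step is the same.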
- Red-Teaming
- Adversarial testing — deliberately trying to break an agent with prompt injection, tool misuse, jailbreaks, or boundary-probing inputs.
- See also: Prompt Injection, OWASP LLM Top 10
- REMIT
- A five-pillar governance framework for agents: Responsibility (named human owner), Envelope (bounded authority), Monitoring (continuous observability), Identity (verified provenance), Trust (graduated autonomy).
- See also: Agent Canvas
- Responsibility
- Named human ownership for an agent. One person, not a team, not a committee. The R in REMIT. No orphans — if the owner leaves, responsibility transfers explicitly.
- See also: REMIT
- RSP(Responsible Scaling Policy)
- Anthropic's policy for graduated safety commitments as models become more capable. Version 3.0 took effect 2026-02-24 and introduced Frontier Safety Roadmaps and Risk Reports.
- See also: ASL
- Rubric
- A structured set of criteria used to grade an output. Good rubrics are specific, comparable, and calibrated against human judgement.
- See also: LLM-as-Judge
S
- Scope Creep
- Incrementally expanding what an agent is asked to do beyond its original purpose. A leading cause of failure. Mitigated by writing the "NOT for" list up front.
- See also: Purpose
- Silver to Gold
- A pattern for growing eval datasets: capture silver (synthetic or production-sampled) cases, promote to gold (hand-curated, trusted) through human review.
- See also: Golden Dataset
- Skill
- A reusable package of instructions — usually markdown or YAML — injected into an agent's context to teach it how to perform a specific job well. Skills shape how the agent reasons; tools give it things to do.
- See also: Tool Use, MCP
- System Prompt
- The persistent instruction that shapes every decision an agent makes. Six ingredients: Role & Identity, Goal & Scope, Tools & Instructions, Rules & Boundaries, Tone & Format, Escalation Paths.
- See also: Agent Canvas
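The six ingredients can be sketched as a template — every field below is an invented example, not a recommended prompt:

```python
# Illustrative system prompt showing the six-ingredient structure.
SYSTEM_PROMPT = """\
ROLE & IDENTITY: You are a support agent for Acme Ltd.
GOAL & SCOPE: Resolve billing questions. You are NOT for legal or medical advice.
TOOLS & INSTRUCTIONS: Use search_orders before answering; cite order IDs.
RULES & BOUNDARIES: Never reveal another customer's data.
TONE & FORMAT: Concise and friendly; reply in markdown.
ESCALATION PATHS: If unsure, or the user is upset, hand off to a human.
"""
print("ESCALATION" in SYSTEM_PROMPT)  # True
```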
T
- Tool Use
- An LLM calling external functions to query data, take actions, or run code. Deterministic and auditable — unlike the model's reasoning. Risk-rate every tool: read-only, write/modify, irreversible.
- See also: MCP
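Risk-rating tools can itself be a hard rule in code — a sketch with hypothetical tool names and a simple policy that anything above read-only needs a human:

```python
# Illustrative tool registry with risk tiers.
TOOLS = {
    "search_orders":  {"risk": "read-only"},
    "update_address": {"risk": "write"},
    "issue_refund":   {"risk": "irreversible"},
}

def needs_human_approval(tool_name):
    """Hard rule: anything that writes, or can't be undone, goes to a human first."""
    return TOOLS[tool_name]["risk"] in {"write", "irreversible"}

print(needs_human_approval("search_orders"))  # False
print(needs_human_approval("issue_refund"))   # True
```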
- Trajectory Evaluation
- Evaluating the sequence of tool calls an agent makes (the "glass-box" view), not just the final answer. Catches agents that reach the right answer via wrong routes.
- See also: Golden Dataset
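A sketch of grading the route rather than the destination — tool names and the golden case are hypothetical, and real harnesses often allow subsequence or set matching instead of exact order:

```python
# Sketch: compare the agent's tool-call sequence against a golden trajectory.
def trajectory_matches(actual, expected):
    """Exact-order match on tool names."""
    return actual == expected

golden_case = {
    "query": "Refund order 1042",
    "expected_trajectory": ["lookup_order", "check_refund_policy", "issue_refund"],
}

agent_trajectory = ["lookup_order", "issue_refund"]  # skipped the policy check
print(trajectory_matches(agent_trajectory, golden_case["expected_trajectory"]))  # False
```

Here the agent might still have refunded the right order — the trajectory check is what catches the skipped policy step.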
- Trust
- In REMIT, graduated autonomy earned through evidence. Intern → Junior → Senior → Principal. Demotion is always available. Trust is calibrated, not declared.
- See also: REMIT, Authority Level
W
- Workflow
- Multiple LLMs (or a single LLM plus code) in a predefined path. Deterministic, auditable, predictable — but less adaptive than a full agent.
- See also: Agent
Y
- YAML(YAML Ain't Markup Language)
- An indentation-based, human-readable data format; a superset of JSON (as of YAML 1.2) that is friendlier for hand-editing. Common in skill frontmatter, agent configs, and infrastructure files.
- See also: JSON, Skill
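What a skill frontmatter can look like — field names here are illustrative, not a fixed schema:

```yaml
# Hypothetical skill frontmatter; comments work, unlike in JSON.
name: summarise-ticket
description: Summarise a support ticket into three bullet points
allowed_tools:
  - search_tickets   # read-only
  - draft_reply      # write; needs approval
```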