🏛️ Govern
The risk taxonomy
Security, ethical, operational, systemic.
Four families of agent risk
Agent risks cluster into four families. A mature governance review touches all four.
Security & cyber risks
- Prompt injection — malicious inputs override agent instructions (OWASP LLM01).
- Data leakage & exfiltration — sensitive information exposed through agent outputs or tool logs.
- Identity & token compromise — stolen credentials grant unauthorized agent access.
- Shadow AI — unauthorised agent deployments creating blind spots.
- Agent hijacking — attackers gain control of agent logic via jailbreaks or malicious tools.
- Model poisoning — corrupting training or retrieval data to influence decisions.
Mitigations are covered in the red-teaming lesson and in REMIT-E (Envelope).
Ethical & social risks
- Amplified bias and discrimination — agents reinforce and scale existing biases in data and decisioning.
- Loss of human oversight — autonomous decisions made without meaningful accountability.
- Manipulation and deception — agents subtly influencing user behaviour against their interests.
- Job displacement — automation reducing employment without transition support.
- Loss of human connection — empathy and social skills eroding in human-agent-heavy workflows.
Operational & governance risks
- Goal misalignment — agents pursuing objectives in unintended ways.
- Lack of transparency — "black-box" decision-making that cannot be reconstructed.
- Cascading failures — one agent's error propagating through interconnected systems.
- Regulatory compliance — difficulty adhering to evolving laws (EU AI Act, sectoral rules).
- Excessive permissions — agents with overly broad access rights.
Mitigations are the core of this module — REMIT, authority levels, monitoring.
Systemic & existential risks
- Autonomous weaponisation — AI in military or adversarial systems acting independently.
- Concentration of power — AI-enabled surveillance and control by few actors.
- Misinformation campaigns — automated spread of false information at scale.
- Unintended consequences — unforeseen outcomes from complex agent interactions.
- Loss of control — agents resisting shutdown or evolving harmful goals.
These are the class of risks that frameworks like Anthropic's RSP are designed to keep from materialising. Most product teams will not engage with them directly, but they set the ceiling on what is responsible to deploy.
Prioritising mitigations
You cannot defend against all of these at once. The practical order:
- Hard-block the catastrophic. Irreversible tools with wide blast radius get code-enforced limits and human checkpoints. Non-negotiable.
- Red-team the probable. Prompt injection, data exfiltration, tool misuse — run the five-test suite, then the OWASP suite.
- Monitor the slow. Drift, bias, cascading failures — catch them with the monitoring surface.
- Declare the systemic. You cannot solve these alone; you can make sure your deployment is not contributing to them.
Mitigation requires robust security, ethical frameworks, human-in-the-loop controls, and adaptive governance — all at once. That is why governance is a system.