Anthropic’s Responsible Scaling Policy
ASL tiers and what v3.0 changed.
Anthropic's Responsible Scaling Policy — a voluntary but influential frame
Anthropic's Responsible Scaling Policy (RSP) is a public commitment to scale AI model development only alongside specific safety and security measures. It is not a regulation — it is a voluntary policy Anthropic holds itself to, published and revised in public.
It is included in this course because: (a) if you use Claude, you are downstream of this framework; (b) the ASL tier structure has become an industry reference point; (c) as the EU AI Act's GPAI systemic-risk obligations begin to bite, an RSP-like framework is what compliance looks like in practice.
The current version
Version 3.0 of the RSP took effect 24 February 2026. It is a comprehensive rewrite of v2.0, introducing:
- Frontier Safety Roadmaps — published safety goals ahead of capability development.
- Risk Reports — quantified risk across all deployed models.
- Restructured AI Safety Level Standards (ASL Standards) — graduated sets of safety and security measures that become more stringent as capability rises.
The ASL tiers
Models progress through AI Safety Levels based on capability thresholds. The higher the ASL, the stricter the deployment and security standards.
- ASL-1 — very basic capabilities. Minimal safeguards.
- ASL-2 — reflects current industry best practices. Standard red-teaming, responsible disclosure, usage policies. All currently deployed Anthropic models operate under ASL-2 standards.
- ASL-3 — models that could meaningfully uplift non-experts in creating CBRN weapons, or that could operate autonomously in some contexts. Activated May 2025 for Anthropic models — this is what triggered the CBRN-protection deployment commitments.
- ASL-4 and beyond — left largely undefined in v2.0 ("we'll specify when we're closer to needing them"). v3.0 restructured these tiers ahead of the capability thresholds being reached, defining more realistic, achievable commitments in advance.
The practical reading: Anthropic operates Claude under ASL-2 with some ASL-3 safeguards for specific capability classes.
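The threshold logic described above can be sketched as a small capability-to-tier mapping. Everything here is illustrative: the eval flag names and the decision rules are invented for this example, not taken from Anthropic's actual evaluation criteria.

```python
# Illustrative sketch only: maps hypothetical capability-evaluation flags
# to the minimum AI Safety Level they would imply. Flag names and logic
# are assumptions for teaching purposes, not the RSP's real criteria.

def required_asl(evals: dict[str, bool]) -> str:
    """Return the minimum ASL tier implied by a set of capability flags."""
    if evals.get("cbrn_uplift") or evals.get("autonomous_operation"):
        return "ASL-3"   # meaningful non-expert uplift, or autonomy in some contexts
    if evals.get("general_purpose"):
        return "ASL-2"   # current industry best practice applies
    return "ASL-1"       # very basic capabilities, minimal safeguards

print(required_asl({"general_purpose": True}))                       # ASL-2
print(required_asl({"general_purpose": True, "cbrn_uplift": True}))  # ASL-3
```

The key property to notice is that the tier is driven by what the model *can do*, not by what it is intended for: a single triggered threshold escalates the whole safeguard set.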
What each ASL tier requires (summary)
Without reproducing the full policy, the stepwise escalation roughly looks like:
| Dimension | ASL-2 | ASL-3 | ASL-4+ |
|---|---|---|---|
| Pre-deployment evaluation | Standard red-teaming + bias evals | + Elicitation for uplift risks; Inspect-style evals | + Proof-of-non-uplift; independent audit |
| Deployment commitments | Usage policies; abuse monitoring | + CBRN-targeted mitigations | + Restricted deployment contexts |
| Security posture | Industry-standard security | + Insider-threat mitigations; hardened weight access | + Nation-state-adversary-grade security |
| Disclosure | Model cards, system cards | + Safety case publication | + Independent oversight |
Details are in the RSP document itself.
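One way to make a table like this operational is to render it as structured data and gate deployments against it. The sketch below does that; the control names paraphrase the summary table above and are not the RSP's official requirement names.

```python
# Machine-readable paraphrase of the tier-requirements summary, usable in
# a CI-style gating check. Control names are simplified labels invented
# for this sketch, not official RSP terminology.

ASL_REQUIREMENTS = {
    "ASL-2": {"red_teaming", "bias_evals", "usage_policies",
              "abuse_monitoring", "industry_security", "model_cards"},
    "ASL-3": {"uplift_elicitation", "cbrn_mitigations",
              "insider_threat_mitigations", "hardened_weight_access",
              "safety_case_publication"},
}

def missing_controls(tier: str, implemented: set[str]) -> set[str]:
    """Controls still needed to deploy at `tier`. Tiers are cumulative:
    ASL-3 requires everything in ASL-2 plus its own additions."""
    needed: set[str] = set()
    for t in ("ASL-2", "ASL-3"):
        needed |= ASL_REQUIREMENTS[t]
        if t == tier:
            break
    return needed - implemented
```

Usage: `missing_controls("ASL-3", ASL_REQUIREMENTS["ASL-2"])` returns exactly the ASL-3 additions, mirroring the `+`-prefixed cells in the table.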
Why this matters to your team
You do not need to be Anthropic to care about RSP. Three implications for anyone building on top of a frontier model:
- Your supplier's position is load-bearing. An agent built on a model operating at ASL-2 inherits ASL-2-grade assumptions. If your product takes you to a regulatory posture that requires more, you either need a different supplier or additional compensating controls.
- The tier structure is becoming a shared vocabulary. The UK AI Safety Institute, the EU AI Office, and sectoral regulators are all converging on something like ASL-tiered thinking. Adopting the vocabulary makes you easier to audit.
- RSP-like internal policies are becoming table stakes. Even if you are not training models, you are deploying agents. An internal policy that sets capability thresholds and escalating safeguards for your own agent fleet is increasingly expected of mature organisations.
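An internal, RSP-inspired policy for an agent fleet can be as simple as a declared list of tiers, each pairing a capability threshold with escalating safeguards. The sketch below is hypothetical throughout: the tier names, triggers, and safeguards are invented examples, not a recommended baseline.

```python
# Hypothetical internal policy for an agent fleet, RSP-inspired: each
# capability threshold maps to a strictly larger safeguard set. All tier
# names, triggers, and safeguards here are invented for illustration.

from dataclasses import dataclass, field

@dataclass
class AgentTier:
    name: str
    trigger: str                       # capability threshold, in plain language
    safeguards: list[str] = field(default_factory=list)

INTERNAL_POLICY = [
    AgentTier("T1", "read-only retrieval",
              ["output logging"]),
    AgentTier("T2", "can call external tools",
              ["output logging", "tool allow-list", "rate limits"]),
    AgentTier("T3", "can act autonomously on production systems",
              ["output logging", "tool allow-list", "rate limits",
               "human approval for irreversible actions", "kill switch"]),
]

# The escalation property that makes this RSP-like: every tier's
# safeguards must include all of the previous tier's.
for lower, higher in zip(INTERNAL_POLICY, INTERNAL_POLICY[1:]):
    assert set(lower.safeguards) <= set(higher.safeguards)
```

Writing the escalation property as an executable check, rather than prose, is the design choice that matters: it makes the policy auditable in the same sense the ASL tiers are.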