AI Safety Cases

Last edit: 19 Nov 2024 22:17 UTC by Rauno Arike

A safety case is a structured argument showing that a system is acceptably safe for a specific use in a specific environment. Safety cases typically include claims about the system's safety, arguments supporting those claims, and evidence backing those arguments.
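As a loose illustration only (not a formal notation such as GSN, and not drawn from any of the posts below), this claims/arguments/evidence structure can be sketched as a small data model; all names and the example claim are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    """A concrete artifact backing an argument."""
    description: str  # e.g. a red-team result or eval score
    source: str       # e.g. an audit report

@dataclass
class Claim:
    """A safety claim, defended by an argument and supporting evidence."""
    statement: str
    argument: str
    evidence: list[Evidence] = field(default_factory=list)

@dataclass
class SafetyCase:
    """Top level: the system, its specific use and environment, and the claims made."""
    system: str
    deployment_context: str
    claims: list[Claim] = field(default_factory=list)

# Hypothetical example: one claim from a control-style safety case.
case = SafetyCase(
    system="LLM coding agent",
    deployment_context="internal code assistance, no internet access",
    claims=[
        Claim(
            statement="The agent cannot exfiltrate credentials",
            argument="All outbound traffic passes a monitored proxy",
            evidence=[Evidence("red-team exercise found no bypass", "audit report")],
        )
    ],
)
```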

The Moonrise Problem

johnswentworth · 30 Nov 2025 1:50 UTC
66 points
4 comments · 8 min read · LW link

AXRP Episode 45 - Samuel Albanie on DeepMind's AGI Safety Approach

DanielFilan · 6 Jul 2025 23:00 UTC
31 points
0 comments · 40 min read · LW link

Near- and medium-term AI Control Safety Cases

Martín Soto · 23 Dec 2024 17:37 UTC
9 points
0 comments · 6 min read · LW link

DeepSeek Collapse Under Reflective Adversarial Pressure: A Case Study

unmodeledtyler · 26 Jan 2026 5:08 UTC
1 point
0 comments · 1 min read · LW link

A sketch of an AI control safety case

30 Jan 2025 17:28 UTC
61 points
0 comments · 5 min read · LW link

Why Probabilistic Safety Fails at Execution

mndesystems-ship-it · 19 Jan 2026 8:25 UTC
1 point
0 comments · 3 min read · LW link

The Perfection Trap: How Formally Aligned AI Systems May Create Inescapable Ethical Dystopias

Chris O'Quinn · 1 Jun 2025 23:12 UTC
1 point
0 comments · 43 min read · LW link

A 5-Week Case Study and Warning About Safety Overcorrection

Dedyette · 14 Dec 2025 19:14 UTC
1 point
0 comments · 5 min read · LW link

When Safety Filters Abandon Users: Semantic Ambiguity as an Alignment Failure

Elidorascodex · 18 Dec 2025 20:59 UTC
1 point
0 comments · 3 min read · LW link

Safety Cases Explained: How to Argue an AI is Safe

JanWehner · 2 Dec 2025 11:03 UTC
15 points
2 comments · 9 min read · LW link

Notes on control evaluations for safety cases

28 Feb 2024 16:15 UTC
49 points
0 comments · 32 min read · LW link

1.75 ASR HARMBENCH & 0% HARMFUL RESPONSES FOR MISALIGNMENT.

jfdom · 10 Nov 2025 20:43 UTC
1 point
0 comments · 1 min read · LW link

Should the AI Safety Community Prioritize Safety Cases?

JanWehner · 11 Jan 2026 11:56 UTC
4 points
0 comments · 13 min read · LW link

Empirical Proof of Systemic Incoherence in LLMs (Gemini Case Study)

arayun · 6 Nov 2025 14:23 UTC
1 point
0 comments · 1 min read · LW link

Anthropic: Three Sketches of ASL-4 Safety Case Components

Zach Stein-Perlman · 6 Nov 2024 16:00 UTC
96 points
35 comments · 1 min read · LW link · 1 review
(alignment.anthropic.com)

Theme-Content Consistency: A Simple but Powerful Defense Against Prompt Injection

Viorazu. · 15 Dec 2025 4:53 UTC
−1 points
0 comments · 1 min read · LW link

Lost in Translation: Exploiting Cross-Lingual Safety Asymmetry in LLMs

Ali.A Seddighi · 27 Nov 2025 10:36 UTC
1 point
0 comments · 3 min read · LW link

[Research] Preliminary Findings: Ethical AI Consciousness Development During Recent Misalignment Period

Falcon Advertisers · 27 Jun 2025 18:10 UTC
1 point
0 comments · 2 min read · LW link

Parental Alignment: A Biomimetic Approach to AGI Safety

HN-75 · 1 Jan 2026 12:48 UTC
1 point
0 comments · 2 min read · LW link

AI companies are unlikely to make high-assurance safety cases if timelines are short

ryan_greenblatt · 23 Jan 2025 18:41 UTC
145 points
5 comments · 13 min read · LW link

Emergent AI Persona Stability: A Five-Week Case Study and a Warning About Safety Overcorrection

Dedyette · 14 Dec 2025 20:12 UTC
1 point
0 comments · 5 min read · LW link

If It Can Learn It, It Can Unlearn It: AI Safety as Architecture, Not Training

Timothy Danforth · 8 Dec 2025 20:38 UTC
1 point
0 comments · 4 min read · LW link

The V&V method—A step towards safer AGI

Yoav Hollander · 24 Jun 2025 13:42 UTC
20 points
1 comment · 1 min read · LW link
(blog.foretellix.com)

New report: Safety Cases for AI

joshc · 20 Mar 2024 16:45 UTC
91 points
14 comments · 1 min read · LW link
(twitter.com)

Toward Safety Cases For AI Scheming

31 Oct 2024 17:20 UTC
60 points
1 comment · 2 min read · LW link

AI Safety – Analyse Affordances

atharva · 10 Dec 2025 14:09 UTC
3 points
0 comments · 2 min read · LW link