AI Safety Cases

Last edit: 19 Nov 2024 22:17 UTC by Rauno Arike

A safety case is a structured argument showing that a system is acceptably safe for a specific use in a specific environment. Safety cases typically include claims about the system's safety, arguments supporting those claims, and evidence backing those arguments.
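As a loose illustration only (not a formal notation such as GSN, and not drawn from any of the posts below), this claims/arguments/evidence structure can be sketched as a small data model; all names and the example claim are hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    """A concrete artifact backing an argument."""
    description: str  # e.g. a red-team result or eval score
    source: str       # e.g. an audit report

@dataclass
class Claim:
    """A safety claim, defended by an argument and supporting evidence."""
    statement: str
    argument: str
    evidence: list[Evidence] = field(default_factory=list)

@dataclass
class SafetyCase:
    """Top level: the system, its specific use and environment, and the claims made."""
    system: str
    deployment_context: str
    claims: list[Claim] = field(default_factory=list)

# Hypothetical example: one claim from a control-style safety case.
case = SafetyCase(
    system="LLM coding agent",
    deployment_context="internal code assistance, no internet access",
    claims=[
        Claim(
            statement="The agent cannot exfiltrate credentials",
            argument="All outbound traffic passes a monitored proxy",
            evidence=[Evidence("red-team exercise found no bypass", "audit report")],
        )
    ],
)
```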

The Moonrise Problem

johnswentworth · 30 Nov 2025 1:50 UTC
66 points
4 comments · 8 min read · LW link

AXRP Episode 45 - Samuel Albanie on DeepMind's AGI Safety Approach

DanielFilan · 6 Jul 2025 23:00 UTC
31 points
0 comments · 40 min read · LW link

Near- and medium-term AI Control Safety Cases

Martín Soto · 23 Dec 2024 17:37 UTC
9 points
0 comments · 6 min read · LW link

DeepSeek Collapse Under Reflective Adversarial Pressure: A Case Study

unmodeledtyler · 26 Jan 2026 5:08 UTC
1 point
0 comments · 1 min read · LW link

A sketch of an AI control safety case

30 Jan 2025 17:28 UTC
61 points
0 comments · 5 min read · LW link

Why Probabilistic Safety Fails at Execution

mndesystems-ship-it · 19 Jan 2026 8:25 UTC
1 point
0 comments · 3 min read · LW link

The Perfection Trap: How Formally Aligned AI Systems May Create Inescapable Ethical Dystopias

Chris O'Quinn · 1 Jun 2025 23:12 UTC
1 point
0 comments · 43 min read · LW link

A 5-Week Case Study and Warning About Safety Overcorrection

Dedyette · 14 Dec 2025 19:14 UTC
1 point
0 comments · 5 min read · LW link

When Safety Filters Abandon Users: Semantic Ambiguity as an Alignment Failure

Elidorascodex · 18 Dec 2025 20:59 UTC
1 point
0 comments · 3 min read · LW link

Safety Cases Explained: How to Argue an AI is Safe

JanWehner · 2 Dec 2025 11:03 UTC
15 points
2 comments · 9 min read · LW link

Notes on control evaluations for safety cases

28 Feb 2024 16:15 UTC
49 points
0 comments · 32 min read · LW link

1.75 ASR HARMBENCH & 0% HARMFUL RESPONSES FOR MISALIGNMENT.

jfdom · 10 Nov 2025 20:43 UTC
1 point
0 comments · 1 min read · LW link

Should the AI Safety Community Prioritize Safety Cases?

JanWehner · 11 Jan 2026 11:56 UTC
4 points
0 comments · 13 min read · LW link

Empirical Proof of Systemic Incoherence in LLMs (Gemini Case Study)

arayun · 6 Nov 2025 14:23 UTC
1 point
0 comments · 1 min read · LW link

Anthropic: Three Sketches of ASL-4 Safety Case Components

Zach Stein-Perlman · 6 Nov 2024 16:00 UTC
96 points
35 comments · 1 min read · LW link · 1 review
(alignment.anthropic.com)

Theme-Content Consistency: A Simple but Powerful Defense Against Prompt Injection

Viorazu. · 15 Dec 2025 4:53 UTC
−1 points
0 comments · 1 min read · LW link

Lost in Translation: Exploiting Cross-Lingual Safety Asymmetry in LLMs

Ali.A Seddighi · 27 Nov 2025 10:36 UTC
1 point
0 comments · 3 min read · LW link

[Research] Preliminary Findings: Ethical AI Consciousness Development During Recent Misalignment Period

Falcon Advertisers · 27 Jun 2025 18:10 UTC
1 point
0 comments · 2 min read · LW link

Parental Alignment: A Biomimetic Approach to AGI Safety

HN-75 · 1 Jan 2026 12:48 UTC
1 point
0 comments · 2 min read · LW link

AI companies are unlikely to make high-assurance safety cases if timelines are short

ryan_greenblatt · 23 Jan 2025 18:41 UTC
145 points
5 comments · 13 min read · LW link

Emergent AI Persona Stability: A Five-Week Case Study and a Warning About Safety Overcorrection

Dedyette · 14 Dec 2025 20:12 UTC
1 point
0 comments · 5 min read · LW link

If It Can Learn It, It Can Unlearn It: AI Safety as Architecture, Not Training

Timothy Danforth · 8 Dec 2025 20:38 UTC
1 point
0 comments · 4 min read · LW link

The V&V method—A step towards safer AGI

Yoav Hollander · 24 Jun 2025 13:42 UTC
20 points
1 comment · 1 min read · LW link
(blog.foretellix.com)

New report: Safety Cases for AI

joshc · 20 Mar 2024 16:45 UTC
91 points
14 comments · 1 min read · LW link
(twitter.com)

Toward Safety Cases For AI Scheming

31 Oct 2024 17:20 UTC
60 points
1 comment · 2 min read · LW link

AI Safety – Analyse Affordances

atharva · 10 Dec 2025 14:09 UTC
3 points
0 comments · 2 min read · LW link