AI Safety Cases

Last edit: 19 Nov 2024 22:17 UTC by Rauno Arike

A safety case is a structured argument showing that a system is acceptably safe for a specific use in a specific environment. Safety cases typically include a top-level claim that the system is safe, the arguments supporting that claim, and the evidence backing those arguments.
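
As a rough illustration of that claims/arguments/evidence structure, a safety case can be modeled as a tree of claims, each supported by an argument, sub-claims, and evidence. The sketch below is a minimal hypothetical example; all names and contents are invented for illustration rather than taken from the posts listed on this page.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    """A concrete piece of support, e.g. an evaluation result or audit report."""
    description: str
    source: str

@dataclass
class Claim:
    """A node in the argument tree: a statement, the argument for it,
    and the sub-claims and evidence it rests on."""
    statement: str
    argument: str
    subclaims: list["Claim"] = field(default_factory=list)
    evidence: list[Evidence] = field(default_factory=list)

# A hypothetical control-style safety case, invented for illustration.
case = Claim(
    statement="The system is acceptably safe for use X in environment Y",
    argument="Even if the model were misaligned, it could not cause harm",
    subclaims=[
        Claim(
            statement="The model cannot subvert our oversight measures",
            argument="Red-team attack policies failed to subvert oversight "
                     "during control evaluations",
            evidence=[Evidence("Control evaluation results",
                               "internal red-team run")],
        )
    ],
)
```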

AXRP Episode 45 - Samuel Albanie on DeepMind’s AGI Safety Approach

DanielFilan · 6 Jul 2025 23:00 UTC
31 points
0 comments · 40 min read · LW link

Near- and medium-term AI Control Safety Cases

Martín Soto · 23 Dec 2024 17:37 UTC
9 points
0 comments · 6 min read · LW link

DeepSeek Collapse Under Reflective Adversarial Pressure: A Case Study

unmodeled.tyler · 8 Jul 2025 9:14 UTC
1 point
0 comments · 1 min read · LW link

A sketch of an AI control safety case

30 Jan 2025 17:28 UTC
57 points
0 comments · 5 min read · LW link

The Perfection Trap: How Formally Aligned AI Systems May Create Inescapable Ethical Dystopias

Chris O'Quinn · 1 Jun 2025 23:12 UTC
1 point
0 comments · 43 min read · LW link

Notes on control evaluations for safety cases

28 Feb 2024 16:15 UTC
49 points
0 comments · 32 min read · LW link

Anthropic: Three Sketches of ASL-4 Safety Case Components

Zach Stein-Perlman · 6 Nov 2024 16:00 UTC
95 points
33 comments · 1 min read · LW link
(alignment.anthropic.com)

[Research] Preliminary Findings: Ethical AI Consciousness Development During Recent Misalignment Period

Falcon Advertisers · 27 Jun 2025 18:10 UTC
1 point
0 comments · 2 min read · LW link

AI companies are unlikely to make high-assurance safety cases if timelines are short

ryan_greenblatt · 23 Jan 2025 18:41 UTC
145 points
5 comments · 13 min read · LW link

The V&V method—A step towards safer AGI

Yoav Hollander · 24 Jun 2025 13:42 UTC
20 points
1 comment · 1 min read · LW link
(blog.foretellix.com)

New report: Safety Cases for AI

joshc · 20 Mar 2024 16:45 UTC
91 points
14 comments · 1 min read · LW link
(twitter.com)

Toward Safety Cases For AI Scheming

31 Oct 2024 17:20 UTC
60 points
1 comment · 2 min read · LW link