
Guaranteed Safe AI

Tag · Last edit: 9 Aug 2024 23:22 UTC by Ben Goldhaber

November-December 2024 Progress in Guaranteed Safe AI

Quinn · 22 Jan 2025 1:20 UTC
17 points
0 comments · 4 min read
(gsai.substack.com)

AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability

DanielFilan · 28 Mar 2025 18:40 UTC
26 points
0 comments · 89 min read

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Gunnar_Zarncke · 16 May 2024 13:09 UTC
51 points
20 comments · 1 min read
(arxiv.org)

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Joar Skalse · 17 May 2024 19:13 UTC
67 points
10 comments · 2 min read

In response to critiques of Guaranteed Safe AI

Nora_Ammann · 31 Jan 2025 1:43 UTC
44 points
14 comments · 26 min read

Agent foundations: not really math, not really science

Alex_Altair · 17 Aug 2025 5:48 UTC
114 points
25 comments · 5 min read

Davidad's Provably Safe AI Architecture - ARIA's Programme Thesis

simeon_c · 1 Feb 2024 21:30 UTC
69 points
17 comments · 1 min read
(www.aria.org.uk)

Provably Safe AI

PeterMcCluskey · 5 Oct 2023 22:18 UTC
35 points
15 comments · 4 min read
(bayesianinvestor.com)

Can a Bayesian Oracle Prevent Harm from an Agent? (Bengio et al. 2024)

mattmacdermott · 1 Sep 2024 7:46 UTC
28 points
0 comments · 5 min read
(yoshuabengio.org)

Limitations on Formal Verification for AI Safety

Andrew Dickson · 19 Aug 2024 23:03 UTC
135 points
60 comments · 23 min read

Topological Debate Framework

lunatic_at_large · 16 Jan 2025 17:19 UTC
10 points
5 comments · 9 min read

Provably Safe AI: Worldview and Projects

9 Aug 2024 23:21 UTC
54 points
44 comments · 7 min read