
AI Robustness

Last edit: 24 Oct 2022 22:37 UTC by markov

AI Robustness is an AI agent's ability to maintain its goal and its capabilities when exposed to data distributions or environments that differ from those it was trained on.
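
One rough way to operationalize this: compare a model's performance on data drawn from its training distribution against its performance under increasing distribution shift. The sketch below is a toy illustration, not drawn from any of the posts listed here; the Gaussian data and the `shift` parameter are invented for the example, and it assumes NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# In-distribution training data: two Gaussian clusters (class 0 and class 1).
X_train = np.vstack([rng.normal(0.0, 1.0, (500, 2)),
                     rng.normal(3.0, 1.0, (500, 2))])
y_train = np.array([0] * 500 + [1] * 500)

clf = LogisticRegression().fit(X_train, y_train)

def shifted_accuracy(shift: float) -> float:
    # Test data with the same labels but cluster means translated by `shift`,
    # simulating a covariate shift away from the training distribution.
    X_test = np.vstack([rng.normal(0.0 + shift, 1.0, (500, 2)),
                        rng.normal(3.0 + shift, 1.0, (500, 2))])
    y_test = np.array([0] * 500 + [1] * 500)
    return clf.score(X_test, y_test)

# Accuracy at shift=0.0 approximates in-distribution performance;
# the gap as shift grows is one crude measure of (non-)robustness.
for shift in [0.0, 1.0, 2.0]:
    print(f"shift={shift}: accuracy={shifted_accuracy(shift):.3f}")
```

A robust agent, in the sense of the definition above, is one for which this kind of performance gap stays small; several of the posts below examine whether that property emerges with scale or must be engineered adversarially.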

Robustness to Scale

Scott Garrabrant · 21 Feb 2018 22:55 UTC
133 points
23 comments · 2 min read · LW link · 1 review

AI Safety in a World of Vulnerable Machine Learning Systems

8 Mar 2023 2:40 UTC
70 points
29 comments · 29 min read · LW link
(far.ai)

Robustness to Scaling Down: More Important Than I Thought

adamShimi · 23 Jul 2022 11:40 UTC
38 points
5 comments · 3 min read · LW link

Squeezing foundations research assistance out of formal logic narrow AI.

Donald Hobson · 8 Mar 2023 9:38 UTC
16 points
1 comment · 2 min read · LW link

[Question] Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why?

DragonGod · 9 Feb 2023 13:36 UTC
22 points
42 comments · 2 min read · LW link

Random Observation on AI goals

FTPickle · 8 Apr 2023 19:28 UTC
−11 points
2 comments · 1 min read · LW link

AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan · 21 Aug 2022 23:50 UTC
16 points
0 comments · 35 min read · LW link

Desiderata for an AI

Nathan Helm-Burger · 19 Jul 2023 16:18 UTC
9 points
0 comments · 4 min read · LW link

Investing in Robust Safety Mechanisms is critical for reducing Systemic Risks

11 Dec 2024 13:37 UTC
8 points
3 comments · 2 min read · LW link

Workshop Report: Why current benchmark approaches are not sufficient for safety?

26 Nov 2024 17:20 UTC
3 points
1 comment · 3 min read · LW link

First Certified Public Solve of Observer's False Path Instability — Level 4 (Advanced Variant) — Walter Tarantelli — 2025-05-30 UTC

Walter Tarantelli · 31 May 2025 1:41 UTC
1 point
0 comments · 2 min read · LW link

Does robustness improve with scale?

25 Jul 2024 20:55 UTC
14 points
0 comments · 1 min read · LW link
(far.ai)

Why Recursive Self-Improvement Might Not Be the Existential Risk We Fear

Nassim_A · 24 Nov 2024 17:17 UTC
1 point
0 comments · 9 min read · LW link

Is there an ML agent that abandons its utility function out-of-distribution without losing capabilities?

Christopher King · 22 Feb 2023 16:49 UTC
1 point
7 comments · 1 min read · LW link

Arguments for Robustness in AI Alignment

Fabian Schimpf · 19 Jan 2024 10:24 UTC
2 points
1 comment · 1 min read · LW link

Why Eliminating Deception Won't Align AI

Priyanka Bharadwaj · 15 Jul 2025 9:21 UTC
19 points
6 comments · 4 min read · LW link

2023 Alignment Research Updates from FAR AI

4 Dec 2023 22:32 UTC
18 points
0 comments · 8 min read · LW link
(far.ai)

What’s new at FAR AI

4 Dec 2023 21:18 UTC
41 points
0 comments · 5 min read · LW link
(far.ai)

On Interpretability's Robustness

WCargo · 18 Oct 2023 13:18 UTC
11 points
0 comments · 4 min read · LW link

Beyond the Board: Exploring AI Robustness Through Go

AdamGleave · 19 Jun 2024 16:40 UTC
41 points
2 comments · 1 min read · LW link
(far.ai)

Gradient Anatomy's — Hallucination Robustness in Medical Q&A

DieSab · 12 Feb 2025 19:16 UTC
2 points
0 comments · 10 min read · LW link

Robustness & Evolution [MLAISU W02]

Esben Kran · 13 Jan 2023 15:47 UTC
10 points
0 comments · 3 min read · LW link
(newsletter.apartresearch.com)