
AI Robustness

Last edit: 24 Oct 2022 22:37 UTC by markov

AI Robustness is an AI agent's ability to maintain its goal and its capabilities when exposed to data distributions or environments that differ from those it was trained on.
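
One rough way to operationalize this: compare a model's performance on data drawn from its training distribution against its performance under increasing distribution shift. The sketch below is a toy illustration, not drawn from any of the posts listed here; the Gaussian data and the `shift` parameter are invented for the example, and it assumes NumPy and scikit-learn are available.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# In-distribution training data: two Gaussian clusters (class 0 and class 1).
X_train = np.vstack([rng.normal(0.0, 1.0, (500, 2)),
                     rng.normal(3.0, 1.0, (500, 2))])
y_train = np.array([0] * 500 + [1] * 500)

clf = LogisticRegression().fit(X_train, y_train)

def shifted_accuracy(shift: float) -> float:
    # Test data with the same labels but cluster means translated by `shift`,
    # simulating a covariate shift away from the training distribution.
    X_test = np.vstack([rng.normal(0.0 + shift, 1.0, (500, 2)),
                        rng.normal(3.0 + shift, 1.0, (500, 2))])
    y_test = np.array([0] * 500 + [1] * 500)
    return clf.score(X_test, y_test)

# Accuracy at shift=0.0 approximates in-distribution performance;
# the gap as shift grows is one crude measure of (non-)robustness.
for shift in [0.0, 1.0, 2.0]:
    print(f"shift={shift}: accuracy={shifted_accuracy(shift):.3f}")
```

A robust agent, in the sense of the definition above, is one for which this kind of performance gap stays small; several of the posts below examine whether that property emerges with scale or must be engineered adversarially.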

Robustness to Scale

Scott Garrabrant · 21 Feb 2018 22:55 UTC
133 points
23 comments · 2 min read · LW link · 1 review

AI Safety in a World of Vulnerable Machine Learning Systems

8 Mar 2023 2:40 UTC
70 points
29 comments · 29 min read · LW link
(far.ai)

Robustness to Scaling Down: More Important Than I Thought

adamShimi · 23 Jul 2022 11:40 UTC
38 points
5 comments · 3 min read · LW link

Squeezing foundations research assistance out of formal logic narrow AI.

Donald Hobson · 8 Mar 2023 9:38 UTC
16 points
1 comment · 2 min read · LW link

[Question] Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why?

DragonGod · 9 Feb 2023 13:36 UTC
22 points
42 comments · 2 min read · LW link

Random Observation on AI goals

FTPickle · 8 Apr 2023 19:28 UTC
−11 points
2 comments · 1 min read · LW link

AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan · 21 Aug 2022 23:50 UTC
16 points
0 comments · 35 min read · LW link

Desiderata for an AI

Nathan Helm-Burger · 19 Jul 2023 16:18 UTC
9 points
0 comments · 4 min read · LW link

Investing in Robust Safety Mechanisms is critical for reducing Systemic Risks

11 Dec 2024 13:37 UTC
8 points
3 comments · 2 min read · LW link

Workshop Report: Why current benchmark approaches are not sufficient for safety?

26 Nov 2024 17:20 UTC
3 points
1 comment · 3 min read · LW link

First Certified Public Solve of Observer's False Path Instability — Level 4 (Advanced Variant) — Walter Tarantelli — 2025-05-30 UTC

Walter Tarantelli · 31 May 2025 1:41 UTC
1 point
0 comments · 2 min read · LW link

Does robustness improve with scale?

25 Jul 2024 20:55 UTC
14 points
0 comments · 1 min read · LW link
(far.ai)

Why Recursive Self-Improvement Might Not Be the Existential Risk We Fear

Nassim_A · 24 Nov 2024 17:17 UTC
1 point
0 comments · 9 min read · LW link

Is there an ML agent that abandons its utility function out-of-distribution without losing capabilities?

Christopher King · 22 Feb 2023 16:49 UTC
1 point
7 comments · 1 min read · LW link

Arguments for Robustness in AI Alignment

Fabian Schimpf · 19 Jan 2024 10:24 UTC
2 points
1 comment · 1 min read · LW link

Why Eliminating Deception Won't Align AI

Priyanka Bharadwaj · 15 Jul 2025 9:21 UTC
19 points
6 comments · 4 min read · LW link

2023 Alignment Research Updates from FAR AI

4 Dec 2023 22:32 UTC
18 points
0 comments · 8 min read · LW link
(far.ai)

What’s new at FAR AI

4 Dec 2023 21:18 UTC
41 points
0 comments · 5 min read · LW link
(far.ai)

On Interpretability's Robustness

WCargo · 18 Oct 2023 13:18 UTC
11 points
0 comments · 4 min read · LW link

Beyond the Board: Exploring AI Robustness Through Go

AdamGleave · 19 Jun 2024 16:40 UTC
41 points
2 comments · 1 min read · LW link
(far.ai)

Gradient Anatomy's — Hallucination Robustness in Medical Q&A

DieSab · 12 Feb 2025 19:16 UTC
2 points
0 comments · 10 min read · LW link

Robustness & Evolution [MLAISU W02]

Esben Kran · 13 Jan 2023 15:47 UTC
10 points
0 comments · 3 min read · LW link
(newsletter.apartresearch.com)