
AI Robustness

Last edit: 24 Oct 2022 22:37 UTC by markov

AI Robustness is an agent's ability to maintain its goals and capabilities when exposed to different data distributions or environments.
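As a minimal illustration of the capability side of this definition (a sketch assumed for this page, not drawn from any of the posts below), the snippet trains a classifier on one data distribution and measures how much its accuracy degrades on a shifted distribution; the synthetic data and the amount of shift are assumptions for the example.

```python
# Minimal sketch (assumed example): measure how a model's capability degrades
# when the test distribution shifts away from the training distribution.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    """Two Gaussian classes; `shift` moves the whole distribution off-centre."""
    X0 = rng.normal(loc=-1.0 + shift, scale=1.0, size=(n, 2))
    X1 = rng.normal(loc=+1.0 + shift, scale=1.0, size=(n, 2))
    X = np.vstack([X0, X1])
    y = np.array([0] * n + [1] * n)
    return X, y

# Train on the original ("in-distribution") data.
X_train, y_train = make_data(1000, shift=0.0)
model = LogisticRegression().fit(X_train, y_train)

# Evaluate in-distribution vs. out-of-distribution.
X_id, y_id = make_data(1000, shift=0.0)
X_ood, y_ood = make_data(1000, shift=2.0)  # shifted environment

print("in-distribution accuracy:    ", accuracy_score(y_id, model.predict(X_id)))
print("out-of-distribution accuracy:", accuracy_score(y_ood, model.predict(X_ood)))
```

The gap between the two accuracies is one crude proxy for (lack of) robustness; the posts tagged below discuss why goal robustness, not just capability robustness, matters for alignment.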

Arguments for Robustness in AI Alignment

Fabian Schimpf · 19 Jan 2024 10:24 UTC
2 points
1 comment · 1 min read · LW link

2023 Alignment Research Updates from FAR AI

4 Dec 2023 22:32 UTC
18 points
0 comments · 8 min read · LW link
(far.ai)

What’s new at FAR AI

4 Dec 2023 21:18 UTC
40 points
0 comments · 5 min read · LW link
(far.ai)

On Interpretability's Robustness

WCargo · 18 Oct 2023 13:18 UTC
11 points
0 comments · 4 min read · LW link

Desiderata for an AI

Nathan Helm-Burger · 19 Jul 2023 16:18 UTC
8 points
0 comments · 4 min read · LW link

Random Observation on AI goals

FTPickle · 8 Apr 2023 19:28 UTC
−11 points
2 comments · 1 min read · LW link

Squeezing foundations research assistance out of formal logic narrow AI.

Donald Hobson · 8 Mar 2023 9:38 UTC
16 points
1 comment · 2 min read · LW link

AI Safety in a World of Vulnerable Machine Learning Systems

8 Mar 2023 2:40 UTC
70 points
27 comments · 29 min read · LW link
(far.ai)

Is there a ML agent that abandons its utility function out-of-distribution without losing capabilities?

Christopher King · 22 Feb 2023 16:49 UTC
1 point
7 comments · 1 min read · LW link

[Question] Do the Safety Properties of Powerful AI Systems Need to be Adversarially Robust? Why?

DragonGod · 9 Feb 2023 13:36 UTC
22 points
42 comments · 2 min read · LW link

Robustness & Evolution [MLAISU W02]

Esben Kran · 13 Jan 2023 15:47 UTC
10 points
0 comments · 3 min read · LW link
(newsletter.apartresearch.com)

AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan · 21 Aug 2022 23:50 UTC
16 points
0 comments · 35 min read · LW link

Robustness to Scaling Down: More Important Than I Thought

adamShimi · 23 Jul 2022 11:40 UTC
37 points
5 comments · 3 min read · LW link

Robustness to Scale

Scott Garrabrant · 21 Feb 2018 22:55 UTC
128 points
23 comments · 2 min read · LW link · 1 review