Robust Agents

TagLast edit: 14 Sep 2020 23:17 UTC by Ruby

Robust Agents are decision-makers who can perform well in a variety of situations. Whereas some humans rely on folk wisdom or instinct, and some AIs might be designed to achieve a narrow set of goals, a Robust Agent has a coherent set of values and decision-procedures. This enables them to adapt to new circumstances (such as succeeding in a new environment, or responding to a new strategy by a competitor).

Being a Robust Agent

Raemon18 Oct 2018 7:00 UTC

156 points

32 comments7 min readLW link 2 reviews

Security Mindset and Ordinary Paranoia

Eliezer Yudkowsky25 Nov 2017 17:53 UTC

133 points

25 comments29 min readLW link

Subagents, akrasia, and coherence in humans

Kaj_Sotala25 Mar 2019 14:24 UTC

142 points

31 comments16 min readLW link

On Being Robust

TurnTrout10 Jan 2020 3:51 UTC

46 points

7 comments2 min readLW link

Embedded Agency (full-text version)

Scott Garrabrant and abramdemski

15 Nov 2018 19:49 UTC

210 points

17 comments54 min readLW link

Desiderata for an AI

Nathan Helm-Burger19 Jul 2023 16:18 UTC

9 points

0 comments4 min readLW link

Gradations of Agency

Daniel Kokotajlo23 May 2022 1:10 UTC

41 points

6 comments5 min readLW link

Robust Agency for People and Organizations

Raemon19 Jul 2019 1:18 UTC

65 points

10 comments12 min readLW link

Upcoming stability of values

Stuart_Armstrong15 Mar 2018 11:36 UTC

15 points

15 comments2 min readLW link

Humans are very reliable agents

alyssavance16 Jun 2022 22:02 UTC

270 points

35 comments3 min readLW link

The Power of Agency

lukeprog7 May 2011 1:38 UTC

119 points

78 comments1 min readLW link

[Question] What if memes are common in highly capable minds?

Daniel Kokotajlo30 Jul 2020 20:45 UTC

40 points

13 comments2 min readLW link

Robust Delegation

abramdemski and Scott Garrabrant

4 Nov 2018 16:38 UTC

116 points

10 comments1 min readLW link

A multi-disciplinary view on AI safety research

Roman Leventov8 Feb 2023 16:50 UTC

46 points

4 comments26 min readLW link

[Aspiration-based designs] 2. Formal framework, basic algorithm

Jobst Heitzig, Simon Dima and Simon Fischer

28 Apr 2024 13:02 UTC

18 points

2 comments16 min readLW link

We need a universal definition of ‘agency’ and related words

CstineSublime11 Jan 2025 3:22 UTC

18 points

1 comment5 min readLW link

An angle of attack on Open Problem #1

Benya18 Aug 2012 12:08 UTC

54 points

85 comments7 min readLW link

Metaphilosophical competence can’t be disentangled from alignment

zhukeepa1 Apr 2018 0:38 UTC

47 points

39 comments3 min readLW link

Thoughts on the 5-10 Problem

Tofly18 Jul 2019 18:56 UTC

18 points

11 comments1 min readLW link

Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning

Roman Leventov12 Jan 2023 16:43 UTC

17 points

2 comments2 min readLW link

(arxiv.org)

Robustness to Scale

Scott Garrabrant21 Feb 2018 22:55 UTC

133 points

23 comments2 min readLW link 1 review

Sets of objectives for a multi-objective RL agent to optimize

Ben Smith and Roland Pihlakas

23 Nov 2022 6:49 UTC

13 points

0 comments8 min readLW link

Reflection in Probabilistic Logic

Eliezer Yudkowsky24 Mar 2013 16:37 UTC

112 points

168 comments3 min readLW link

Security Mindset and the Logistic Success Curve

Eliezer Yudkowsky26 Nov 2017 15:58 UTC

106 points

49 comments20 min readLW link

Can we achieve AGI Alignment by balancing multiple human objectives?

Ben Smith3 Jul 2022 2:51 UTC

11 points

1 comment4 min readLW link

Temporally Layered Architecture for Adaptive, Distributed and Continuous Control

Roman Leventov2 Feb 2023 6:29 UTC

6 points

4 comments1 min readLW link

(arxiv.org)

Automated monitoring systems

hiki_t28 Nov 2024 18:54 UTC

1 point

0 comments2 min readLW link

Even Superhuman Go AIs Have Surprising Failure Modes

AdamGleave, EuanMcLean, Tony Wang, Kellin Pelrine, Tom Tseng, Yawen Duan, Joseph Miller and MichaelDennis

20 Jul 2023 17:31 UTC

130 points

22 comments10 min readLW link

(far.ai)

Tiling Agents for Self-Modifying AI (OPFAI #2)

Eliezer Yudkowsky6 Jun 2013 20:24 UTC

88 points

259 comments3 min readLW link

Vingean Reflection: Reliable Reasoning for Self-Improving Agents

So8res15 Jan 2015 22:47 UTC

37 points

5 comments9 min readLW link

2-D Robustness

Vlad Mikulik30 Aug 2019 20:27 UTC

85 points

8 comments2 min readLW link

AISC project: SatisfIA – AI that satisfies without overdoing it

Jobst Heitzig11 Nov 2023 18:22 UTC

12 points

0 comments1 min readLW link

(docs.google.com)

Beyond the Board: Exploring AI Robustness Through Go

AdamGleave19 Jun 2024 16:40 UTC

41 points

2 comments1 min readLW link

(far.ai)

On agentic generalist models: we’re essentially using existing technology the weakest and worst way you can use it

Yuli_Ban28 Aug 2024 1:57 UTC

10 points

2 comments9 min readLW link

Ruby 14 Sep 2020 22:47 UTC
2 points
0
I actually think this is definition is basically just what “agency” is.
- Ruby 14 Sep 2020 23:17 UTC
  2 points
  0
  Parent
  some AIs might be programmed with a narrow set of goals.
  I don’t think this precludes you from being an agent or should preclude you from being a “robust agent”.

Ro­bust Agents

See also

Robust Agents