
Robust Agents

Last edit: 14 Sep 2020 23:17 UTC by Ruby

Robust Agents are decision-makers who can perform well in a variety of situations. Whereas some humans rely on folk wisdom or instinct, and some AIs might be designed to achieve only a narrow set of goals, a Robust Agent has a coherent set of values and decision procedures. This coherence lets the agent adapt to new circumstances, such as succeeding in an unfamiliar environment or responding to a competitor's new strategy.

See also

Being a Robust Agent

Raemon, 18 Oct 2018 7:00 UTC
150 points
32 comments · 7 min read · LW link · 2 reviews

Security Mindset and Ordinary Paranoia

Eliezer Yudkowsky, 25 Nov 2017 17:53 UTC
131 points
25 comments · 29 min read · LW link

Desiderata for an AI

Nathan Helm-Burger, 19 Jul 2023 16:18 UTC
9 points
0 comments · 4 min read · LW link

Embedded Agency (full-text version)

15 Nov 2018 19:49 UTC
196 points
17 comments · 54 min read · LW link

[Question] What if memes are common in highly capable minds?

Daniel Kokotajlo, 30 Jul 2020 20:45 UTC
38 points
13 comments · 2 min read · LW link

Gradations of Agency

Daniel Kokotajlo, 23 May 2022 1:10 UTC
41 points
6 comments · 5 min read · LW link

Humans are very reliable agents

alyssavance, 16 Jun 2022 22:02 UTC
266 points
35 comments · 3 min read · LW link

Robust Delegation

4 Nov 2018 16:38 UTC
116 points
10 comments · 1 min read · LW link

On Being Robust

TurnTrout, 10 Jan 2020 3:51 UTC
45 points
7 comments · 2 min read · LW link

The Power of Agency

lukeprog, 7 May 2011 1:38 UTC
112 points
78 comments · 1 min read · LW link

Subagents, akrasia, and coherence in humans

Kaj_Sotala, 25 Mar 2019 14:24 UTC
135 points
31 comments · 16 min read · LW link

Upcoming stability of values

Stuart_Armstrong, 15 Mar 2018 11:36 UTC
15 points
15 comments · 2 min read · LW link

Robust Agency for People and Organizations

Raemon, 19 Jul 2019 1:18 UTC
60 points
10 comments · 12 min read · LW link

Even Superhuman Go AIs Have Surprising Failure Modes

20 Jul 2023 17:31 UTC
129 points
22 comments · 10 min read · LW link
(far.ai)

Thoughts on the 5-10 Problem

Tofly, 18 Jul 2019 18:56 UTC
18 points
11 comments · 1 min read · LW link

Can we achieve AGI Alignment by balancing multiple human objectives?

Ben Smith, 3 Jul 2022 2:51 UTC
11 points
1 comment · 4 min read · LW link

Sets of objectives for a multi-objective RL agent to optimize

23 Nov 2022 6:49 UTC
11 points
0 comments · 8 min read · LW link

Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning

Roman Leventov, 12 Jan 2023 16:43 UTC
17 points
2 comments · 2 min read · LW link
(arxiv.org)

Temporally Layered Architecture for Adaptive, Distributed and Continuous Control

Roman Leventov, 2 Feb 2023 6:29 UTC
6 points
4 comments · 1 min read · LW link
(arxiv.org)

A multi-disciplinary view on AI safety research

Roman Leventov, 8 Feb 2023 16:50 UTC
43 points
4 comments · 26 min read · LW link

Robustness to Scale

Scott Garrabrant, 21 Feb 2018 22:55 UTC
129 points
23 comments · 2 min read · LW link · 1 review

On agentic generalist models: we’re essentially using existing technology the weakest and worst way you can use it

Yuli_Ban, 28 Aug 2024 1:57 UTC
10 points
2 comments · 9 min read · LW link

[Aspiration-based designs] 2. Formal framework, basic algorithm

28 Apr 2024 13:02 UTC
16 points
2 comments · 16 min read · LW link

Beyond the Board: Exploring AI Robustness Through Go

AdamGleave, 19 Jun 2024 16:40 UTC
41 points
2 comments · 1 min read · LW link
(far.ai)

AISC project: SatisfIA – AI that satisfies without overdoing it

Jobst Heitzig, 11 Nov 2023 18:22 UTC
12 points
0 comments · 1 min read · LW link
(docs.google.com)

Security Mindset and the Logistic Success Curve

Eliezer Yudkowsky, 26 Nov 2017 15:58 UTC
101 points
48 comments · 20 min read · LW link

Reflection in Probabilistic Logic

Eliezer Yudkowsky, 24 Mar 2013 16:37 UTC
112 points
168 comments · 3 min read · LW link

Tiling Agents for Self-Modifying AI (OPFAI #2)

Eliezer Yudkowsky, 6 Jun 2013 20:24 UTC
88 points
259 comments · 3 min read · LW link

2-D Robustness

Vlad Mikulik, 30 Aug 2019 20:27 UTC
85 points
8 comments · 2 min read · LW link

Metaphilosophical competence can’t be disentangled from alignment

zhukeepa, 1 Apr 2018 0:38 UTC
46 points
39 comments · 3 min read · LW link

An angle of attack on Open Problem #1

Benya, 18 Aug 2012 12:08 UTC
48 points
85 comments · 7 min read · LW link

Vingean Reflection: Reliable Reasoning for Self-Improving Agents

So8res, 15 Jan 2015 22:47 UTC
37 points
5 comments · 9 min read · LW link