
Impact Measures

Last edit: 21 Nov 2020 20:11 UTC by TurnTrout

Impact measures penalize an AI for affecting us too much. To reduce the risk posed by a powerful AI, you might want to make it try to accomplish its goals with as little impact on the world as possible. For example, suppose you reward an AI for crossing a room: to maximize time-discounted total reward, the optimal policy may make a huge mess as it sprints to the other side.

How do you rigorously define “low impact” in a way that a computer can understand – how do you measure impact? These questions are important for both prosaic and future AI systems: objective specification is hard; we don’t want AI systems to rampantly disrupt their environment. In the limit of goal-directed intelligence, theorems suggest that seeking power tends to be optimal; we don’t want highly capable AI systems to permanently wrench control of the future from us.

Currently, impact measurement research focuses on two approaches: penalizing decreases in the reachability of states (relative reachability) and penalizing changes in the agent's ability to achieve a range of auxiliary goals (attainable utility preservation).
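Many of the posts below develop Attainable Utility Preservation (AUP). As a rough illustration only (a toy sketch with made-up state names, action names, and Q-values, not the published algorithm), an AUP-style penalty scores an action by how much it shifts the agent's attainable value for a set of auxiliary reward functions, relative to doing nothing:

```python
def aup_penalty(q_values, state, action, noop_action):
    """Mean absolute change in attainable auxiliary value vs. a no-op baseline.

    q_values: dict mapping auxiliary-goal name -> {(state, action): Q-value}.
    A large penalty means the action sharply changes what the agent could
    later achieve, which is the intuition AUP uses for "impact".
    """
    diffs = [
        abs(q[(state, action)] - q[(state, noop_action)])
        for q in q_values.values()
    ]
    return sum(diffs) / len(diffs)


# Hypothetical example: smashing through the room zeroes out the agent's
# ability to keep a vase intact and hurts its ability to reach the door.
q_values = {
    "reach_door": {("s0", "smash"): 0.2, ("s0", "noop"): 0.8},
    "keep_vase":  {("s0", "smash"): 0.0, ("s0", "noop"): 1.0},
}
penalty = aup_penalty(q_values, "s0", "smash", "noop")  # roughly 0.8
```

This penalty would then be subtracted (with some scaling) from the task reward, so low-impact ways of crossing the room become optimal instead.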

For a review of earlier work, see A Survey of Early Impact Measures.

Sequences on impact measurement: Reframing Impact.

Related tags: Instrumental Convergence, Corrigibility, Mild Optimization.

Reframing Impact
TurnTrout · 20 Sep 2019 19:03 UTC · 84 points · 14 comments · 3 min read · 2 nominations · 1 review

Attainable Utility Preservation: Concepts
TurnTrout · 17 Feb 2020 5:20 UTC · 38 points · 18 comments · 1 min read

Tradeoff between desirable properties for baseline choices in impact measures
Vika · 4 Jul 2020 11:56 UTC · 37 points · 24 comments · 5 min read

Towards a New Impact Measure
TurnTrout · 18 Sep 2018 17:21 UTC · 97 points · 159 comments · 33 min read

Impact measurement and value-neutrality verification
evhub · 15 Oct 2019 0:06 UTC · 31 points · 13 comments · 6 min read

[Question] Best reasons for pessimism about impact of impact measures?
TurnTrout · 10 Apr 2019 17:22 UTC · 60 points · 55 comments · 3 min read

Designing agent incentives to avoid side effects
11 Mar 2019 20:55 UTC · 29 points · 0 comments · 2 min read · (medium.com)

Worrying about the Vase: Whitelisting
TurnTrout · 16 Jun 2018 2:17 UTC · 73 points · 26 comments · 11 min read

Value Impact
TurnTrout · 23 Sep 2019 0:47 UTC · 60 points · 8 comments · 1 min read

Deducing Impact
TurnTrout · 24 Sep 2019 21:14 UTC · 64 points · 25 comments · 1 min read

Attainable Utility Theory: Why Things Matter
TurnTrout · 27 Sep 2019 16:48 UTC · 60 points · 24 comments · 1 min read

World State is the Wrong Abstraction for Impact
TurnTrout · 1 Oct 2019 21:03 UTC · 61 points · 19 comments · 2 min read

The Gears of Impact
TurnTrout · 7 Oct 2019 14:44 UTC · 50 points · 14 comments · 1 min read

Attainable Utility Landscape: How The World Is Changed
TurnTrout · 10 Feb 2020 0:58 UTC · 51 points · 7 comments · 6 min read

The Catastrophic Convergence Conjecture
TurnTrout · 14 Feb 2020 21:16 UTC · 39 points · 15 comments · 8 min read

Attainable Utility Preservation: Empirical Results
22 Feb 2020 0:38 UTC · 48 points · 7 comments · 9 min read

How Low Should Fruit Hang Before We Pick It?
TurnTrout · 25 Feb 2020 2:08 UTC · 26 points · 9 comments · 12 min read

Attainable Utility Preservation: Scaling to Superhuman
TurnTrout · 27 Feb 2020 0:52 UTC · 26 points · 20 comments · 8 min read

Reasons for Excitement about Impact of Impact Measure Research
TurnTrout · 27 Feb 2020 21:42 UTC · 31 points · 8 comments · 4 min read

Conclusion to ‘Reframing Impact’
TurnTrout · 28 Feb 2020 16:05 UTC · 38 points · 17 comments · 2 min read

Learning preferences by looking at the world
rohinmshah · 12 Feb 2019 22:25 UTC · 43 points · 10 comments · 7 min read · (bair.berkeley.edu)

Dynamic inconsistency of the inaction and initial state baseline
Stuart_Armstrong · 7 Jul 2020 12:02 UTC · 30 points · 8 comments · 2 min read

Subagents and impact measures, full and fully illustrated
Stuart_Armstrong · 24 Feb 2020 13:12 UTC · 31 points · 14 comments · 17 min read

Overcoming Clinginess in Impact Measures
TurnTrout · 30 Jun 2018 22:51 UTC · 30 points · 9 comments · 7 min read

Appendix: how a subagent could get powerful
Stuart_Armstrong · 28 Jan 2020 15:28 UTC · 51 points · 17 comments · 4 min read

Appendix: mathematics of indexical impact measures
Stuart_Armstrong · 17 Feb 2020 13:22 UTC · 12 points · 0 comments · 4 min read

Test Cases for Impact Regularisation Methods
DanielFilan · 6 Feb 2019 21:50 UTC · 58 points · 5 comments · 12 min read · (danielfilan.com)

Understanding Recent Impact Measures
Matthew Barnett · 7 Aug 2019 4:57 UTC · 16 points · 6 comments · 7 min read

A Survey of Early Impact Measures
Matthew Barnett · 6 Aug 2019 1:22 UTC · 23 points · 0 comments · 8 min read

Four Ways An Impact Measure Could Help Alignment
Matthew Barnett · 8 Aug 2019 0:10 UTC · 21 points · 1 comment · 8 min read

Impact Measure Desiderata
TurnTrout · 2 Sep 2018 22:21 UTC · 36 points · 41 comments · 5 min read

Why is the impact penalty time-inconsistent?
Stuart_Armstrong · 9 Jul 2020 17:26 UTC · 16 points · 1 comment · 2 min read

AI Alignment 2018-19 Review
rohinmshah · 28 Jan 2020 2:19 UTC · 115 points · 6 comments · 35 min read

Penalizing Impact via Attainable Utility Preservation
TurnTrout · 28 Dec 2018 21:46 UTC · 24 points · 0 comments · 3 min read · (arxiv.org)

Reversible changes: consider a bucket of water
Stuart_Armstrong · 26 Aug 2019 22:55 UTC · 25 points · 18 comments · 2 min read

Avoiding Side Effects in Complex Environments
12 Dec 2020 0:34 UTC · 61 points · 9 comments · 2 min read · (avoiding-side-effects.github.io)

A Critique of Non-Obstruction
Joe_Collman · 3 Feb 2021 8:45 UTC · 13 points · 10 comments · 4 min read

[Question] “Do Nothing” utility function, 3½ years later?
niplav · 20 Jul 2020 11:09 UTC · 5 points · 3 comments · 1 min read

Yudkowsky on AGI ethics
Rob Bensinger · 19 Oct 2017 23:13 UTC · 49 points · 6 comments · 2 min read

Announcement: AI alignment prize round 4 winners
cousin_it · 20 Jan 2019 14:46 UTC · 74 points · 41 comments · 1 min read

Asymptotically Unambitious AGI
michaelcohen · 6 Mar 2019 1:15 UTC · 39 points · 216 comments · 2 min read

[AN #68]: The attainable utility theory of impact
rohinmshah · 14 Oct 2019 17:00 UTC · 17 points · 0 comments · 8 min read · (mailchi.mp)

Simplified preferences needed; simplified preferences sufficient
Stuart_Armstrong · 5 Mar 2019 19:39 UTC · 29 points · 6 comments · 3 min read

A potential problem with reduced impact
Chantiel · 14 Jan 2021 0:59 UTC · 1 point · 0 comments · 2 min read