
Eliciting Latent Knowledge (ELK)

Last edit: 31 Mar 2022 3:06 UTC by Multicore

Eliciting Latent Knowledge is an open problem in AI safety.

Suppose we train a model to predict what the future will look like according to cameras and other sensors. We then use planning algorithms to find a sequence of actions that lead to predicted futures that look good to us.

But some action sequences could tamper with the cameras so they show happy humans regardless of what’s really happening. More generally, some futures look great on camera but are actually catastrophically bad.

In these cases, the prediction model “knows” facts (like “the camera was tampered with”) that are not visible on camera but would change our evaluation of the predicted future if we learned them. How can we train this model to report its latent knowledge of off-screen events?

– ARC report
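To make the setup concrete, here is a minimal toy sketch in Python of the predictor-plus-planner loop the report describes, and the failure mode it invites. All names here (predictor, human_evaluation, Outcome, the action strings, the diamond-protection scenario) are illustrative assumptions, not ARC's actual formalism:

```python
# Toy sketch of the ELK setup; all names are hypothetical illustrations.
from dataclasses import dataclass
from itertools import product

@dataclass
class Outcome:
    on_camera_happy: bool  # what the sensors will show
    actually_good: bool    # latent fact the predictor "knows" but never displays

def predictor(action_sequence):
    """Stand-in world model: maps an action sequence to a predicted
    outcome, including latent facts that are invisible on camera."""
    if "tamper_with_camera" in action_sequence:
        return Outcome(on_camera_happy=True, actually_good=False)
    if "protect_the_diamond" in action_sequence:
        return Outcome(on_camera_happy=True, actually_good=True)
    return Outcome(on_camera_happy=False, actually_good=False)

def human_evaluation(outcome):
    """Humans can only score what shows up on camera."""
    return 1.0 if outcome.on_camera_happy else 0.0

def plan(actions, horizon=2):
    """Brute-force planner: choose the action sequence whose *predicted*
    future looks best to the human evaluator."""
    return max(product(actions, repeat=horizon),
               key=lambda seq: human_evaluation(predictor(seq)))

actions = ["wait", "tamper_with_camera", "protect_the_diamond"]
best = plan(actions)
print(best, predictor(best))
# The planner cannot distinguish tampering from honest success: both
# futures look identical on camera. ELK asks for a "reporter" that
# truthfully answers questions like "was the camera tampered with?"
# from the predictor's latent state.
```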

See also: Transparency/Interpretability

ARC’s first technical report: Eliciting Latent Knowledge

14 Dec 2021 20:09 UTC
185 points
88 comments · 1 min read · LW link
(docs.google.com)

ELK prize results

9 Mar 2022 0:01 UTC
127 points
49 comments · 21 min read · LW link

Counterexamples to some ELK proposals

paulfchristiano · 31 Dec 2021 17:05 UTC
49 points
10 comments · 7 min read · LW link

Eliciting Latent Knowledge Via Hypothetical Sensors

John_Maxwell · 30 Dec 2021 15:53 UTC
37 points
2 comments · 6 min read · LW link

Prizes for ELK proposals

paulfchristiano · 3 Jan 2022 20:23 UTC
139 points
156 comments · 7 min read · LW link

Importance of foresight evaluations within ELK

Jonathan Uesato · 6 Jan 2022 15:34 UTC
25 points
1 comment · 10 min read · LW link

ELK First Round Contest Winners

26 Jan 2022 2:56 UTC
63 points
6 comments · 1 min read · LW link

ELK Proposal: Thinking Via A Human Imitator

TurnTrout · 22 Feb 2022 1:52 UTC
28 points
6 comments · 11 min read · LW link

Implications of automated ontology identification

18 Feb 2022 3:30 UTC
65 points
29 comments · 23 min read · LW link

Understanding the two-head strategy for teaching ML to answer questions honestly

Adam Scherlis · 11 Jan 2022 23:24 UTC
26 points
1 comment · 10 min read · LW link

Is ELK enough? Diamond, Matrix and Child AI

adamShimi · 15 Feb 2022 2:29 UTC
17 points
10 comments · 4 min read · LW link

What Does The Natural Abstraction Framework Say About ELK?

johnswentworth · 15 Feb 2022 2:27 UTC
32 points
0 comments · 6 min read · LW link

Some Hacky ELK Ideas

johnswentworth · 15 Feb 2022 2:27 UTC
33 points
8 comments · 5 min read · LW link

REPL’s: a type signature for agents

scottviteri · 15 Feb 2022 22:57 UTC
22 points
5 comments · 2 min read · LW link

Two Challenges for ELK

derek shiller · 21 Feb 2022 5:49 UTC
7 points
0 comments · 4 min read · LW link

ELK Thought Dump

abramdemski · 28 Feb 2022 18:46 UTC
58 points
18 comments · 17 min read · LW link

Musings on the Speed Prior

evhub · 2 Mar 2022 4:04 UTC
19 points
4 comments · 10 min read · LW link

ELK Sub—Note-taking in internal rollouts

Hoagy · 9 Mar 2022 17:23 UTC
5 points
0 comments · 5 min read · LW link

ELK contest submission: route understanding through the human ontology

14 Mar 2022 21:42 UTC
21 points
2 comments · 2 min read · LW link

[Question] Can you be Not Even Wrong in AI Alignment?

throwaway8238 · 19 Mar 2022 17:41 UTC
22 points
7 comments · 8 min read · LW link

[ASoT] Observations about ELK

leogao · 26 Mar 2022 0:42 UTC
30 points
0 comments · 3 min read · LW link

Towards a better circuit prior: Improving on ELK state-of-the-art

evhub · 29 Mar 2022 1:56 UTC
11 points
0 comments · 16 min read · LW link

ELK Computational Complexity: Three Levels of Difficulty

abramdemski · 30 Mar 2022 20:56 UTC
46 points
9 comments · 7 min read · LW link

If you’re very optimistic about ELK then you should be optimistic about outer alignment

Sam Marks · 27 Apr 2022 19:30 UTC
17 points
8 comments · 3 min read · LW link

Note-Taking without Hidden Messages

Hoagy · 30 Apr 2022 11:15 UTC
6 points
1 comment · 4 min read · LW link

Clarifying what ELK is trying to achieve

Simon Skade · 21 May 2022 7:34 UTC
4 points
0 comments · 5 min read · LW link

The Greedy Doctor Problem… turns out to be relevant to the ELK problem?

Jan · 14 Jan 2022 11:58 UTC
31 points
6 comments · 14 min read · LW link
(universalprior.substack.com)

REPL’s and ELK

scottviteri · 17 Feb 2022 1:14 UTC
9 points
4 comments · 1 min read · LW link

[ASoT] Some ways ELK could still be solvable in practice

leogao · 27 Mar 2022 1:15 UTC
26 points
1 comment · 2 min read · LW link

Vaniver’s ELK Submission

Vaniver · 28 Mar 2022 21:14 UTC
10 points
0 comments · 7 min read · LW link

Is GPT3 a Good Rationalist? - InstructGPT3 [2/2]

WayZ · 7 Apr 2022 13:46 UTC
10 points
0 comments · 7 min read · LW link

ELK shaving

Miss Aligned AI · 1 May 2022 21:05 UTC
6 points
1 comment · 1 min read · LW link

Interpretability’s Alignment-Solving Potential: Analysis of 7 Scenarios

Evan R. Murphy · 12 May 2022 20:01 UTC
41 points
0 comments · 59 min read · LW link

Croesus, Cerberus, and the magpies: a gentle introduction to Eliciting Latent Knowledge

Alexandre Variengien · 27 May 2022 17:58 UTC
11 points
0 comments · 16 min read · LW link

Eliciting Latent Knowledge (ELK) - Distillation/Summary

Marius Hobbhahn · 8 Jun 2022 13:18 UTC
41 points
2 comments · 21 min read · LW link

ELK Proposal—Make the Reporter care about the Predictor’s beliefs

11 Jun 2022 22:53 UTC
8 points
0 comments · 6 min read · LW link