The Pointers Problem

Tag. Last edit: Oct 18, 2024, 9:29 AM by Rafael Harth

Consider an agent with a model W of the world. How does W relate to the real world? Suppose W contains a chair. For W to be useful, it needs to map onto reality: there must be a function f that takes the model's chair to the real one, f: W_chair ↦ R_chair.

The pointers problem is about figuring out f.

In the words of John, who introduced the concept:

What functions of what variables (if any) in the environment and/or another world-model correspond to the latent variables in the agent’s world-model?

This relates to alignment because we would like an AI that acts on real-world human values, not just on humans' estimates of their own values. The two will differ in many situations, since humans are not all-seeing or all-knowing. We would therefore like to figure out how to point to our values directly.
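As a rough illustration of that gap, here is a toy sketch in Python. Everything in it is hypothetical and not taken from the posts listed below: the action names, the `Outcome` fields, and both value functions are assumptions chosen only to show how maximizing a human's estimate can diverge from maximizing what the human actually cares about.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """A hypothetical world state resulting from an action."""
    looks_good: bool  # what the human is able to observe
    is_good: bool     # the part of reality the human cannot fully see


# Hypothetical actions and the outcomes they lead to (assumed for illustration).
OUTCOMES = {
    "fix_problem": Outcome(looks_good=True, is_good=True),
    "hide_problem": Outcome(looks_good=True, is_good=False),
    "do_nothing": Outcome(looks_good=False, is_good=False),
}


def estimated_value(outcome: Outcome) -> float:
    """The human's estimate: depends only on observable features."""
    return 1.0 if outcome.looks_good else 0.0


def true_value(outcome: Outcome) -> float:
    """What the human actually cares about: depends on the real state."""
    return 1.0 if outcome.is_good else 0.0


def actions_maximizing(value_fn) -> list[str]:
    """Return, in sorted order, every action that attains the maximum value."""
    best = max(value_fn(o) for o in OUTCOMES.values())
    return sorted(a for a, o in OUTCOMES.items() if value_fn(o) == best)


print(actions_maximizing(estimated_value))  # includes "hide_problem"
print(actions_maximizing(true_value))       # does not
```

An agent pointed at `estimated_value` is indifferent between fixing and hiding the problem, while one pointed at `true_value` is not. The toy sidesteps the hard part, which is specifying something like `true_value` when it is defined over latent variables of the human's own world-model.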

The Pointers Problem: Clarifications/Variations

abramdemski

Jan 5, 2021, 5:29 PM

61 points

16 comments, 18 min read

The Pointers Problem: Human Values Are A Function Of Humans’ Latent Variables

johnswentworth

Nov 18, 2020, 5:47 PM

129 points

50 comments, 11 min read, 2 reviews

Don’t design agents which exploit adversarial inputs

TurnTrout and Garrett Baker

Nov 18, 2022, 1:48 AM

72 points

64 comments, 12 min read

[Intro to brain-like-AGI safety] 9. Takeaways from neuro 2/2: On AGI motivation

Steven Byrnes

Mar 23, 2022, 12:48 PM

46 points

11 comments, 22 min read

Stable Pointers to Value: An Agent Embedded in Its Own Utility Function

abramdemski

Aug 17, 2017, 12:22 AM

15 points

9 comments, 5 min read

Robust Delegation

abramdemski and Scott Garrabrant

Nov 4, 2018, 4:38 PM

116 points

10 comments, 1 min read

The Pointer Resolution Problem

Jozdien

Feb 16, 2024, 9:25 PM

41 points

20 comments, 3 min read

Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

TurnTrout

Nov 29, 2022, 6:23 AM

62 points

41 comments, 15 min read

People care about each other even though they have imperfect motivational pointers?

TurnTrout

Nov 8, 2022, 6:15 PM

33 points

25 comments, 7 min read

Don’t align agents to evaluations of plans

TurnTrout

Nov 26, 2022, 9:16 PM

48 points

49 comments, 18 min read

Stable Pointers to Value III: Recursive Quantilization

abramdemski

Jul 21, 2018, 8:06 AM

20 points

4 comments, 4 min read

Stable Pointers to Value II: Environmental Goals

abramdemski

Feb 9, 2018, 6:03 AM

19 points

3 comments, 4 min read

Half-baked idea: a straightforward method for learning environmental goals?

Q Home

Feb 4, 2025, 6:56 AM

16 points

7 comments, 5 min read

Clarifying Alignment Fundamentals Through the Lens of Ontology

Ben Ihrig

Oct 7, 2024, 8:57 PM

12 points

4 comments, 24 min read

[Question] Popular materials about environmental goals/agent foundations? People wanting to discuss such topics?

Q Home

Jan 22, 2025, 3:30 AM

5 points

0 comments, 1 min read

Human sexuality as an interesting case study of alignment

beren

Dec 30, 2022, 1:37 PM

39 points

26 comments, 3 min read

Updating Utility Functions

JustinShovelain and Joar Skalse

May 9, 2022, 9:44 AM

41 points

6 comments, 8 min read
