# The Pointers Problem

Tag. Last edit: 22 Dec 2023 19:16 UTC

Consider an agent with a model of the world, W. How does W relate to the real world? W might contain a chair. For W to be useful, it needs to map to reality, i.e. there is a function `f` with `W_chair ↦ R_chair`, where `R_chair` is the corresponding chair in reality.

The pointers problem is about figuring out `f`.
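As a rough illustration only, the toy sketch below treats the search for `f` as comparing candidate functions of environment variables against what the agent's model says about its latent variable "chair". Everything in it is a hypothetical assumption made for the example: the environment variables (`legs`, `seat_height_m`, `max_load_kg`), the two candidate pointers, and the `agreement` score. It is not a method taken from the posts listed on this page.

```python
from typing import Callable, Dict, List

# Hypothetical low-level description of one real-world situation.
Env = Dict[str, float]

# Two hypothetical candidates for what the latent variable "chair" in W
# might point to, each written as a function of environment variables.
def has_four_legs_and_seat(env: Env) -> bool:
    return env["legs"] == 4 and env["seat_height_m"] > 0.3

def supports_sitting(env: Env) -> bool:
    return env["seat_height_m"] > 0.3 and env["max_load_kg"] >= 60

def agreement(latent: List[bool], candidate: Callable[[Env], bool],
              envs: List[Env]) -> float:
    """Fraction of situations where the candidate matches the agent's latent."""
    hits = sum(candidate(e) == v for e, v in zip(envs, latent))
    return hits / len(envs)

envs = [
    {"legs": 4, "seat_height_m": 0.45, "max_load_kg": 120},  # ordinary chair
    {"legs": 3, "seat_height_m": 0.45, "max_load_kg": 100},  # three-legged chair
    {"legs": 4, "seat_height_m": 0.05, "max_load_kg": 5},    # dollhouse chair
]
# What the agent's world-model W says about "chair" in each situation (assumed).
w_chair = [True, True, False]

scores = {
    f.__name__: agreement(w_chair, f, envs)
    for f in (has_four_legs_and_seat, supports_sitting)
}
print(scores)
```

In this toy setup the second candidate agrees with the agent's latent in every listed situation and the first does not. Agreement on a handful of cases is of course far weaker than actually identifying `f`, which is part of why the problem is hard.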

In the words of John, who introduced the concept:

> What functions of what variables (if any) in the environment and/or another world-model correspond to the latent variables in the agent’s world-model?

This relates to alignment because we would like an AI that acts on real-world human values, not just on humans’ estimates of their own values. The two will differ in many situations, since humans are neither all-seeing nor all-knowing. We would therefore like to figure out how to point to our values directly.

# The Pointers Problem: Human Values Are A Function Of Humans’ Latent Variables

18 Nov 2020 17:47 UTC
125 points

# Don’t design agents which exploit adversarial inputs

18 Nov 2022 1:48 UTC
70 points

# Stable Pointers to Value III: Recursive Quantilization

21 Jul 2018 8:06 UTC
20 points

# The Pointer Resolution Problem

16 Feb 2024 21:25 UTC
41 points

# Stable Pointers to Value: An Agent Embedded in Its Own Utility Function

17 Aug 2017 0:22 UTC
15 points

# Robust Delegation

4 Nov 2018 16:38 UTC
116 points

# Don’t align agents to evaluations of plans

26 Nov 2022 21:16 UTC
42 points

# Stable Pointers to Value II: Environmental Goals

9 Feb 2018 6:03 UTC
19 points

# People care about each other even though they have imperfect motivational pointers?

8 Nov 2022 18:15 UTC
33 points

# [Intro to brain-like-AGI safety] 9. Takeaways from neuro 2/2: On AGI motivation

23 Mar 2022 12:48 UTC
44 points

# Alignment allows “nonrobust” decision-influences and doesn’t require robust grading

29 Nov 2022 6:23 UTC
60 points

# The Pointers Problem: Clarifications/Variations

5 Jan 2021 17:29 UTC
61 points

# Updating Utility Functions

9 May 2022 9:44 UTC
41 points