In other words, how do we find the corresponding variables? I’ve given you an argument that the variables in an AGI’s world-model which correspond to the ones in your world-model can be found by expressing your concept in English sentences.
But you didn’t actually give an argument for that; you simply stated it. As a matter of fact, I disagree: it seems really easy for an AGI to misunderstand what I mean when I use English words. To go back to the “fusion power generator” example: maybe the AGI has a very deep model of such generators that abstracts away most of the concrete implementation details to capture the most efficient way of doing fusion, whereas my internal model of “fusion power generators” has a more concrete form and includes safety guidelines.
In general, I don’t see why we should expect the abstraction most relevant for the AGI to be the one we’re using. Maybe it uses the same words for something quite different, the way successive paradigms in physics use the same words (“electricity”, “gravity”) to talk about different things (at least in their connotations and underlying explanations).
(That makes me think it might be interesting to see how Kuhn’s arguments about the incommensurability of paradigms hold up in the context of this problem, since the situations seem similar.)
Here are two versions of “an AGI will understand very well what I mean”:
1. Given things in my world model / ontology, the AGI will know which things they translate to in its own world model / ontology, such that the referents (the things “in the real world” being pointed at from our respective models) are essentially coextensive.
2. For any behaviour I could exhibit (such as pressing a button, or expressing contentment with having reached common understanding in a dialogue) that, for me, turns on the words being used, the AGI is very good at predicting my behaviours conditional on the words I’m using, or at causing me to exhibit behaviours by using words itself.
Is version 1 something you get from more and more competence and generality at version 2? I think version 1 is more like the ideal version of “the AGI understands what I mean”, but is more confused (because I’m having to rely on concepts like “know” and “referent” and “translate”).
I think Richard has stated that we can expect an AGI to understand what I mean in the version 2 sense, and either equivocates between the two versions or presumes that version 2 implies version 1. I think Adam is claiming that version 2 might not imply version 1, or pointing out that there is still a missing argument or an unsolved problem there.
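To make the gap between the two versions concrete, here is a toy sketch in Python (purely illustrative; the miniature “world”, the property names, and the referent functions are all invented for this example rather than taken from the discussion). It shows how an AGI’s translated concept can predict my word-driven behaviour very well (version 2) while its referent is not coextensive with mine (version 1), because the cases that separate the two concepts are rare.

```python
# Toy illustration of version 1 vs. version 2 "understanding".
# Everything below is a made-up miniature example, not a model of a real AGI.

# A small world of objects; most fusion devices in it happen to be safe.
WORLD = (
    [{"fusion": True,  "safe": True}] * 9     # typical fusion generators
  + [{"fusion": True,  "safe": False}] * 1    # an edge case my concept excludes
  + [{"fusion": False, "safe": True}] * 10    # non-fusion objects
)

# My concept "fusion power generator": a fusion device that meets safety guidelines.
def my_referent(obj):
    return obj["fusion"] and obj["safe"]

# The AGI's translated concept: anything that does fusion, safety details abstracted away.
def agi_referent(obj):
    return obj["fusion"]

# Version 1: the translation succeeds iff the two referents are essentially coextensive.
version_1 = all(my_referent(o) == agi_referent(o) for o in WORLD)

# Version 2: the AGI only needs to predict my word-driven behaviour (say, whether I press
# an approval button), and here its translated concept doubles as that predictor.
version_2_accuracy = sum(agi_referent(o) == my_referent(o) for o in WORLD) / len(WORLD)

print("Version 1 (coextensive referents):", version_1)           # False: the unsafe device
print("Version 2 (prediction accuracy):  ", version_2_accuracy)  # 0.95: looks like understanding
```

On this made-up sample the AGI looks like it understands me almost perfectly in the version 2 sense, yet the one object on which we disagree is exactly the kind of safety-relevant case the original “fusion power generator” worry is about.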