I mean, no mind really exists at the time the “misunderstanding” is starting to happen, no? Unless you want to call a randomly initialized NN (i.e., basically a random program) a “mind”… which wouldn’t necessarily be an invalid frame to use. But I don’t think it’s the obviously correct frame either, and so I don’t think that people who use a mechanistic frame by default are unambiguously in error.
I note that in your step-by-step explanation, the last bullet is: “Therefore Bostrom expects we will not have an AI that correctly understands concepts like intelligence until after it is already superintelligent.”
That is straightforwardly correct. But “there exists no AI that understands” is importantly different from “there exists an AI which misunderstands”.
Another questionable frame here is characterizing the relationship between an AI and the SGD/training process shaping it as some sort of communication process (?), such that the AI ending up misshapen can be described as it having “misunderstood” something.
And the training process itself never becomes a mind; it starts and ends as a discrete program. So if you mean to say that it “misunderstood” something, I think that’s a type error, or at best a metaphor.
(I guess it may still be valid from a point of view where you frame SGD updates as Bayesian updates, or something along those lines? But that’s also a non-standard frame.)
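For concreteness, here is one standard way that frame gets cashed out (a sketch, not necessarily the version anyone here has in mind): read the training loss as a negative log-posterior over the weights, so that a gradient step descends toward the posterior mode, and the noise-injected variant (SGLD) approximately samples the posterior.

```latex
% A sketch of the "SGD updates as Bayesian updates" frame. Assumptions: i.i.d.
% data D = {x_i}, with a prior p(\theta) playing the role of the regularizer.
\[
  p(\theta \mid D) \propto p(\theta) \prod_i p(x_i \mid \theta),
  \qquad
  L(\theta) = -\log p(\theta) - \sum_i \log p(x_i \mid \theta).
\]
% A plain SGD step on (a minibatch estimate of) L descends toward the posterior
% mode, i.e. the MAP estimate:
\[
  \theta_{t+1} = \theta_t - \eta \, \nabla_\theta \hat{L}(\theta_t).
\]
% Adding Gaussian noise of the right scale (SGLD, Welling & Teh 2011) turns the
% same update into an approximate sampler of the posterior p(\theta \mid D).
% Full-batch form shown for simplicity; SGLD proper rescales minibatch gradients.
\[
  \theta_{t+1} = \theta_t - \tfrac{\eta}{2} \nabla_\theta L(\theta_t) + \varepsilon_t,
  \qquad \varepsilon_t \sim \mathcal{N}(0, \eta I).
\]
```

On that reading, a gradient update really is an (approximate) Bayesian update on the weights, which is the sense in which the frame can be defended; it’s just not the default way training is usually described.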
in practice, we seem to train the world model and understanding machine first, and the policy only much later, as a thin patch on top of the world model. this is not guaranteed to stay true, but it seems pretty durable so far. thus, the relevant heuristics are about base models, not about randomly initialized neural networks.
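to make “thin patch on top of the world model” concrete, a minimal sketch (hypothetical module and dimension names, not any particular codebase; real RLHF-style training usually updates more than a literal frozen head, so this only illustrates the intuition):

```python
import torch
import torch.nn as nn

class PolicyPatch(nn.Module):
    """A small trainable policy head on top of a frozen pretrained base model.

    The base model stands in for the "world model / understanding machine";
    only the thin head on top of its representations is shaped by the
    policy-training stage. Illustrative sketch only.
    """

    def __init__(self, base_model: nn.Module, hidden_dim: int, num_actions: int):
        super().__init__()
        self.base = base_model
        # Freeze the pretrained "world model": its weights are not updated here.
        for p in self.base.parameters():
            p.requires_grad = False
        # The "thin patch": a small head mapping base representations to actions.
        self.policy_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        with torch.no_grad():        # base acts as a fixed feature extractor
            h = self.base(x)         # (batch, hidden_dim)
        return self.policy_head(h)   # action logits, (batch, num_actions)

# toy stand-in for a pretrained base (in practice: a large pretrained network)
base = nn.Sequential(nn.Linear(16, 32), nn.Tanh())
model = PolicyPatch(base, hidden_dim=32, num_actions=4)
# only the head's parameters are handed to the optimizer
optimizer = torch.optim.Adam(model.policy_head.parameters(), lr=1e-4)
print(model(torch.randn(8, 16)).shape)  # torch.Size([8, 4])
```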
separately, I do think randomly initialized neural networks have some strong baseline of fuzziness and conceptual corrigibility, which is in a sense what it means to have a traversable loss landscape.