I am so excited about this research, good luck! I think it’s almost impossible this won’t turn up at least some interesting partial results, even if the strong versions of the hypothesis don’t work out (my guess would be you run into some kind of incomputability or incoherence results in finding an algorithm that works for every environment).
This is one of the research directions that make me the most optimistic that alignment might really be tractable!
This is a great intuition pump, thanks! It makes me appreciate just how, in a sense, weird it is that abstractions work at all. It seems like the universe could just not be constructed this way (though one could then argue that probably intelligence couldn’t exist in such chaotic universes, which is in itself interesting). This makes me wonder if there is a set of “natural abstractions” that are a property of the universe itself, not of whatever learning algorithm is used to pick up on them. Seems highly relevant to value learning and the like.
Great writeup, thanks!
To add to whether or not kludge and heuristics are part of the theory, I’ve asked the Numenta people in a few AMAs about their work, and they’ve made clear they are working solely on the neocortex (and the thalamus), but the neocortex isn’t the only thing in the brain. It seems clear that the kludge we know from the brain is still present, just maybe not in the neocortex. Limbic or other areas could implement kludge style shortcuts which could bias what the more uniform neocortex learns or outputs. Given my current state of knowledge of neuroscience, the most likely interpretation of this kind of research is that the neocortex is a kind of large unsupervised world model that is connected to all kinds of other hardcoded, RL or other systems, which all in concert produce human behavior. It might be similar to Schmidhuber’s RNNAI idea, where a RL agent learns to use an unsupervised “blob” of compute to achieve its goals. Something like this is probably happening in the brain since, at least as far as Numenta’s theories go, there is no reinforcement learning going on in the neocortex, which seems to contradict how humans work overall.