I don’t share the feeling that not enough of relevance has happened over the last ten years for us to seem on track for solving it in a hundred years, if the world’s technology[1] were magically frozen in time.
Some more insights from the past ten years that look to me like they’re plausibly nascent steps in building up a science of intelligence and maybe later, alignment:
We understood some of the basics of general pattern matching: how it is possible for embedded minds that can’t be running actual Solomonoff induction to still have some ability to extrapolate from old data to new data. This used to be a big open problem in embedded agency, at least to me, and I think it is largely solved now. Admittedly, a lot of the core work here actually happened more than ten years ago, but people in ML or our community didn’t know about it. (I give one illustrative result of this flavour just after this list.) [1,2]
Natural latents. [1,2,3]
Some basic observations and theories about the internal structure of the algorithms neural networks learn, and how they learn them. Yes, our networks may be a very small corner of mind space, but one example is way better than no examples! There’s a lot on this one, so the following is just a very small and biased selection. Note how some of these works are starting to properly build on each other. [1,2,3,4,5,6,7,8,9,10,11,12]
Some theory trying to link how AIs work to how human brains work. I feel less able to evaluate this one, but if the neuroscience basics are right, it seems quite useful. [1]
QACI. What I’d consider the core useful QACI insight maybe sounds kind of obvious once you know about it. But I, at least, didn’t know about it. Like, if someone had told me: “A formal process we can describe that we’re pretty sure would return the goals we want an AGI to optimise for is itself often a sufficient specification of those goals”, I would’ve replied: “Well, duh.” But I wouldn’t have realised the implication. I needed to see an actual example for that. (I sketch the point a bit more formally just after this list.) Plausibly MIRI people weren’t as dumb as me here and knew this pre-2015; I’m not sure.
The mesa-optimiser paper. This one probably didn’t have much insight that didn’t already exist pre-2015. But I think it communicated something central about the essence of the alignment problem to many people who hadn’t realised it before. [1]
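On the pattern-matching point above: the following is purely my own illustrative pick of one standard result in this genre (not necessarily what the linked works show), but it conveys the flavour of how a bounded learner can get guarantees about new data without anything like Solomonoff induction. For any prior $P$ over hypotheses, any confidence level $\delta \in (0,1)$, losses in $[0,1]$, and an i.i.d. sample $S$ of size $n$, a PAC-Bayes bound of the McAllester type says that with probability at least $1-\delta$, every posterior $Q$ satisfies

$$\mathbb{E}_{h\sim Q}\!\left[L(h)\right] \;\le\; \mathbb{E}_{h\sim Q}\!\left[\hat{L}_S(h)\right] \;+\; \sqrt{\frac{\mathrm{KL}(Q\,\|\,P)+\ln\frac{2\sqrt{n}}{\delta}}{2n}},$$

i.e. true error on new data is controlled by empirical error on old data plus a complexity penalty, with no requirement that the learner enumerate all programs.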
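And to make the QACI insight a little more concrete, here is a minimal formal gloss of my own (an illustration of the general point, not the actual QACI construction): suppose $F$ is a formal process we can write down which, if it were carried out in full, would return the utility function we want optimised; call that output $\mathrm{out}(F)$. Then

$$\pi^* \in \arg\max_{\pi}\; \mathbb{E}\!\left[\,\mathrm{out}(F)\big(\mathrm{outcome}(\pi)\big)\,\right]$$

is already a fully specified objective: the agent is pointed at “whatever $F$ returns”, and we never need to actually run $F$ or write the intended utility function down explicitly ourselves.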
If we were a normal scientific field with no deadline, I would feel very good about our progress here, particularly given how small we are. CERN costs ca. €1.2 billion a year; I think all the funding for technical work and governance over the past 20 years taken together doesn’t add up to one year of that. Even if at the end of it all we still had to get ASI alignment right on the first try, I would still feel mostly good about this, if we had a hundred years.
I would also feel better about the field-building situation if we had a hundred years. Yes, a lot of the things people tried for field building over the past ten years didn’t work as well as hoped. But we didn’t try that many things, a lot of the attempts struck me as inadequate in really basic ways that seem fixable in principle, and I would say the end result still wasn’t zero useful field building. I think the useful parts of the field have grown quite a lot even in the past three years! Just not as much as people like John or me thought they would, and not as much as we probably needed them to, given the deadlines we seem likely to have.
Not to say that I wouldn’t still prefer to do some human intelligence enhancement first, even if we had a hundred years. That’s just the optimal move, even in a world where things look less grim.
But what really kills it for me is just the sheer lack of time.
[1] Specifically AI and intelligence enhancement.