Re: worries about “reward”, I don’t feel like I have a great understanding of what your worry is, but I’d try to summarize it as “while the abstraction of reward is technically sufficiently expressive, 1) it may not have the right inductive biases, and so the framework might fail in practice, and 2) it is not a good framework for thought, because it doesn’t sufficiently emphasize many important concepts like logic and hierarchical planning”.
I think I broadly agree with those points if our plan is to explicitly learn human values, but they seem less relevant when we aren’t trying to do that and are instead trying to

> provide a general method for creating AI systems that pursue some specific task, interpreted the way we meant it to be interpreted.
In this framework, “knowledge about what humans want” doesn’t come from a reward function, it comes from something like GPT-3 pretraining. The AI system can “invent” whatever concepts are best for representing its knowledge, which includes what humans want.
Here, reward functions should instead be thought of as akin to loss functions—they are ways of incentivizing particular kinds of outputs. I think it’s reasonable to think on priors that this wouldn’t be sufficient to get logical / hierarchical behavior, but I think GPT and AlphaStar and all the other recent successes should make you rethink that judgment.
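To make the “reward as loss” framing concrete, here is a minimal sketch (my illustration; the toy policy and numbers are made up, not from the original discussion) of a REINFORCE-style update in PyTorch, where the reward is just a scalar weighting a log-likelihood term, the same structural role a loss plays in supervised learning:

```python
import torch

# Minimal sketch: a scalar "reward" used exactly like a loss function.
# The tiny linear "policy" stands in for a pretrained model.
policy = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(policy.parameters(), lr=0.1)

state = torch.randn(4)                  # some observed input
dist = torch.distributions.Categorical(logits=policy(state))
action = dist.sample()                  # the model's sampled output

reward = 1.0                            # scalar feedback on that output

# REINFORCE-style update: the reward just scales a log-likelihood term,
# incentivizing this kind of output rather than encoding "values".
loss = -reward * dist.log_prob(action)
loss.backward()
optimizer.step()
```

Nothing in this update requires the reward to encode human values; it incentivizes a kind of output, while the knowledge lives in the (pre)trained weights.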
----
> The trend-following behavior in most scientific & engineering fields, including AI, should make us skeptical that currently popular approaches are popular for the right reasons.
I agree that trend-following behavior exists. I agree that this means that work on deep learning is less promising than you might otherwise think. That doesn’t mean it’s the wrong decision; if there are a hundred other plausible directions, it can still be the case that it’s better to bet on deep learning rather than try your hand at guessing which paradigm will become dominant next. To quote Rodney Brooks:
> Whatever [the “next big thing”] turns out to be, it will be something that someone is already working on, and there are already published papers about it. There will be many claims on this title earlier than 2023, but none of them will pan out.
He also predicts that the “next big thing” will happen by 2027 (though I get the sense that he might count new kinds of deep learning architectures as a “big thing” so he may not be predicting something as paradigm-shifting as you’re thinking).
Whether to diversify depends on the size of the field: with a million alignment researchers you would definitely want to diversify, whereas with five you almost certainly wouldn’t. I’m claiming that we’re currently small enough, and uninformed enough about alternatives to deep learning, that diversification is not a great approach. (A toy model of this is sketched below.)
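As a toy illustration of that claim (every number here is an assumption of mine, not the author’s): suppose there are 100 plausible paradigms, the currently dominant one pans out with probability 0.5, and returns to researchers on the paradigm that turns out to be right are logarithmically diminishing. Concentrating then wins at 5 researchers and diversifying wins at a million:

```python
import math

K = 100            # plausible research directions (assumed)
P_DOMINANT = 0.5   # assumed chance the dominant paradigm pans out

def value(researchers):
    """Diminishing returns to labor on the paradigm that turns out right."""
    return math.log1p(researchers)

def concentrate(n):
    # Everyone bets on the dominant paradigm; payoff only if it pans out.
    return P_DOMINANT * value(n)

def diversify(n):
    # Spread evenly: whichever paradigm pans out has n / K researchers
    # on it, and the per-paradigm probabilities sum to 1.
    return value(n / K)

for n in (5, 1_000_000):
    print(n, round(concentrate(n), 2), round(diversify(n), 2))
# 5 researchers:         concentrate ~0.90 > diversify ~0.05
# 1,000,000 researchers: concentrate ~6.91 < diversify ~9.21
```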
> We have extra reason to be cautious about deep learning being popular for the wrong reasons, given that many AI researchers say that we should be focusing less on machine learning while at the same time publishing heavily in machine learning.
Just because AI research should diversify doesn’t mean alignment research should diversify—given their relative sizes, it seems correct for alignment researchers to focus on the dominant paradigm while letting AI researchers explore the space of possible ways to build AI. Alignment researchers should then be ready to switch paradigms if a new one is found.
> A lot of prominent researchers like Stuart Russell, Gary Marcus, and Josh Tenenbaum all think that we need to re-invigorate symbolic and Bayesian approaches (perhaps through hybrid neuro-symbolic methods).
This feels like the most compelling argument, since it identifies particular other approaches (though still very large ones). Some objections from the outside view:
- I think all three of the researchers you mentioned have long timelines. Work is generally more useful on shorter timelines, so this should bias you towards what is currently popular.
- Some of these researchers don’t think we can get to AGI at all. As long as you aren’t confident that they are correct, you should ignore that position: if we’re in that world, then there isn’t any AI alignment x-risk, so it isn’t decision-relevant.
- I find the arguments given by these researchers relatively weak and easily countered, and am more inclined to rely on inside-view arguments as a result. (Though I should note that it is often correct to trust an expert even when their arguments seem weak, so this is a relatively minor point.)
- (Re: Hinton and Bengio, that seems like support for the work that’s currently being done; the work coming out of their labs doesn’t seem that different from what comes out of OpenAI and DeepMind.)
Going to the inside view on neurosymbolic AI:
> (even AlphaGo can be seen as a version of this, if one views MCTS as symbolic)
I feel like if you endorse this then you should also think of iterated amplification as neurosymbolic (though maybe you think if humans are involved that’s “neurohuman” rather than neurosymbolic and the distinction is relevant for some reason).
Overall, I do expect that neurosymbolic approaches will be helpful and used in many practical AI applications; they allow you to encode relevant domain knowledge without having to learn it all from scratch. I don’t currently see how they introduce new alignment problems, or change how we should think about the existing problems we work on, and that’s the main reason I don’t focus on them. But I certainly agree with that as a background model of what future AI systems will look like, and if someone identified a problem that arises with neurosymbolic AI and isn’t addressed by current work in AI alignment, I’d be pretty excited to see research solving that problem, and might do it myself.
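As a concrete sketch of the hybrid pattern in the AlphaGo example above (mine; `neural_value` and `successors` are hypothetical stand-ins, not any real system’s API): symbolic game-tree search whose leaf positions are scored by a learned evaluator rather than a hand-written one.

```python
import random

def neural_value(state):
    """Stand-in for a trained value network: score a position in [-1, 1]."""
    return random.Random(hash(state)).uniform(-1, 1)

def successors(state):
    """Stand-in for the symbolic side: the exact transition rules."""
    return [state + (move,) for move in range(3)]

def search(state, depth):
    """Depth-limited negamax: symbolic search guided by the learned net.

    AlphaGo's MCTS plays this structural role at much larger scale:
    exact rules drive the search, learned networks judge the positions.
    """
    if depth == 0:
        return neural_value(state)
    # Our value is the negation of the opponent's best reply.
    return max(-search(s, depth - 1) for s in successors(state))

print(search((), depth=3))
```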
----
Things I do agree with:
- It would be significantly better if the average / median commenter on the Alignment Forum knew more about AI techniques. (I think this is also true of deep learning.)
- There will probably be something in the future that radically changes our beliefs about AGI.