After seeing a number of rather gloomy posts on the site in the last few days, I feel a need to point out that problems that we don’t currently know how to solve always look impossible. A smart guy once pointed out how silly it was that Lord Kelvin claimed “The influence of animal or vegetable life on matter is infinitely beyond the range of any scientific inquiry hitherto entered on.” Kelvin just didn’t know how to do it. That’s fine. Deciding it’s a Hard Problem just sort of throws up mental blocks to finding potentially obvious solutions.
Maybe alignment will seem really easy in retrospect. Maybe it’s the sort of thing that requires only two small insights that we don’t currently have. Maybe we already have all the insights we need and somebody just needs to connect them together in a non-obvious way. Maybe somebody has already had the key idea, and just thought to themselves, no, it can’t be that simple! (I actually sort of viscerally suspect that the lynchpin of alignment will turn out to be something really dumb and easy that we’ve simply overlooked, and not something like Special Relativity.) Everything seems hard in advance, and we’ve spent far more effort as a civilization studying asphalt than we have alignment. We’ve tried almost nothing so far.
In the same way that we have an existence-proof of AGI (humans existing) we also have a highly suggestive example of something that looks a lot like alignment (humans existing and often choosing not to do heroin), except probably not robust to infinite capability increase, blah blah.
The “probabilistic mainline path” always looks really grim when success depends on innovations and inventions you don’t currently know how to do. Nobody knows what probability to put on obtaining such innovations in advance. If you asked me ten years ago I would have put the odds of SpaceX Starship existing at like 2%, probably even after thinking really hard about it.
In the same way that we have an existence-proof of AGI (humans existing) we also have a highly suggestive example of something that looks a lot like alignment (humans existing and often choosing not to do heroin)
That’s not an example of alignment, that’s an example of sub-agent stability, which is assumed to be true due to instrumental convergence in any sufficiently powerful AI system, aligned or unaligned.
If anything, humanity is an excellent example of alignment failure considering we have discovered the true utility function of our creator and decided to ignore it anyway and side with proxy values such as love/empathy/curiosity etc.
Our creator doesn’t have a utility function in any meaningful sense of the term. Genes that adapt best for survival and reproduction propagate through the population, but it’s competitive. Evolution doesn’t have goals, and in fact from the standpoint of individual genes (where evolution works) it is entirely a zero-sum game.
If anything, humanity is an excellent example of alignment failure considering we have discovered the true utility function of our creator and decided to ignore it anyway and side with proxy values such as love/empathy/curiosity etc.
Or we are waiting to be outbred by those who didn’t. A few centuries ago, the vast majority of people were herders or farmers who had as many kids as they could feed. Their actions were aligned with maximization of their inclusive genetic fitness. We are the exception, not the rule.
When I look at the world today, it really doesn’t seem like a ship steered by evolution. (Instead it is a ship steered by no one, chaotically drifting.) Maybe if there is economic and technological stagnation for ten thousand years, evolution will get back in the driver’s seat and continue the long slow process of aligning humans… but I think that’s very much not the most probable outcome.
I agree with the first two paragraphs here. :) Indeed, these are items on a ‘high-level reasons not to be maximally pessimistic about AGI’ list I made for some friends three years ago. Maybe I’ll post that on LW in the next week or two.
I share Eliezer’s pessimism, but I worry that some people only have negative factors bouncing around in their minds, and not positive factors, and that this is making them overshoot Eliezer’s ‘seems very dire’ and go straight to ‘seems totally hopeless’. (Either with regard to alignment research, or with regard to the whole problem. Maybe also related to the tendency IME for people to either assume a problem is easy or impossible, without much room in between.)
I agree. This wasn’t meant as an object-level discussion of whether the “alignment is doomed” claim is true. What I’d hoped to convey is that, even if the research is on the wrong track, we can still massively increase the chances of a good outcome, using some of the options I described.
That said, I don’t think Starship is a good analogy. We already knew that such a rocket can work in theory, so it was a matter of engineering, experimentation, and making a big organization work. What if a closer analogy to seeing alignment solved was seeing a proof of P=NP this year?
It doesn’t seem credible for AIs to be more aligned with researchers than researchers are aligned with each other, or with the general population.
Maybe that’s ‘gloomy’, but that’s no different from how human affairs have progressed since the first tribes were established. From the viewpoint of broader society, it’s more of a positive development to understand that there’s an upper limit on how much alignment efforts can be expected to yield, so that resources can be allocated to their most beneficial use.