If they’re that smart, why will they need to be persuaded?
Because they might consider other problems more worth their time, since enhancing their intelligence would change their values little.
And maybe they believe that AI alignment isn’t impactful for technical/epistemic reasons.
I’m confused/surprised I need to make this point, because I don’t automatically expect them to be persuaded that AI alignment is a big problem they need to work on; some persuasion effort will likely still be required.
I mean if they care about solving problems at all, and we are in fact correct about AGI ruin, then they should predictably come to view it as the most important problem and start to work on it?
Are you imagining they’re super myopic or lazy and just want to think about math puzzles or something? If so, my reply is that even if some of them ended up like that, I’d be surprised if they all ended up like that, and if so that would be a failure of the enhancement. The aim isn’t to create people who we will then carefully persuade to work on the problem, the aim is for some of them to be smart + caring + wise enough to see the situation we’re in and decide for themselves to take it on.
More so that I’m imagining they might not even have heard of the argument, and it’s helpful to note that people like Terence Tao, Timothy Gowers and others are all excellent in their chosen fields, yet most people who have a big impact on the world don’t go into AI alignment.
Remember, superintelligence is not omniscience.
So I don’t expect them to be self-motivated to work on this specific problem without at least a little persuasion.
I’d expect a few superintelligent adults to join alignment efforts, but nowhere near thousands or tens of thousands; I’d upper-bound it at 300-500 new researchers at most within 15-25 years.
Much less impactful than automating AI safety.
I don’t think this will work.
How much probability do you assign to automating AI safety not working in time? Because I believe preparing to automate AI safety is probably the highest-value work in terms of pure ability to reduce X-risk, assuming it does work, so I assign much higher EV to automating AI safety relative to other approaches.
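To make the comparison concrete, here is a minimal back-of-the-envelope sketch of the kind of EV reasoning I mean. The function and every number in it are hypothetical placeholders for illustration, not my actual estimates:

```python
# Minimal sketch: expected x-risk reduction from betting on an approach.
# All probabilities and reduction sizes below are made-up placeholders.

def expected_risk_reduction(p_works_in_time, reduction_if_it_works):
    """EV of an approach = P(it works in time) * x-risk reduction if it works."""
    return p_works_in_time * reduction_if_it_works

# Hypothetical inputs, purely illustrative:
approaches = {
    "prepare to automate AI safety": expected_risk_reduction(0.5, 0.30),
    "human intelligence enhancement": expected_risk_reduction(0.2, 0.25),
    "coordinated pause via lawmakers": expected_risk_reduction(0.1, 0.20),
}

for name, ev in sorted(approaches.items(), key=lambda kv: -kv[1]):
    print(f"{name}: expected x-risk reduction ~ {ev:.2f}")
```

The point of the sketch is just that the ordering is driven mostly by P(works in time), which is why I’m asking for your probability on that.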
I think I’m at <10% that non-enhanced humans will be able to align ASI in time, and if I condition on them succeeding somehow I don’t think it’s because they got AIs to do it for them. Like maybe you can automate some lower level things that might be useful (e.g. specific interpretability experiments), but at the end of the day someone has to understand in detail how the outcome is being steered or they’re NGMI. Not sure exactly what you mean by “automating AI safety”, but I think stronger forms of the idea are incoherent (e.g. “we’ll just get AI X to figure it all out for us” has the problem of requiring X to be aligned in the first place).
As for how a plan to automate AI safety would work out in practice, assuming a relatively strong version of the concept, see the post below; another post by the same author, addressing the big risks discussed in its comments, is forthcoming:
https://www.lesswrong.com/posts/TTFsKxQThrqgWeXYJ/how-might-we-safely-pass-the-buck-to-ai
In general, I think the crux is that in most timelines (at a lower bound, 65-70%) where AGI is developed relatively soon (roughly 2030-2045) and the alignment problem isn’t solvable by default, or is at least non-trivially tricky to solve, conditioning on alignment success looks more like “we successfully figured out how to prepare for AI automation of everything, and we managed to use alignment and control techniques well enough that we can safely pass most of the effort to AI,” rather than other end states like “humans are deeply enhanced” or “lawmakers actually coordinated to pause AI and are actually funding alignment organizations so that we can make AI safe.”