People are underrating making the future go well conditioned on no AI takeover.
This deserves a full post, but for now a quick take: in my opinion, P(no AI takeover) = 75%, P(future goes extremely well | no AI takeover) = 20%, and most of the value of the future is in worlds where it goes extremely well (and comparatively little value comes from locking in a world that’s good-but-not-great).
Under this view, an intervention is good insofar as it increases P(no AI takeover) * P(things go really well | no AI takeover). Suppose that a given intervention can change P(no AI takeover) and/or P(future goes extremely well | no AI takeover). Then the overall effect of the intervention is proportional to ΔP(no AI takeover) * P(things go really well | no AI takeover) + P(no AI takeover) * ΔP(things go really well | no AI takeover).
Plugging in my numbers, this gives us 0.2 * ΔP(no AI takeover) + 0.75 * ΔP(things go really well | no AI takeover).
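A quick sketch of the algebra behind that decomposition, writing p for P(no AI takeover) and q for P(things go really well | no AI takeover) (the shorthand is mine; the only step is expanding the product and dropping the small cross term):

$$\Delta(pq) = (p+\Delta p)(q+\Delta q) - pq = q\,\Delta p + p\,\Delta q + \Delta p\,\Delta q \approx q\,\Delta p + p\,\Delta q = 0.2\,\Delta p + 0.75\,\Delta q$$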
And yet, I think that very little AI safety work focuses on affecting P(things go really well | no AI takeover). Probably Forethought is doing the best work in this space.
(And I don’t think it’s a tractability issue: I think affecting P(things go really well | no AI takeover) is pretty tractable!)
(Of course, if you think P(AI takeover) is 90%, that would probably be a crux.)
Graphic from Forethought’s Better Futures series:
Oh yup, thanks, this does a good job of illustrating my point. I hadn’t seen this graphic!
I guess that influencing P(future goes extremely well | no AI takeover) is maybe pretty hard, and plagued by cluelessness problems. Avoiding AI takeover is a goal that I have at least some confidence is good.
That said, I do wish more people were thinking about how to make the future go well. I think my favorite thing to aim for is increasing the probability that we do a Long Reflection, although I haven’t really thought at all about how to do that.
You can also work on things that help with both:
AI pause/stop/slowdown—Gives more time to research both issues and to improve human intelligence/rationality/philosophy which in turn helps with both.
Metaphilosophy and AI philosophical competence—Higher philosophical competence means AIs can help more with alignment research (otherwise such research will be bottlenecked by reliance on humans to solve the philosophical parts of alignment), and can also help humans avoid making catastrophic mistakes with their newfound AI-given powers if no takeover happens.
Human intelligence amplification
BTW, have you seen my recent post Trying to understand my own cognitive edge, especially the last paragraph?
Also, have you written down a list of potential risks of doing/attempting human intelligence amplification? (See Managing risks while trying to do good and this for context.)
I haven’t seen your stuff; I’ll try to check it out nowish (busy with Inkhaven). Briefly (IDK which things you’ve seen):
My most direct comments are here: https://x.com/BerkeleyGenomic/status/1909101431103402245
I’ve written a fair bit about the possible perils of germline engineering (aiming for extreme breadth without depth, i.e. just trying to comprehensively mention everything). Some of them apply generally to HIA. https://berkeleygenomics.org/articles/Potential_perils_of_germline_genomic_engineering.html
My review of HIA discusses some risks (esp. value drift), though not in much depth: https://www.lesswrong.com/posts/jTiSWHKAtnyA723LE/overview-of-strong-human-intelligence-amplification-methods
I agree that probably more work should go into this space. I think it is substantially less tractable than reducing takeover risk in aggregate, but much more neglected right now. I also think work in this space has the capacity to be much more zero-sum among existing actors (avoiding AI takeover is zero-sum only with respect to the relevant AIs) and thus can be dodgier.
Could you elaborate on what you see as the main features determining whether a future goes extremely well vs. just okay? And what interventions are tractable?
This would require a longer post, but roughly speaking, I’d want the people making the most important decisions about how advanced AI is used once it’s built to be smart, sane, and selfless. (Huh, that was some convenient alliteration.)
Smart: you need to be able to make really important judgment calls quickly. There will be a bunch of actors lobbying for all sorts of things, and you need to be smart enough to figure out what’s most important.
Sane: smart is not enough. For example, I wouldn’t trust Elon Musk with these decisions, because I think that he’d make rash decisions even though he’s smart, and even if he had humanity’s best interests at heart.
Selfless: even a smart and sane actor could curtail the future if they were selfish and opted to e.g. become world dictator.
And so I’m pretty keen on interventions that make it more likely that smart, sane, and selfless people are in a position to make the most important decisions. This includes things like:
Doing research to figure out the best way to govern advanced AI once it’s developed, and then disseminating those ideas.
Helping to positively shape internal governance at the big AI companies (I don’t have concrete suggestions in this bucket, but like, whatever led to Anthropic having a Long-Term Benefit Trust, and whatever could have led to OpenAI’s non-profit board having actual power to fire the CEO).
Helping to staff governments with competent people.
Helping to elect smart, sane, and selfless people to office (see 1, 2).
I think that (from a risk-neutral total utilitarian perspective) the argument still goes through with 90% P(AI takeover). But the difference is that when you condition on no AI takeover, the worlds look weirder (e.g. great power conflict, scaling breaks down, a coup has already happened, early brain uploads, aliens), which means:
(1) the worlds are more diverse, so the impact of any intervention has greater variance and is less likely to be net positive, even if it’s just as positive in expectation (see the toy sketch below)
(2) your impact is lower because the weird transition event is likely to wash out your intervention
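A toy Monte Carlo sketch of point (1), with entirely made-up numbers: the two conditional distributions over worlds below give the intervention the same expected impact, but the weirder, more diverse one makes a net-positive outcome less likely.

```python
import random

# Toy model: an intervention's impact depends on which "no AI takeover" world we land in.
# Both distributions below have the same expected impact (+1), but the diverse one spreads
# mass over weirder worlds, so the realized impact is negative more often.
# All numbers are made up purely for illustration.

def sample_impact(diverse: bool) -> float:
    if not diverse:
        # Narrow conditional: mostly business-as-usual worlds.
        return random.gauss(mu=1.0, sigma=1.0)
    # Diverse conditional: a mixture of weird worlds (fizzle, great power conflict,
    # early uploads, ...), each giving the same intervention a very different impact.
    mu = random.choice([5.0, 0.0, -3.0, 2.0, 1.0])  # mixture mean is still +1
    return random.gauss(mu=mu, sigma=2.0)

def summarize(diverse: bool, n: int = 100_000) -> tuple[float, float]:
    xs = [sample_impact(diverse) for _ in range(n)]
    mean = sum(xs) / n
    p_positive = sum(x > 0 for x in xs) / n
    return mean, p_positive

random.seed(0)
print("narrow:  mean=%.2f  P(net positive)=%.2f" % summarize(False))
print("diverse: mean=%.2f  P(net positive)=%.2f" % summarize(True))
```

With these illustrative numbers, both cases have an expected impact of about +1, but the intervention comes out net positive roughly 84% of the time in the narrow case versus roughly 62% in the diverse case.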
Directionally agree, although not in the details. Come to postagi.org; in my view we are on track to have a slight majority of the people thinking about this (quality-weighted) gathering there. Also, a lot of the work is not happening under the AI safety brand, so if you look at just AI safety, you miss a lot.
I want to say “Debate or update!”, but I’m not necessarily personally offering / demanding to debate. I would want there to be some way to say that though. I don’t think this is a “respectable” position, for the meaning gestured at here: https://www.lesswrong.com/posts/7xCxz36Jx3KxqYrd9/plan-1-and-plan-2?commentId=Pfqxj66S98KByEnTp
(Unless you mean you think P(AGI within 50 years) < 30%, which would be respectable, but I don’t think you mean that.)
The reason to work on preventing AI takeover now, as opposed to waiting and working on it once AGI has already been invented, is the first-try problem: if you have unaligned takeover-capable AGI, takeover just happens and you don’t get to iterate. The same applies to the problem of an extremely good future only if you believe that the main surviving scenario is “an aligned-with-developer-intention singleton takes over the world very quickly, locking in pre-installed values”. People who believe in such a scenario usually have very high P(doom), so I assume you are not one of them.
What exactly prevents your strategy here from being “wait for aligned AGI, ask it how to make the future extremely good, and save some opportunity cost”?
People might not instruct the AI to make the future extremely good, where “good” means actually good.
This reason only makes sense if you expect the first person to develop AGI to create a singleton that takes over the world and locks in pre-installed values, which, again, I find not very compatible with a low P(doom). What prevents the scenario “AGI developers look around for a year after the creation of AGI and decide that they can do better”, if not misaligned takeover and not suboptimal value lock-in?
I think a significant amount of the probability mass within P(no AI takeover) is in various AI fizzle worlds. In those worlds, anyone outside AI safety who is working on making the world better is working to increase the flourishing associated with those worlds.
Is your assumption true, though? To what degree are people focused on takeover, in your view?
Most formal, technical AI safety work seems to be about gradual improvements and is being done by people who assume no takeover is likely.
I think part of the difficulty is that it’s not easy to imagine or predict what happens in a “future going really well without AI takeover”. Assuming AI will still exist and make progress, humans would probably have to change drastically (in lifestyle if not body/mind) to stay relevant. It’d be hard to predict what that would be like, and whether specific changes are a good idea, unless you don’t think things going really well requires human relevance.
Edit: in contrast, as others said, avoiding AI takeover is a clearer goal with clearer paths and endpoints. The “future” going well covers a potentially indefinitely long time, which is hard to quantify, hard to coordinate over, and hard even to reach consensus about what is desirable.