I see at least two problems with your argument:
There’s an assumption that you need a single agent to lead to existential risk. This is not the case: many of the scenarios explored require only competent and autonomous service-like AIs, or foundation models. For instance, CAIS is a model of intelligence explosion, and it has existential-risk-type failure modes too.
There’s an assumption that just because non-AGI models are useful, labs will stop pursuing AGI. Yet this is visibly false: the meme of AGI is running around, and multiple labs are explicitly pushing for AGI and getting the financial leeway to do it.
More generally, this post has the typical problem of “here is a scenario that looks plausible and would be nice, so there’s no need to worry”. Sure, maybe this is the actual scenario that will come to pass, and maybe it’s possible to argue for it convincingly. But you should require one damn strong argument before pushing people to not even work on the many possible worlds where things go horribly wrong.
Agree with (1) and (~3).
I do think, re: (2), that whether such labs are actually amassing “the financial leeway to [build AGI before simpler models can be made profitable]” is partly a function of your beliefs about timelines. If it only takes $100M to build AGI, I agree that labs will do it just for the trophy; but if it takes $1B, I think it is meaningfully less likely (though not out of the question) that that much money would be allocated to a single research project, conditioned on $100M models having so far failed to be commercializable.
I mostly agree with points 1 and 2. Many interacting AIs are important to consider, and I think individual AIs will have an incentive to multiply. And I agree that people will continue to push the capability frontier beyond what is directly useful, potentially giving AIs dangerous abilities.
I think we differ on the timeline of those changes, not whether those changes are possible or important. This is the question I’m trying to highlight.
Longer timelines for dangerous AI don’t mean that we shouldn’t prepare for things going wrong; these problems will still need to be solved eventually. Ideally, some researchers would act as if timelines are really short and “roll the dice” on alignment research that could pan out in a few years.
But I’m arguing that the bulk of the field should feel safe investing in infrastructure, movement building, and projects over a 10-20 year timeframe. People considering pursuing AI safety should assume they will have enough time to make an impact. Finding ways to extend this horizon further into the future is valuable because it gives the field more time to grow.