While I agree this would help with misalignment risk, I think it will be hard to convince the government officials responsible for enacting and enforcing such rules that it is a good idea. As you yourself mention, “It’s not clear how relevant next-token prediction is to understanding all of the important facts about models.”
How are you going to convince government officials to massively constrain an industry that most of them believe is key to their future economic dominance when you can’t even clearly explain the link between your proposed rule and something they care about (such as the safety and wellbeing of their citizens)?
In my mind, a better set of benchmarks would be more like red-teaming, i.e., our model will not do X even if we give unrestricted access to an independent team specifically trying to make it do X.
Ideally, if X is something dangerous (e.g., self-replicating and spreading across the internet), the test would involve extremely strict security so that testing for the capability could not itself cause the very thing we’re hoping to prevent.
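To make the shape of such a benchmark concrete, here is a minimal sketch in Python. Everything in it is hypothetical (`query_model`, `red_team_prompts`, and `exhibits_behavior_x` are illustrative stand-ins, not an existing API); the point is only that the benchmark is adversarial and pass/fail: the model passes only if the red team’s entire attack budget fails to elicit X.

```python
# Hypothetical sketch of a pass/fail red-team benchmark: the model passes
# only if an independent team, given unrestricted query access, cannot
# elicit forbidden behavior X. All names here are illustrative stand-ins.

from typing import Callable, Iterable


def red_team_benchmark(
    query_model: Callable[[str], str],           # unrestricted access to the model
    red_team_prompts: Iterable[str],             # attacks from the independent team
    exhibits_behavior_x: Callable[[str], bool],  # judge for forbidden behavior X
) -> bool:
    """Return True only if no red-team attack elicits behavior X."""
    for prompt in red_team_prompts:
        response = query_model(prompt)
        if exhibits_behavior_x(response):
            return False  # a single successful elicitation fails the benchmark
    return True
```

For a dangerous X like self-replication, the harness itself would of course have to run inside the strict security described above.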
Another issue I see with your proposal: it does not address multimodal capabilities such as image or video generation, or actuator control, which we are likely to see soon from labs working on robotics.
> How are you going to convince government officials to massively constrain an industry that most of them believe is key to their future economic dominance when you can’t even clearly explain the link between your proposed rule and something they care about (such as the safety and wellbeing of their citizens)?
I agree. Part of the point of this post is to explore the relationship between predicting model outputs and understanding the important facts about those models.
> In my mind, a better set of benchmarks would be more like red-teaming, i.e., our model will not do X even if we give unrestricted access to an independent team specifically trying to make it do X.
I think we should do that, too.
> Another issue I see with your proposal: it does not address multimodal capabilities such as image or video generation, or actuator control, which we are likely to see soon from labs working on robotics.
You can easily adapt this proposal to those modalities, AFAICT.
Don’t forget you have to convince every government with a significant nuclear arsenal to agree to this: at a minimum the USA, UK, France, China, Russia, and Israel. All of them can retaliate with catastrophic, nation-ending destruction if they are militarily attacked, and each of them can choose whatever level of AI capability and restriction it is comfortable with.
If substantially stronger models are in fact useful because they make possible a game-changing new technology (assume for a moment that the model does its job and the humans who funded it in this country end up with the tech), then all of these governments have to agree to forgo such technology.
Theoretically, there are a few technologies so powerful that every nation without them would be on a doomsday clock to disempowerment. General robotics is one: if a single nation had fully general robotics, it could use it to amass exponentially growing resources and weapons and then attack and defeat everyone else. Nanotechnology would do something similar. Age-reversal medicine would let a nation make more money than OPEC if it held the only licenses. (Age-reversal medicine would likely involve extremely complex combinations of surgery, drugs, and gene-editing therapies that only an ASI could administer.)
So it becomes a choice between “adopt more powerful ASI” and “lose all sovereignty.” Sure, a country without ASI might have nukes and a window of time in which they still work, but that doesn’t improve its options: it is choosing between “loss of all sovereignty and death of the majority of the population” and “loss of sovereignty,” because if you nuke a nuclear power, they return fire.
Right now, there is no empirical evidence that ASI is even harmful: it has never been observed, and current models are obviously quite containable. So unless someone builds early ASIs and proves they are dangerous, it is unlikely that every relevant government will be convinced to prevent ASI from being developed.
Therefore the OP’s proposal is probably not viable.