People sometimes talk about “alignment by default” — the idea that we might solve alignment without any special effort beyond what we’d ordinarily do. I think it’s useful to decompose this into three theses, sorted from strong to weak:
Alignment by Default Techniques. Ordinary techniques for training and deploying AIs — e.g. labelling data to the best of our ability, using whatever tools are available (including earlier LLMs) — are sufficient to produce aligned AI. No special techniques are required.
Alignment by Default Market. Maybe default techniques aren’t enough, but ordinary market incentives are. Companies competing to build useful, reliable, non-harmful products — following standard commercial pressures without any special coordination or regulation — end up solving alignment as a byproduct of building products people actually want to use. No government intervention is required.
Alignment by Default Government. Maybe market incentives alone aren't enough, but conventional policy interventions are. Governments applying familiar regulatory tools (liability law, safety standards, auditing requirements) in the ordinary way are sufficient to close the gap. No unprecedented governance or coordination is required.
My rough credences:
Default Techniques sufficient: ~15%
Default Market sufficient (given default techniques aren't): ~30%
Default Government sufficient (given market isn’t): ~20%
Need something more unusual: ~35%
These are rough and the categories blur into each other, but the decomposition seems useful for locating where exactly you think the hard problem lies.
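Spelled out, as a quick sanity check (on the reading that these four numbers are exclusive shares of the outcome space rather than chained conditionals, which is why they sum to 1):

$$
\begin{aligned}
0.15 + 0.30 + 0.20 + 0.35 &= 1.00,\\
P(\text{default techniques suffice}) &\approx 0.15,\\
P(\text{techniques or market suffice}) &\approx 0.15 + 0.30 = 0.45,\\
P(\text{some default level suffices}) &\approx 0.15 + 0.30 + 0.20 = 0.65.
\end{aligned}
$$

So on this reading I'm at roughly 65% that something in the "default" bucket works out, and ~35% that it doesn't.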
I’m not sure why you think that market incentives such as customer preference are ~3x more likely to find techniques that work than default incentives such as “we don’t want these things to kill us”.
I’m not seeing how you are drawing that from my numbers.
The lowest-level techniques in your list are being applied by researchers who still have an incentive to create AGI that won’t kill them or anyone else, even in the absence of market forces or government enforcement. You give this a 15% credence of being sufficient. Then your estimate for adding market incentives on top of that yields an additional 30% credence (for a total of 45%) of being sufficient.
I think it is more likely that default techniques are sufficient than that the default market or government is sufficient. Markets don’t incentivize non-harmful products; regulation does, and regulation can be slow. If you believe in a rapid intelligence explosion, there is a high chance that sufficient regulation won’t be in place in time. On the other hand, our morals are mostly evolved, so you can imagine that an AI that comes to understand the world much as we do would share our morals.