How uncompetitive do you think aligned IDA agents will be relative to unaligned agents?
For the sake of this estimate I’m using a definition of IDA that is probably narrower than what Paul has in mind: in the definition I use here, the Distill steps are carried out by nothing other than supervised learning + what it takes to make that supervised learning safe (but the implementation of the Distill steps may be improved during the Amplify steps).
This narrow definition might not include the most promising future directions of IDA (e.g. maybe the Distill steps should be carried out by some other process that involves humans). Without this simplifying assumption, one might define IDA as broadly as: “iteratively create stronger and stronger safe AI systems by using all the resources and tools that you currently have”. Carrying out that Broad IDA approach might include efforts like asking AI alignment researchers to get into a room with a whiteboard and come up with ideas for new approaches.
Therefore, this estimate uses my narrow definition of IDA. If you like, I can also answer the general question: “How uncompetitive do you think aligned agents will be relative to unaligned agents?”.
My estimate:
Suppose that if OpenAI decided to create an AGI agent as soon as they could, it would take them X years (assuming an annual budget of $10M, that the world around them stays the same, that OpenAI doesn’t do neuroscience, and that no unintentional disasters happen).
Now suppose that OpenAI decided to create an aligned IDA agent with AGI capabilities as soon as they could (same conditions). How much time would it take them? My estimate follows; each entry is in the format:
[years]: [my credence that it would take them at most that many years]
(consider writing down your own credences before looking at mine)
1.0X: 0.1%
1.1X: 3%
1.2X: 3%
1.5X: 4%
2X: 5%
5X: 10%
10X: 30%
100X: 60%