Right, but you also want to implement a red line on systems that would be precursors to this type of system, and this is why we have a red line on self-improvement.
So, in practice, what might that look like?
Of course, AI labs already use quite a bit of AI in their capabilities research: writing code, helping with hardware design, doing evaluations and RLAIF; even distillation and training itself could be thought of as a kind of self-improvement. So, would the red line need to target only fully autonomous self-improvement? Even then, just having a human in the loop to rubber-stamp AI decisions might not actually slow down an intelligence explosion by all that much, especially at very aggressive labs. So, would we need some kind of measure of how autonomous the capabilities research at a lab is, and then draw the line at "only somewhat autonomous"? And if we could define a robust threshold, could we really be confident that it would prevent ASI development altogether, rather than just slowing it down?
Suppose instead we had a benchmark that measured something like the capabilities of AI agents on long-horizon, real-world tasks, such as running small businesses or managing software development projects. Do you think it might make sense to draw a red line somewhere on that graph, targeting a dangerous level of capability directly, rather than trying to prevent that level of capability from being reached by restricting research methods?