A thought: the bulk of the existential risk we face from AI is likely to be from smarter-than-human systems. At a governance level, I hear people pushing for things like:
Implement safety checks
Avoid race dynamics
Shut it down
but not
Prohibit smarter-than-human systems
Why not? It seems like a) a particularly clear and bright line to draw[1], b) something that a large share of the public would likely support, and c) probably(?) easy to pass because most policymakers imagine this to be in the distant future. The biggest downside I immediately see is that it sounds sufficiently sci-fi-ish that it might be hard to get policymakers to take seriously. It certainly wouldn’t eliminate all the risk! But it seems to me like it would reduce it significantly, and we could still continue to push for tighter constraints afterward.
Clear in theory; there are certainly practical complications, eg on what percentage of which list of capabilities a system must exceed humans to cross the line. But it’s conceptually very clear.
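As a toy illustration of how such a line might be operationalized, here’s a minimal sketch; the benchmark names, human baselines, and the 50% threshold are all hypothetical placeholders, not a real proposal:

```python
# Toy sketch of one way a "smarter than human" line could be operationalized.
# Everything here is hypothetical: the benchmark names, the human baselines,
# and the 50% threshold are placeholders, not a real proposal.

HUMAN_BASELINES = {
    "reading_comprehension": 0.92,  # hypothetical expert-human score
    "math_word_problems": 0.88,
    "code_synthesis": 0.75,
    "long_horizon_planning": 0.60,
}

def crosses_line(model_scores: dict[str, float], fraction: float = 0.5) -> bool:
    """Return True if the model beats the human baseline on at least
    `fraction` of the listed capabilities."""
    beaten = sum(
        model_scores.get(name, 0.0) > baseline
        for name, baseline in HUMAN_BASELINES.items()
    )
    return beaten / len(HUMAN_BASELINES) >= fraction

# Example: this model exceeds humans on 2 of 4 capabilities, so with a
# 50% threshold it would cross the line.
print(crosses_line({
    "reading_comprehension": 0.95,
    "math_word_problems": 0.91,
    "code_synthesis": 0.70,
    "long_horizon_planning": 0.40,
}))  # True
```

The hard policy questions are exactly the inputs this sketch takes for granted: which capabilities go on the list, whose performance sets the baseline, and what fraction counts as crossing.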
Useful bit of info on that topic: per a YouGov poll of 1118 American voters in September 2023, 63% agree that ‘Yes, regulation should aim to actively prevent AI superintelligence’ (vs 16% disagree, 21% don’t know). Vox story, poll info, crosstabs.
The companies will have an incentive to make an AI slightly smarter than their competition. And if there is a law against it, they will try to hack it somehow… for example, they will try to make their AI do worse on the official government benchmarks but better at things their users care about. Or perhaps make an AI with IQ 200 and tell it to act like it has IQ 100 when it suspects it is doing a government test.
Or perhaps make an AI with IQ 200 and tell it to act like it has IQ 100 when it suspects it is doing a government test.
Being investigated these days as ‘sandbagging’; there’s a good new paper on that from some of my MATS colleagues.
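For intuition, here’s a minimal sketch of one kind of sandbagging check (not the method from that paper): run the same questions under an ‘official evaluation’ framing and an ordinary-user framing, and test whether accuracy drops under the former. The counts below are made up.

```python
# Toy sketch of one kind of sandbagging check: run the same questions twice,
# once framed as an official evaluation and once framed as an ordinary user
# query, then test whether accuracy drops under the evaluation framing.
from math import sqrt

def sandbagging_z(eval_correct: int, user_correct: int, n: int) -> float:
    """Two-proportion z-statistic for accuracy(user) - accuracy(eval).
    A large positive z suggests the model performs worse when it
    believes it is being officially tested."""
    p_eval, p_user = eval_correct / n, user_correct / n
    p_pool = (eval_correct + user_correct) / (2 * n)
    se = sqrt(2 * p_pool * (1 - p_pool) / n)
    return (p_user - p_eval) / se if se > 0 else 0.0

# Made-up counts: 640/1000 correct under the evaluation framing vs 780/1000
# under the user framing gives z ≈ 6.9, far too large to be chance.
print(round(sandbagging_z(640, 780, 1000), 1))  # 6.9
```

Of course, a model smart enough to sandbag may also be smart enough to notice when the framing itself is a test, which is part of why this is an active research area rather than a solved problem.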
The companies will have an incentive to make an AI slightly smarter than their competition. And if there is a law against it, they will try to hack it somehow
Agree, but that’s true of regulation in general. Do you think it’s unusually true of regulation along these lines, vs eg existing eval approaches like METR’s?
I think this is a correct policy goal to coordinate around, and I see momentum around it building.

I think the proposals to limit training runs past a certain compute threshold are attempting to do exactly this. It might be better to base the criterion on cognitive performance rather than computation, but that is harder to define and therefore to enforce. It does seem intuitively like this would be a better restriction, though. Debating cognitive benchmarks is vague, but if they’re far exceeded it might become obvious.
I’ve thought vaguely about attempting to restrict the amount of reflection/self-awareness, the ability to solve novel problems (see Jacques’ short take on the Chollet interview, which I think is quite correct as far as it goes; LLMs can’t solve truly novel problems without new capabilities/scaffolding, which I think will be pretty easy to add but not trivial), or similar criteria. You’d have to define “smarter than human” carefully, since many AI systems are already smarter than humans at specific tasks.
All of these would probably be ignored in private, but a restriction like this would at least prevent hasty public release of overthrow-capable agents.
It might be better to base the criterion on cognitive performance rather than computation, but that is harder to define and therefore to enforce
Agreed that there’s a lot more detail that would have to be nailed down to do it this way. I think one big advantage to defining it by cognitive performance is to make it clearer to the general public. “Was trained using more than 10^26 FLOP” doesn’t mean anything at all to most people (and doesn’t relate to capabilities for anyone who hasn’t investigated that exact relationship). “Is smarter than human” is very intuitively clear to most people (I think?) and so it may be easier to coordinate around.
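For concreteness, a compute line involves arithmetic like the following, which illustrates why it’s opaque to non-specialists; this uses the standard ~6·N·D approximation for dense-transformer training FLOP, and the model sizes are made up:

```python
# Back-of-envelope check against a 10^26-FLOP style threshold, using the
# standard approximation that dense-transformer training costs ~6 * N * D
# floating point operations (N = parameters, D = training tokens). The
# example model sizes are hypothetical.

THRESHOLD_FLOP = 1e26

def training_flop(n_params: float, n_tokens: float) -> float:
    return 6 * n_params * n_tokens

for n_params, n_tokens in [(7e10, 1.4e12), (1e12, 2e13)]:
    c = training_flop(n_params, n_tokens)
    print(f"{n_params:.0e} params x {n_tokens:.0e} tokens -> {c:.1e} FLOP, "
          f"{'over' if c > THRESHOLD_FLOP else 'under'} threshold")
# 7e+10 params x 1.4e+12 tokens -> 5.9e+23 FLOP, under threshold
# 1e+12 params x 2e+13 tokens -> 1.2e+26 FLOP, over threshold
```

None of this arithmetic tells a layperson anything about what the resulting system can do, which is the point: “is smarter than human” communicates the stakes directly.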
Excellent point. It’s a far better movement slogan. So even if you wanted to turn it into a compute limit, that should be how the goal is framed.
I also wonder about replacing “intelligence” with “competence”. Lots of people now say “intelligent at what? They’ve beaten us at chess forever and that’s fine”. You can do the same thing with competence, but the instinct hasn’t developed. And the simple answer is “competent at taking over the world”.

My initial intuition is that “more competent than humans” won’t resonate as much as “smarter than humans” but that’s just a guess.
Clarification: I don’t strongly believe that this is the right line to try to draw; it just seems like one useful candidate, which makes me surprised that I haven’t heard it discussed, and curious whether that’s due to some fundamental flaw.