Applying the idea is tricky and context-dependent. For example, gathering evidence for scheming seems unambiguously good, but actually solving scheming could be bad (unless you’re sure that such evidence can’t be gathered, or that companies won’t gate on this problem regardless), because at some point in the future scheming may well become legible enough to gate deployment. (Also keep in mind that it’s not just legibility/gating by the companies that matters, but also by other decision-makers such as voters and politicians.)
Given the tradeoffs apparent to me (including that the benefits of solving scheming are limited by other safety problems), I think it may well be an example of a safety problem that is net negative to work on, and something I wouldn’t want to do myself. But I’m unsure how to argue for this convincingly (and I’m also just not certain enough to want to talk other people out of working on this specifically), which is why I’m only bringing it up in response to your comment.
Gotcha.
FWIW, on my views, work to prevent scheming looks pretty clearly great. Pausing to wait for a solution to scheming doesn’t seem super likely, and going from [scheming models widely deployed] -> [non-scheming models widely deployed] seems significantly more valuable than going from [non-scheming models widely deployed] -> [temporary pause to solve scheming].
A lot of the listed topics here are problems that we could have plenty of time to work on after the singularity. I’m sympathetic to arguments that bad things might get locked in, but I don’t really think the arguments for this have a disjunctive nature where we’re very likely to run into at least one type of bad lock-in. There’s just a decent chance that we do an ok job of developing AIs and handing things over to a society that’s more capable than us at dealing with these issues (not a super high bar), in which case a pause wouldn’t add much. (The arguments that make me feel most pessimistic about the future are arguments that humans might just not be motivated to do good things, but it’s not clear why pauses would help much with that issue.)
The aim of a pause would be to plan out the transition better, or to make humans smarter/wiser so they can navigate the transition better, so that we end up handing over the remaining problems to a counterfactually more capable society. In other words, the bar shouldn’t be “more capable than us” but “as capable as the society we could realistically achieve with a pause.”
One issue related to this is that humans today largely want to do good things as a side effect of the virtue signaling / status games they’re playing. This is currently far from optimal, which makes me scared to undergo an AI transition that could potentially lock in such highly suboptimal motivations/values, and also scared that the AI transition could just scramble or reset these status games and remove what good motivations/values we do have. A pause would preserve the status quo and give people more time to think about such issues (including time for awareness of them to spread), and potentially to find ways to make the AI transition go better in these regards (compared to today, when there has been almost no thought on these issues at all).
But see also this recent quick take where I expressed that my optimism about a pause is pretty limited.
If the society is “more capable than us” in some average sense, where we still have certain advantages over them, then I agree that we could still contribute things.
If the society is “more capable (and good) than us” in all the important ways, then they’d also be better at making themselves smarter/wiser than we would have been, and better at handling the transition, so further pauses really wouldn’t have contributed much.
Idk, I don’t particularly want to argue about definitions here. I just think there’s a decent chance that I’ll look back after the singularity and be like “yep, the sloppy transition sure meant that we took on a bunch of ex-ante risk, but since we got lucky, extra pause time wouldn’t have helped vis-a-vis the long-run lock-in issues. Anything they could have done to help is stuff we can do better now.” (And/or: marginal pause time may have been good or bad via various values or power changes, but it wouldn’t have systematically led to improvements from everyone’s perspective by e.g. enabling additional intellectual work, because it turns out it was fine to defer the relevant intellectual work until later.)
Even for this society, if it’s in the future, part of the transition would have already occurred, so it won’t have the opportunity to make that part go better. So by not pausing now, we’d permanently give up this opportunity.
Take the issue in this recent comment, of building an initial AGI that reasons well or poorly about domains that lack fast/cheap feedback signals. It seems very plausible that our long-term civilizational trajectory is significantly affected by which type of AGI gets built first. Suppose we end up building one that reasons poorly about such domains. Then:
- The post-AGI civilization may end up being less capable (and good) than us on average, or in some important ways.
- Even if they’re actually more capable (and good) than us in all the important ways, they could have been even better if only we had built an AGI that reasons well in such domains, but they can’t go back in time and change this.
I of course agree, but I’d think this would mostly be an issue of the capabilities or goodness of our future society, since there’s not much external to our society that’s getting worse as a result of the transition. Anyway, that seems like maybe one of those definitional issues. I think you’re probably right that there are some possible changes that aren’t well characterized as being about the capabilities or goodness of our society, so an improvement in those dimensions isn’t strictly speaking sufficient for a pause to not have been valuable.
I care more about my claim that started with “I just think there’s a decent chance...”. (Which is importantly only asserting a decent chance, not saying that there aren’t plausible ways it could be false.)