This is clarifying, thanks. A few thoughts:

“Serial speed is key”:

This makes sense, but seems to rely on the human spending most of their time tackling well-defined but non-trivial problems where an AI doesn’t need to be re-directed frequently [EDIT: the preceding was poorly worded—I meant that if this were true prior to the availability of AI assistants, it’d allow a lot of speedup as the AIs take over this work; otherwise it’s less clearly so helpful].

Perhaps this is true for ARC—that’s encouraging (though it does again make me wonder why they don’t employ more mathematicians—surely not all the problems are serial on a single critical path?).

I’d guess it’s less often true for MIRI and John.
Of course once there’s a large speedup of certain methods, the most efficient methodology would look different. I agree that 5x to 10x doesn’t seem implausible.
“...in the future we’ll have access to the exact AIs we’re worried about.”:
We’ll have access to the ones we’re worried about deploying.
We won’t have access to the ones we’re worried about training until we’re training them.
I do buy that this makes safety work for that level of AI more straightforward—assuming we’re not already dead. I expect most of the value is in what it tells us about a more general solution, if anything—similarly for model organisms. I suppose it does seem plausible that this is the first level we see a qualitatively different kind of general reasoning/reflection that leads us in new theoretical directions. (though I note that this makes [this is useful to study] correlate strongly with [this is dangerous to train])
“Researching how to make trustworthy human level AIs seems much more tractable than researching how to align wildly superhuman systems”:
This isn’t clear to me. I’d guess that the same fundamental understanding is required for both. “Trustworthy” seems superficially easier than “aligned”, but that’s not obvious in a general context. I’d expect that implementing the trustworthy human-level version would be a lower bar—but that the same understanding would show us what conditions would need to obtain in either case. (Certainly I’m all for people looking for an easier path to the human-level version, if this can be done safely—I’d just be somewhat surprised if we find one.)
“So coordination to do better than this would be great”.
I’d be curious to know what you’d want to aim for here—both in a mostly ideal world, and what seems most expedient.
As far as the ideal, I happened to write something about it in another comment yesterday. Excerpt:

Best: we first prevent hardware progress and stop H100 manufacturing for a bit, then we prevent AI algorithmic progress, and then we stop scaling (ideally in that order). Then, we heavily invest in long-run safety research agendas and hold the pause for a long time (20 years sounds good to start). This requires heavy international coordination.
As far as expedient, something like:
Demand that labs have good RSPs (or something similar), using both inside and outside game; try to get labs to fill in the tricky future details of these RSPs as early as possible without depending on “magic” (speculative future science which hasn’t yet been verified). Have people motivated by AI takeover risk work on the underlying tech and implementation.
Work on policy and aim for powerful US policy interventions in parallel. Other countries could also be relevant.
Both of these are unlikely to perfectly succeed, but they seem like good directions to push on.
I think pushing for AI lab scaling pauses is probably net negative right now, but I don’t feel very strongly either way (it mostly just feels not that leveraged overall). I think slowing down hardware progress seems clearly good if we could do it at low cost, but seems super intractable.
Thanks, this seems very reasonable. I’d missed your other comment. (Oh and I edited my previous comment for clarity: I guess you were disagreeing with my clumsily misleading wording, rather than what I meant(??))
Corresponding comment text:
This makes sense, but seems to rely on the human spending most of their time tackling well-defined but non-trivial problems where an AI doesn’t need to be re-directed frequently [EDIT: the preceding was poorly worded—I meant that if prior to the availability of AI assistants this were true, it’d allow a lot of speedup as the AIs take over this work; otherwise it’s less clearly so helpful].
I think I disagree with what you meant, but not that strongly. It’s not that important, so I don’t really want to get into it. Basically, I don’t think that “well-defined” is that important (it’s not obviously required for some ability to judge the finished work), and I don’t think “re-direction frequency” is the right way to think about it.