Ty. Is this a summary of a more-concrete reason you have for hope? (Have you got alternative more-concrete summaries you’d prefer?)
“Maybe huge amounts of human-directed weak intelligent labor will be used to unlock a new AI paradigm that produces more comprehensible AIs that humans can actually understand, which would be a different and more-hopeful situation.”
(Separately: I acknowledge that if there’s one story for how the playing field might change for the better, then there might be a bunch more stories too, which would make “things are gonna change” an argument that the future will have a much better chance than we’d have if ChatGPT-6 were all it took.)
I would say my summary for hope is more like:
It seems pretty likely to be doable (with lots of human-directed weak AI labor and/or controlled stronger AI labor) to use iterative and prosaic methods within roughly the current paradigm to sufficiently align AIs which are slightly superhuman. In particular, AIs which are capable enough to be better than humans at safety work (while being much faster and having other AI advantages), but not much more capable than this. This also requires doing a good job eliciting capabilities and making the epistemics of these AIs reasonably good.
Doable doesn’t mean easy, or that it will happen by default.
If we succeeded in aligning these AIs and handing off to them, they would be in a decent position to continue the ongoing work of solving alignment (e.g. aligning a somewhat smarter successor which itself aligns its successor, and so on, or scalably solving alignment), and also in a decent position to buy more time for solving alignment.
I don’t think this is all of my hope, but if I felt much less optimistic about these pieces, that would substantially change my perspective.