Two responses:
For “something that is very difficult to achieve (i.e. all of humanity is currently unable to achieve it)”, I didn’t have in mind things like “cure a disease”. Humanity might currently lack a cure for a particular disease, but we’ve found many cures before. This seems like the kind of problem that might be solved even without AGI (e.g. AlphaFold already seems helpful, though I don’t know much about the exact process). Instead, think along the lines of “build working nanotech, and do it within 6 months” or “wake up these cryonics patients”, etc. These are things humanity might do at some point, but they are clearly outside the scope of what we can currently do within a short timeframe. If you tell a human “build nanotech within 6 months”, they don’t solve it in some unexpected way; they just fail. Admittedly, our post is pretty unclear about where to draw the boundary, in part because it seems hard to tell exactly where it lies. I would guess it’s below nanotech or cryonics (and lots of other examples), though.
It shouldn’t be surprising that humans mostly do things that aren’t completely unexpected from the perspective of other humans: we all roughly share a cognitive architecture and values. Plans of the form “take over the world so I can revive this cryonics patient” just sound crazy to us; after all, what’s the point of reviving them if doing so kills most other humans? If we could instill exactly the right sense of which plans are crazy into an AI, that would be major progress in alignment! Until then, I don’t think we can draw conclusions from humans to AI that easily.