I think this is unreasonably hopeful. I think it’s likely that AI companies will develop a superhuman researcher mostly by RLing it on doing research, which I would expect to shape an AI whose main drive is toward doing research. To the extent that it has longer-horizon drives beyond individual research tasks, I expect those to be built around, and secondary to, a non-negotiable drive to do research now.
(At risk of over-anthropomorphizing AIs, analogize them to e/accs who basically just wanted to make money building cool AI stuff, and invented an insane philosophical edifice entirely subservient to that drive.)
I’m not being hopeful; I think this hypothetical involves a less merciful takeover than otherwise, because the AIs that take over are not superintelligent, and so are unable to be as careful about the outcomes as they might want to be. In any case there’s probably at least permanent disempowerment, but a non-superintelligent takeover makes literal extinction (or a global catastrophe short of extinction) more likely (for AIs with the same values).