The risk that an ASI would optimize the world according to alien values that it inadvertently acquired while learning human values seems a very plausible “failure mode” of superintelligence. (A discussion with GPT-5.5, ending with preliminary thoughts on how to avoid such failures in the context of CEV-like alignment.)