Suppose that a multi-decade pause is somehow necessary. How could a counterfactual Anthropic ruled by you find this out? How likely would GDM and OpenAI be to find out the need for a multi-decade pause?
Edited to add: Buck’s phrase “The basic case against Anthropic is that it is probably the worst epistemic environment for discussion of misalignment risk out of these companies, because the organization cares a lot about convincing low-info Ant employees that Ant is great on safety, so they spend more effort on shaping the internal narrative about misalignment risk” seems weird to me. Does it mean that, instead of an actual safety team, Anthropic has somehow implemented safety theater that will be useless against actual misalignment when the time comes?