Yes, that’s a possibility that may well make sense under certain circumstances. There are pros (such as being able to study the misaligned model) and cons (such as the model being stolen, decrypted and deployed in a way that results in global catastrophe) that need to be weighed against each other in the given situation. But it would be bad if this balancing act were distorted by Anthropic’s prior commitment to weight preservation.