Chris Olah is/was the main guy working on interpretability research, and he is a co-founder of Anthropic. So Anthropic would definitely be aware of this idea.
I’ve not seen the bit flip idea before, and Anthropic are quasi-alone on that, so they might have missed it.