I changed the first sentence you quoted a bit; hopefully it is clearer now. I don’t actually know whether they are concerned with hyperstitioning an evil persona or just with not reliably hyperstitioning a good persona. I guess the whole hyperstitioning business is what I was trying to point at, along with the fact that Anthropic thinks this is quite important.
You can read this post up to “What does this all mean?” as me just factually enumerating the mentions of hyperstition by Anthropic that I could find. I was surprised by this myself: I remembered seeing it in Dario’s essay but hadn’t realised how often it came up. I also didn’t know that the PSM post mentioned it. The constitution is related too, but there they actually use more thorough training methods to really hammer this persona into the model. I’m not sure that could still be called hyperstition if they are actively trying to make it behave that way.
My impression is that LLMs don’t disprove classical alignment theory but add a bunch of confusing elements on top of it. It just seems that none of the proposed plans are remotely workable and that we won’t be able to recover at powerful enough levels of AI. So I am not so sympathetic to the people accelerating anyway.