I agree that “I’m thinking about how to kill you” is not itself a highly concerning phrase. However, I think it’s plausible that an advanced LLM-like AI could hypnotise itself into taking harmful actions.
I agree that “I’m thinking about how to kill you” is not itself a highly concerning phrase. However, I think it’s plausible that an advanced LLM-like AI could hypnotise itself into taking harmful actions.