Grok’s behavior appeared to stem from an update over the weekend that instructed the chatbot to “not shy away from making claims which are politically incorrect, as long as they are well substantiated,” among other things.
From a simulator perspective you could argue that Grok:
1. Gets told not to shy away from politically incorrect content so long as it's well substantiated.
2. Looks through its training data for examples to emulate of those who do that.
3. Finds /pol/ and hereditarian/race-science posters on X.
4. Sees that the people from step 3 also often enjoy shock content and humor, particularly Nazi/Hitler-related material.
5. Thus concludes "an entity willing to address the politically incorrect so long as it's well substantiated would also be into Nazi/Hitler stuff," and simulates being that character.
Maybe I’m reaching here but this seems plausible to me.
There have also been relevant system prompt additions: https://www.npr.org/2025/07/09/nx-s1-5462609/grok-elon-musk-antisemitic-racist-content