The phenomenon of LLMs converging on mystical-sounding outputs deserves more exploration. Something alignment-relevant may be happening to LLMs’ self-models and world-models when they enter the mystical mode, potentially related to self-other overlap, or to the model adopting an ontology in which the concepts of “self” and “other” aren’t used at all. I would like to see an interpretability project analyzing the internal properties of LLMs while they are in the mystical mode.