I remember a similar model of post-AGI ways to lock in a belief, studied by Tianyi Qiu and presented on arXiv or YouTube. In that model, lock-in of a false belief requires a multi-agent system whose trust matrix has an eigenvalue greater than 1.
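A minimal numerical sketch of that eigenvalue condition (my own paraphrase of the model, not Qiu's actual code; the matrices and update rule below are illustrative assumptions): agents' deviations from the truth update linearly through a trust matrix, and a false belief dies out or gets amplified depending on whether the spectral radius exceeds 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # number of agents (made up for illustration)

def simulate(trust, steps=50):
    """Iterate b <- trust @ b and return the final deviation from the truth."""
    b = rng.normal(size=n)  # initial deviations from the ground truth
    for _ in range(steps):
        b = trust @ b
    return np.max(np.abs(b))

# Damped trust: rows sum to ~0.91, so the spectral radius is below 1
# and deviations decay back toward the truth.
damped = rng.random((n, n))
damped /= damped.sum(axis=1, keepdims=True) * 1.1

# Amplifying trust: the same matrix scaled up, rows sum to ~1.18,
# so the spectral radius exceeds 1 and a false belief gets amplified
# (in the real model, saturating into a locked-in false belief).
amplified = damped * 1.3

for name, trust in [("damped", damped), ("amplified", amplified)]:
    rho = max(abs(np.linalg.eigvals(trust)))
    print(f"{name}: spectral radius {rho:.2f}, final deviation {simulate(trust):.3g}")
```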
However, the example studied in the article is the interaction of humans and LLMs where there is one LLM and armies of humans who don't interact with each other but do influence the LLM.
I also have a model sketch, but I haven’t had the time to develop it.
Alternate Ising-like model
I would guess that the real-life situation is closer to an Ising-like model where atoms can randomly change their spins, but whenever an atom $i$ chooses a spin, it is $\exp\left(h\alpha_i + h_{\text{ind}} + \sum_j \sigma_j c_{ji}\right)$ times more likely to choose the spin $1$ than $-1$. Here $h$ is the strength of the ground truth (with $\alpha_i$ the atom's individual coupling to it), $h_{\text{ind}}$ reflects individual priors and shifts, and $\sum_j \sigma_j c_{ji}$ reflects the influence of others.
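A minimal simulation sketch of this dynamic, assuming single-spin updates; every parameter value below ($h$, $\alpha_i$, $h_{\text{ind}}$, the coupling matrix $c$) is made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100                                    # number of atoms (agents)
h = 0.2                                    # strength of the ground truth
alpha = rng.uniform(0.5, 1.5, size=n)      # per-atom coupling to the truth
h_ind = rng.normal(0.0, 0.5, size=n)       # individual priors and shifts
c = rng.uniform(0.0, 0.05, size=(n, n))    # c[j, i]: influence of atom j on atom i
np.fill_diagonal(c, 0.0)

sigma = -np.ones(n)                        # start locked into the false belief

for step in range(20001):
    i = rng.integers(n)
    # Total field on atom i: ground truth + individual term + social influence.
    field = h * alpha[i] + h_ind[i] + sigma @ c[:, i]
    # Spin +1 is exp(field) times more likely than spin -1,
    # i.e. P(+1) = exp(field) / (exp(field) + 1).
    sigma[i] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-field)) else -1.0
    if step % 5000 == 0:
        print(f"step {step:6d}: mean spin {sigma.mean():+.2f}")
```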
What might help is lowering the activation energy of transitions from locked-in falsehoods to truths. In a setting where everyone communicates with everyone else, a common belief forms nearly instantly, but the activation energy for escaping it is high. In a setting where the graph is amenable (e.g. the lattice, as in the actual Ising model), reaching a common belief takes too long for practical use.
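A rough harness for exploring that trade-off (entirely my own construction; the topologies, couplings, and escape criterion are illustrative assumptions, with the total coupling per atom matched between the two graphs): start everyone at the locked-in falsehood under a weak true field and count how many single-spin updates it takes for the majority to flip. On the complete graph, escape requires a global fluctuation against a strong mean field; on the ring, a locally nucleated droplet of truth can spread.

```python
import numpy as np

rng = np.random.default_rng(2)

def escape_steps(neighbors, J, h=0.3, max_steps=500_000):
    """Single-spin updates until the majority escapes the false belief, else None."""
    n = len(neighbors)
    sigma = -np.ones(n)                    # everyone starts at the falsehood
    for step in range(max_steps):
        i = rng.integers(n)
        field = h + J * sigma[neighbors[i]].sum()
        # Same convention as above: spin +1 is exp(field) times likelier than -1.
        sigma[i] = 1.0 if rng.random() < 1.0 / (1.0 + np.exp(-field)) else -1.0
        if sigma.mean() > 0:
            return step
    return None

n = 100
# Complete graph: everyone influences everyone; total coupling per atom = 6.
complete = [np.delete(np.arange(n), i) for i in range(n)]
# Ring lattice: each atom sees only its two neighbours; total coupling also 6.
ring = [np.array([(i - 1) % n, (i + 1) % n]) for i in range(n)]

print("complete graph:", escape_steps(complete, J=6.0 / (n - 1)))
print("ring lattice:  ", escape_steps(ring, J=3.0))
```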
I would also guess that it is hard to influence the leaders, which makes real-life lock-in close to your scheme. See, for example, my jabs at Wei Dai’s quest to postpone alignment R&D until we thoroughly understand some confusing aspects of high-level philosophy.