If the value of a human teacher is partly that their words are tied to who they are, that a meditation teacher cannot suddenly optimize solely for self-interest without contradicting everything he’s built, is the absence of that constraint in LLMs a feature or a failure?
The good part of that is that not all constraints are necessary for the thing you care about; many are incidental. Perhaps you want to talk to someone you respect, but there is no language you are both fluent in. The “doesn’t speak your language” constraint is almost always incidental to what you actually care about.
A specific example that often comes to my mind: I like Montessori education, but it seems limited by the fact that Maria Montessori lived from 1870 to 1952, so you can’t ask her what she would think about e.g. computers. Her followers seem to copy her teachings rather than the thing that generated the teachings, so if you ask them about computer science, they will mostly try to avoid the question, or to avoid computers in general. And yet I believe that if Maria Montessori lived in the 21st century, she would have an opinion on the role of computers in education (and it would very likely be different from “avoid computers as much as you can”). Here, the incidental trait is “lived in a different century”.
And the bad part is that it is less predictable; both the incidental and the load-bearing constraints are removed without warning. A good teacher becomes a crackpot in the middle of a sentence. Constant betrayal.
What does it mean to have increasingly good conversations with something that will never be transformed by them?
That you are transformed, of course. Perhaps in the sense that you become more efficient at using this specific tool for conversations. (Just like when you keep using a hammer, you become more efficient at hammering nails.)
Finally, if the local coherence of a person is the best assurance I have that someone who doesn’t know me personally would be kind to me — that their words and actions are prescribed and proven by a life they have led — could something like that be applied to ensure LLMs are not going to hurt us?
With people, the “proof by lifetime behavior” is not only about the time, but also that during the time they have probably encountered various different situations. For example, if someone spends 50 years without ever stealing a thing, it’s not just that they can abstain from stealing for a long time, but that during those 50 years they have probably experienced many kinds of situations, and they could abstain from stealing in all of them. That’s what makes their abstinence from stealing credibly robust.
A person who has spent their entire life in one set of circumstances, and then suddenly their circumstances change dramatically, is not trustworthy. In popular wisdom: power corrupts (i.e. people who get into positions of power often start to behave differently than they used to behave in situations without power). Some people play it meta and try to preserve their trustworthiness by avoiding unusual situations (avoiding the temptation).
The analogous evidence that LLMs won’t hurt us would be if they spent a lot of time, in many different situations, without hurting us. (Including the situations where they had the power to hurt us with impunity.)