Thank you for the example, this definitely counts in my mind.
axelcore
Has anyone had the experience of trying to explain their idea to an LLM, only for it to fail to grasp the basic concept?
Asking because I don’t feel like this has happened to me (from my limited usage). When it can’t connect the dots, it’s because I haven’t provided enough dots. (Edit: counterexamples are much appreciated if any come to mind.)
Is it officially “LessWrong” now? Or is it still “Less Wrong”? Does it matter?
I feel like “LessWrong” is more streamlined and futuristic. It’s solid at its center of gravity, like a noun, whereas “Less Wrong” feels inelegant as an object in a sentence (try saying “I read posts on Less Wrong” out loud with equal emphasis on the last two words). But Less Wrong seems to be the name the founders intended. Is it left that way in the Sequences just for historical purposes?
I get the impression of a gradual shift, endorsed but natural, towards “LessWrong”.[1] I think this is the kind of incremental rebranding that non-stagnant organizations undergo naturally.[2] Some people react badly to rebrands (if it ain’t broke, don’t fix it), but they’re a sign of life.
[1] E.g. the titles of the welcome posts.
[2] Organization, as in, the abstract, intangible hub around which the members orbit, which presents itself to the world through a brand, a self-described purpose, an archetype of the person who is a member, etc. It can be a company, a school, a religious group, a collaborative world-building project...
(One disanalogy I do see: humans sleep, and probably would for psychological reasons even if we didn’t need to physically; today’s LLMs don’t. I expect there’s more; maybe you can help me out in the comments?)
I think it’s still possible to make an analogy here. Maybe backpropagation/training is like sleep, whereas waking memories are just gradient-free weight updates.
Thank you. In hindsight this was searchable and an unnecessary post, so I apologize for the obvious question.
Are there any major papers/posts/etc about how training data containing discussion of AI behavior affects the resulting model behavior? Anything like Anthropic’s alignment faking paper, but more broad.
axelcore’s Shortform
AI doesn’t have an individual existence like a human-like organism, and we shouldn’t change that unless we want to face enormous ethical questions. We might already be moving in that direction, however.
1. Organisms have a clearly bounded, independent physical existence for most of their lives. LLMs don’t have a clearly defined physical existence that maps well to the mental persistence they do have. If we treat chat sessions as the units of continuous individual mental activity, then many sessions run on the same hardware, and each can be stopped, restarted on different hardware, cloned, etc. Even with robotics, the inference is rarely on-device.
2. An organism’s cognition is a self-modifying function; memories, habits, etc. are encoded persistently in the brain’s wiring. LLMs mainly use the context window to emulate this, but this is finite, unlike rewriting your own weights. I think the phenomenon of every adaptation layering over time contributes significantly to the notion we have of an individual organism.
3. Organisms are “trained” on first-personal data. I learned how to speak English based on my own “sensor data,” and my knowledge of Tolstoy’s A Confession comes from when I picked up a physical object, turned the yellowed pages with my hands, and then discussed it in a classroom on a late fall afternoon. It’s not like the tokens were beamed directly into my mind. This constant background context produces the notion of the self organically.
4. Organisms are continuously acting and continuously taking in stimulus. There are no discrete conversational turns between an organism and its surroundings.
5. Organisms reproduce independently of other species, using the physical bodies from point 1.
But all of these are blurry, and many are already eroding:
2. I suspect that much of human brain-update processing is amortized using sleep. If so, an analogy can be made to model training as a sort of long-term sleep, especially if data from model deployment is used.
3. Training LLMs on their own conversations could create some semblance of first-personal data. Also, organism instincts could be considered to be “trained” on species-level experience, rather than the data of a single individual.
5. Agents can spin up other agents, and if models assist in AI research or deployment, some degree of “reproduction” is achieved.
In spite of this, I think the main takeaway is that we still don’t have to deal with the ethics of creating and destroying human-like beings, whereas satisfying all of these properties would make the question of why AI instances are not deserving of rights or empathy unavoidable.
(When I say “organism” in the first part, I mainly refer to complex mammals. Plants and fungi violate several of these assumptions. But a plant or fungus with intelligence is a very distinct thing from a human, and I think you can reasonably argue that it deserves different ethical status.)
Scott Alexander uses “Best of Less Wrong” multiple times in a link thread from late April (one time to refer to a post where “LessWrong” is used right at the beginning). Old habits? (To be fair, the Best of Less Wrong page looks kinda like it says “Less Wrong” even though there isn’t a space there.)