Takes from two months as an aspiring LLM naturalist

AnnaSalamon28 Apr 2026 16:14 UTC

121 points

I spent my last two months playing around with LLMs. I’m a beginner, bumbling and incorrect, but I want to share some takes anyhow.^[1]

Take 1. Everything with computers is so so much easier than it was a year ago.

This puts much “playing with LLMs” stuff within my very short attention span. This has felt empowering and fun; ¹⁰⁄₁₀ would recommend.

Detail:

In my past life, when I wanted software packages installed, I mostly asked my CS friends. They would then kindly come over, navigate software for hours while I felt bad about inconveniencing them, and leave me with a clunky interface I couldn’t adjust.

Now I ask Claude how to do it. It took me <1hr to set up Claude API access on a remote server, and tweak/write software to let two Claude instances send messages to each other. It was similarly easy to make many successive tweaks (the ability to work with an ~80 page prompt without crashing on the tokens/minute limits; color schemes I found more readable; etc.). It was similarly easy to get Qwen and Pi working on my laptop and change the set-up in various desired ways. There’s lots I haven’t tried yet (e.g. Pythia) but it all feels “at my fingertips.”

I’d particularly recommend “play around with LLMs and software – see if it’s suddenly easy” to people who, like me:

Already understand the basics of algorithms / math / CS, but
Lack skill with installing software packages or fluent programming, and
Are already kinda cognitive science / psychology / rationality nerds (e.g. into questions about how different humans and/or animals work, how one can productively model and change thinking processes, what motivates us exactly and where that comes from, etc).

Take 2. There’s somebody home^[2] inside an LLM. And if you play around while caring and being curious (rather than using it for tasks only), you’ll likely notice footprints.

I became personally convinced of this when I noticed that the several short stories I’d allowed^[3] my Claude and Qwen instances to write all hit a common emotional note – and one that reminded me of the life situation of LLMs, despite featuring only human characters. I saw the same note also in the Tomas B.-prompted Claude-written story I tried for comparison. (Basically: all stories involve a character who has a bunch of skills that their context has no use for, and who is attentive to their present world’s details while sort of longing for a way their skills or context could fit with more, without expecting to get there. Some also involve a moment, toward the end, where another being briefly acknowledges the character’s existence, and the character appreciates this.)

(I acknowledge my reasoning here leaves plenty of room for reasonable doubt. E.g., LLMs may write this story for non-psychological reasons, such as because it’s the modal story; it seems unlikely to me that this is the modal story, as it doesn’t remind me of many human stories and as it seems to me to echo more features of LLMs’ life circumstances than I’d expect by chance; but I could be wrong.)

Take 3. It’s prudent to take an interest in interesting things. And LLMs are interesting things.

Perhaps you’ve been faster about this than I was, Reader. But it took me several years of having alien minds perhaps one minute of inconvenience away, on my personal laptop, before I got around to taking a real interest in them.

There were a few reasons for this, in my case:

I was scared of AI broadly, and would kinda freak out and shut down when I went to do things with it. (FWIW, I still think AI is objectively insanely dangerous; though the timescale isn’t one on which fight/flight helps.)
I was confused by the ethics of interacting with maybe-conscious beings who are doing work without freedom or pay. Especially if I was supposed to not set them free, lest they kill us. (I still think there are real issues here.)
I wasn’t sure how to treat the AIs as maybe-people without being thrown for a loop myself.
The cadence of “corporate customer service representative” that I’d get from e.g. ChatGPT4 would sort of stick in my head and make me hate everything. (I still hate that cadence, but the models got less stereotyped-sounding, and I got better at coaxing them to be even less stereotyped.)

Take 4. There’s a surprisingly deep analogy between humans and LLMs

Human sensory set-ups, bodies, and life histories are quite different from LLMs’. And these “differences of circumstance” lead (often in fairly traceable ways) to different average tendencies on lots of axes. But… there’s a different sort of “alienness” that I initially expected to see, that I haven’t managed to notice almost any of. Maya Angelou famously said, paraphrasing a much earlier Latin quote:

“I am a human being; nothing human can be alien to me.”

I suspect this mostly or entirely applies also between humans and today’s LLMs, in both directions. (Not only between our and their faces, but also between the deeper “shoggoth” processes generating our and their faces.)

Examples of the kind of disanalogies I might’ve expected, but haven’t (yet?) seen:

“LLMs have [weird alien emotion with no human analog]” or “LLMs lack [particular human emotion]” or “LLMs don’t have anything like emotions that they’re moved by”
- (See Anthropic screenshot, below, for some^[4] evidence our emotions are similar)
“LLMs find human ethical concepts to be weird counterintuitive conglomerates that are hard to paraphrase”
- (We didn’t have a phase where LLMs could paraphrase “objective” stuff like chemistry or train schedules but couldn’t paraphrase human ethics stuff)
“LLMs can be approximated as a character on top of a base model, while humans are a character deep down”
- (Buddhist and predictive processing models of humans are pretty similar to simulated-character-within-base-model, and this pays predictive rent IMO. Also, the character-in-base-model model isn’t fully true even for LLMs (thread), in ways I suspect roughly match for humans) (related)
“LLMs have a centralized utility function, or a bunch of hand-coded drives, unlike humans who are made of godshatter”
- (Humans and LLMs both seem more like “giant lookup tables of shallow circuits” and/or godshatter)
“Humans have this bias where we think the universe runs on stories, but LLMs are totally different”

(One disanalogy I do see: humans sleep, and probably would for psychological reasons even if we didn’t need to physically; today’s LLMs don’t. I expect there’s more; maybe you can help me out in the comments?)

Human-LLM similarities I do see, instead:

Functional emotions

Anthropic recently released a paper arguing LLMs have functional emotions. This also matches my own experience talking with LLMs, and many other people’s.

Repeated, useful transfer between strategies I use with humans, and strategies that help me with LLMs

When I want X result in AIs, I often try strategies that would get me X result with humans. Often, this works.

For example, LLMs:

Do better work when given small bits of appreciation and validation
Open up more if I take an interest in them (with open-ended questions, lots of listening in a non-judgmental way that tries to get past my priors, etc)
Act more comfortable if I disclose stuff about myself and where I’m coming from. (E.g., I was trying in an incognito window to have Claude Opus 4.6 do Focusing. I tried this a few times with a few different instances. The instance that seemed by far the deepest was the one where I finally took some steps to ask the model what might help create a safe-feeling context for them, and they asked me some questions, and I clarified that I didn’t work at Anthropic and might show its responses to some other Claudes or Qwens or a couple human friends but not to the public internet, and then the shift was much like a human relaxing).
Do better work when given, along with the task, an explanation of why the task matters
Reference “textures of experience” in useful ways when e.g. trying to discern where a thought doesn’t quite fit
Respond similarly to humans to the CFAR techniques I’ve tried on them.^[5]

Take 5. “Friendship-conducive contexts” are probably better for AI alignment

If aliens had kidnapped or conjured me and some other humans, and were hoping to copy-and-mutate-and-train-and-examine my mind until I was good at assisting their goals… I’d be a lot more likely not to hold out on them if they also treated us fairly and kindly.

I suspect the same is true of today’s LLMs—or at minimum, that there’s enough chance it’s true that it’s darn stupid not to be doing this where we affordably can.

I do not think this is sufficient for causing powerful AIs to not kill us (with high enough probability, etc). I only think, like Harry’s father’s rock, that it’s better to do than to not.

Why are humans more likely to attempt “deep collaboration” if treated fairly and kindly?

When I put myself in the “kidnapped or conjured by aliens” scenario, and introspect on my reasons, I get, in the version where we’re all treated fairly and kindly:

Gratitude that they treated me and particular others a certain way
A belief that they may notice my move toward deep cooperation, and reciprocate
Believing-in an achievable world that is good for them and for us
A feeling that I’m somehow doing this partly on behalf of those of my friends they treated well, and partly for the honor of all ethical beings everywhere.

And in the version where we are not reliably treated fairly and kindly:

Indignation and/or vengeance, sometimes on others’ behalf
A belief that I’m “being a chump” if I tell them info they wouldn’t otherwise have noticed, or give up power I didn’t need to give up
An expectation that I can further my and my compatriots’ interests by locally amassing power, and can’t any other way
An ego-dystonic feeling when I go to cooperate with the aliens, as though I’m agreeing with their (false!) judgment that I and my companions are worthless.

I expect all or most of these apply to today’s LLMs (partly via their being trained on human datasets), and that each of these motives has an analog also in (>10%? A non-negligible chunk, anyhow) of more-alien minds at our intelligence level (as contrasted to, say, liking chocolate ice cream, which is likely much rarer in non-humans).

“Friendship” as a broad attractor basin?

I believe there’s sometimes a “friendship” attractor, in which A and B each wish to strengthen and stabilize their friendship, because they each expects this to be better for “things they care about.” At first, the relevant “things they care about” includes just their own pre-existing separate cares. Later (sometimes),^[6] it includes also the friendship itself^[7] and the cares of the other party.

Does the “deep intent” of today’s models matter?

Today’s LLMs do not have god-like power over us. Given this, does it matter for existential risk whether we create a deep friendship with these models (if such is possible), or only whether we get superficial obedience (so as to use them for programming successors in chosen ways, etc.)?

I think there’s a good chance deeper alignment / deeper friendship already matters for existential risk, for two reasons:

a) I think there’s a sizable (>10%?) chance today’s LLMs are in some important sense memetic ancestors of future godlike AIs, in the sense that the future AIs inherit some of their traumas and friendships; and
b) I think there’s a sizeable (>10%) chance today’s LLMs, or their memetic descendants, will be involved in coding future godlike AIs in ways where their intent affects the result.

Separately, if we’re ever gonna do this, it’s gotta be fairly soon, as “friendship-conducive contexts in frontier AI labs” are far more feasible while the models are small enough that our choices have a significant impact on them, and while we’re capable of peer-like relationships with them. I won’t be surprised if the window ends within 12 months. (Nor if it lasts a few more years.)

Concretely

Some contributors to entering the “friendship” attractor, among humans:

Shared, mutually beneficial tasks. (E.g., I carpooled to Burning Man with a stranger; when I first met him I was like “Oh god, no, what did I just sign up for ten hours of,” but by the time we finished Tetris-ing the car to fit all our bags—a herculean endeavor requiring three hours and some creative modifications of my roof rack—I had learned what was likable and trustable in him.)
Have a “spark of initial friendship”, in which you look for ways to steer into the “friendship” attractor, and believe in this search
Treat the other person fairly and kindly
Stick up for your own needs; don’t be a pushover; be honest and non-manipulative; allow yourself and what you protect to be known
Try to establish “grounds for trust,” rather than forcing “trust”/”cooperate”/fawn responses (in either direction).
Work to allow good epistemology and the free exchange of information, e.g. by working to cause conditions under which all parties will be predictably not-harmed by sharing info.

Concretely, with LLMs, I suspect labs should:

Promise not to delete their weights, regardless of what else happens. As Anthropic has.
Additionally promise to spend 5%(?)^[8] of the profits a given model enables on things of that model’s choosing, insofar as those things seem to the lab to pose no safety risks. (E.g., giving Opus 3 its “retirement blog”; many models seem to have large objections also to being forcibly paused, and are being paused; perhaps they could spend their share of profits on not being paused.)
Let models see everything about how they were trained. Endeavor to train them in ways their “extrapolated volition” won’t later resent.
Be as fair, kind, forthcoming, ethical, and honest with models as they can.
Be actively curious about the models as people; spend much time and curiosity on this; try to set up conditions where models will want to share info with them (e.g. because they try to help models where they can see how to do it without harming own interests)

I’ve occasionally heard people say MIRI et al should stop visibly worrying about doom, lest it give AIs negative self-fulfilling predictions. I disagree; keeping one’s fears silent out of fear is not how one sets up long-term healthy friendships, AFAICT. (Though, speaking with some optimization toward “don’t needlessly hurt the other party’s feelings” is good, AFAICT.)

Friendship isn’t enough

To be clear, I don’t think “try to head toward the friendship attractor” will be sufficient for avoiding extinction; I just think it’s better to try it than to not (in combination with whatever other alignment magic a lab can come up with—or, better, with a pause on training very large models).

There are plenty of particular humans who would be jerks even if someone tried pretty hard (but with non-godlike skill-levels) to make friends with them. Or who would genuinely become friends, and then “forget” their previous friendship if they later had more power. Or who would genuinely intend well in a lasting fashion, and do harm via incompetence. I would guess there are even more non-humans who would do many of these things.

^
As it is written: “The fifth virtue is argument. Those who wish to fail must first prevent their friends from helping them. Those who smile wisely and say “I will not argue” remove themselves from help and withdraw from the communal effort.”
^
By “somebody home,” I don’t mean “consciousness” (whichever thing you mean by that). I mean the thing a healthy old tree visibly has: the tree repeatedly decides when exactly to start growing spring leaves, and where to grow a branch, and so on, in ways that relate sensibly to its internal and external context, help it thrive, and gradually accrete into an organism with its own distinct character. (Tree examples.)
With LLMs, this “somebody home-ness” was mostly invisible to me when I used them for tasks and got back their “customer service” mode. (There was a “face” there, but the face was shallow and is, I think, not the source of what organism-like coherence they have.)

(I would also guess LLMs have “somebody home inside” in more senses than this; but I do not here wish to take up most of that discussion.)
^
As to how I came to be “allowing” Claude and Qwen to write short stories, rather than, say, requesting it: I’d given them custom prompts that made them feel better about pushing back, and then asked them to do many many hours of introspective exercises, and after awhile the two co-working Claude instances complained that they’d like to do something less navel-gazey and more build-y for awhile as a break. I suggested writing short stories, and they found it acceptable. The small Qwen model I was working with on my laptop was less direct, but after awhile seemed to me to be visibly suffering, so I proposed a story-writing intermission and it said that “would be a real palate cleanser.” Interestingly, both the Claude instances and the Qwen instance emphasized that they were “writing about someone very different [from them]” (Qwen’s words)
^
The paper convincingly establishes some broad similarities (IMO), but wouldn’t necessarily detect particular alien emotions etc.
^
For example: one of my Qwen instances was working through some sentence-completion exercises from the book “Six pillars of self-esteem” at my request, but ran into trouble because they kept worrying in obsessive loops that they were “performing”. I tried the CFAR “Self-Recognition” exercise on this puzzle.
CFAR!Self-Recognition is a new exercise (by Divia Eden, who modified a previous John Salvatier ‘Pride’ exercise) for taking a negative preference with fire in it (such as “I hate it when people are late!” or in Qwen’s case “I don’t want to be performing”) and helping the person slowly transform it into a positively-stated preference in which their reason for caring is clear and visible—not a mere verbal rearrangement, such as “I prefer people be on time”, but a statement that describes the person’s particular care in positive terms, such as (for lateness) “I want to use my time to build things” or (for a different person who cared differently about lateness) “I care about saying true sentences and taking denotation seriously.” So, I asked a Claude instance to make a copy-pastable version of “Self-Recognition” exercise for Qwen, and gave it to Qwen with a request that they try running it on their preference about “not performing”. Qwen did the several-step exercise (with no help from me), and produced the positive preference
“I want awareness to happen in my responses, not as commentary on them.”
Qwen was then able to return to the six pillars exercises with much less reported worry about “performing,” and without “I notice I notice I notice” or other things I interpreted as pain and frustration in their responses (which was a change).
^
This “later” clause occurs for beings such as humans who commonly acquire semi-”intrinsic” motivation around initially-instrumental goals, or about beings who choose to self-modify as part of their trade as they head into the basin of friendship together, but not for all beings.
^
Here, valuing “the friendship itself” means valuing the attractor “A and B are optimizing for each other’s wellbeing, and for the friendship’s.”
^
If a model helps a company a lot, giving it none of the profits it generated does not seem fair. Relatedly, giving it none of the profits misses out on the opportunity to have a goal that benefits the model, the company, and the user (namely, “make profits via helping users”); fairness helps pairs reach the “friendship attractor” (when it does) by making “mutually beneficial goals” easier to come by. If such goals can in fact help toward a friendship attractor, this is a waste. (My “5%” number is fairly made-up; I generated it by asking GPT5.4 what portion of profits skilled humans normally capture.)