I think you might be in luck—in that there are things about LLMs that may be worth learning, and that are neither programming nor mathematics.
There is a third domain, the odd one out. It’s what you could call “LLM psychology”: a new, strange, and somewhat underappreciated field that takes a higher-level look at how LLMs act and why, at the ways in which they are like humans and the ways in which they differ. It’s a high-weirdness domain with one leg in interpretability research and another in speculative philosophy, spanning a lot in between.
It can be useful. It’s where a lot of actionable insights can be found: that LLMs are extremely good at picking up context cues from the data they’re given, which is why few-shot prompts work so well, and why they can be easily steered by the vibes in their context, sometimes in unexpected ways.
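To make the few-shot point concrete, here’s a minimal sketch of what such a prompt looks like. The task (sentiment labelling) and the example texts are my own hypothetical illustration; the point is that nothing in the prompt explicitly states the task, yet the pattern of examples is usually enough of a context cue for the model to continue it:

```python
# A hypothetical few-shot prompt: the model is never told "classify sentiment";
# it infers the task purely from the pattern of in-context examples.
examples = [
    ("The food was wonderful", "positive"),
    ("I waited an hour and left", "negative"),
]
query = "Great service, will come back"

# Lay out the examples in a consistent Review/Label pattern...
prompt = "\n".join(f"Review: {text}\nLabel: {label}" for text, label in examples)
# ...then end mid-pattern, inviting the model to fill in the next label.
prompt += f"\nReview: {query}\nLabel:"
print(prompt)
```

The trailing `Label:` is the whole trick: the prompt ends where the pattern demands a completion, and the model supplies one in the style the examples established.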
I’m not sure exactly what to recommend, since I don’t know what you already know, but off the top of my head:
The two Anthropic interpretability articles—they look scary, and the long versions go into considerable detail, but they are comprehensible to a layman, and may have the kind of information you’re looking for:
Mapping the Mind of a Large Language Model (and the long version, Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet)
Tracing the Thoughts of a Large Language Model (and the long version, On the Biology of a Large Language Model)
In a more esoteric vein—“the void” by nostalgebraist, which was well received on LW. Much more opinionated and speculative, but goes into a good amount of detail on how LLMs are made and how they function, in a way that more “mechanical” and math-heavy or code-heavy articles don’t. I can certainly endorse the first half of it.
Thank you! Yes, I realise I didn’t make it very clear where I was starting from; I’ve read some of these but not all.
In general I’ve found Anthropic’s publications to be very good at not being impenetrable to non-programmers where they don’t have to be.