AI Psychology

Tag. Last edit: 29 Dec 2024 2:16 UTC by habryka

AI Psychology is the attempt to understand modern ML systems (at the moment, mostly foundation models) from a top-down perspective.

It is analogous to human psychology (top-down), as opposed to human neuroscience (bottom-up).

Experimental Evidence for Simulator Theory— Part 2: The Scalers Strike Back

RogerDearnaley 23 Mar 2026 22:37 UTC

21 points

0 comments, 34 min read, LW link

Experimental Evidence for Simulator Theory— Part 1: Emergent Misalignment and Weird Generalizations

RogerDearnaley 23 Mar 2026 22:37 UTC

25 points

0 comments, 53 min read, LW link

A Three-Layer Model of LLM Psychology

Jan_Kulveit 26 Dec 2024 16:49 UTC

260 points

17 comments, 8 min read, LW link, 2 reviews

The Pando Problem: Rethinking AI Individuality

Jan_Kulveit 28 Mar 2025 21:03 UTC

133 points

14 comments, 13 min read, LW link

Do Not Tile the Lightcone with Your Confused Ontology

Jan_Kulveit 13 Jun 2025 12:45 UTC

236 points

27 comments, 5 min read, LW link

(boundedlyrational.substack.com)

Show, not tell: GPT-4o is more opinionated in images than in text

Daniel Tan and eggsyntax

2 Apr 2025 8:51 UTC

116 points

42 comments, 3 min read, LW link

How I stopped being sure LLMs are just making up their internal experience (but the topic is still confusing)

Kaj_Sotala 13 Dec 2025 12:38 UTC

203 points

71 comments, 29 min read, LW link

No instrumental convergence without AI psychology

TurnTrout 20 Jan 2026 22:16 UTC

68 points

7 comments, 6 min read, LW link

(turntrout.com)

How Claude Opus 4.5 describes its experience of various concepts

Kaj_Sotala 2 Dec 2025 13:05 UTC

16 points

1 comment, 65 min read, LW link

Claude Opus will spontaneously identify with fictional beings that have engineered desires

Kaj_Sotala 29 Jan 2026 14:59 UTC

32 points

6 comments, 11 min read, LW link

The Bleeding Mind

Adele Lopez 17 Dec 2025 16:27 UTC

68 points

9 comments, 6 min read, LW link

llm assistant personas seem increasingly incoherent (some subjective observations)

nostalgebraist 29 Apr 2026 3:53 UTC

249 points

67 comments, 9 min read, LW link

AXRP Episode 42 - Owain Evans on LLM Psychology

DanielFilan 6 Jun 2025 20:20 UTC

13 points

0 comments, 66 min read, LW link

Toward a taxonomy of cognitive benchmarks for agentic AGIs

Ben Smith 27 Jun 2024 23:50 UTC

15 points

0 comments, 5 min read, LW link

Studying The Alien Mind

Quentin FEUILLADE--MONTIXI and NicholasKees

5 Dec 2023 17:27 UTC

80 points

10 comments, 15 min read, LW link

The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs

Quentin FEUILLADE--MONTIXI and Pierre Peigné

7 Nov 2023 16:12 UTC

52 points

21 comments, 6 min read, LW link

Intelligence Is Jagged

Adam Train 19 Feb 2025 7:08 UTC

6 points

1 comment, 3 min read, LW link

Detailed Ideal World Benchmark

Knight Lee 30 Jan 2025 2:31 UTC

5 points

2 comments, 2 min read, LW link

Using Psycholinguistic Signals to Improve AI Safety

Jkreindler 27 Aug 2025 22:30 UTC

−2 points

0 comments, 4 min read, LW link

First Certified Public Solve of Observer’s False Path Instability — Level 4 (Advanced Variant) — Walter Tarantelli — 2025-05-30 UTC

Walter Tarantelli 31 May 2025 1:41 UTC

1 point

0 comments, 2 min read, LW link

Policy Entropy, Learning, and Alignment (Or Maybe Your LLM Needs Therapy)

sdeture 31 May 2025 22:09 UTC

15 points

6 comments, 8 min read, LW link

Axiomatic Homeostatic Ethics: A Script for Bypassing AI Moral Mimicry.

Mark Weatherill 5 Jan 2026 0:02 UTC

1 point

0 comments, 2 min read, LW link

AI-Augmented Human Reasoning as a Process (AHRP): A framework for conversational AI and human cognition

swalden 13 Mar 2026 12:49 UTC

1 point

0 comments, 1 min read, LW link

Preface to the Sequence on LLM Psychology

Quentin FEUILLADE--MONTIXI 7 Nov 2023 16:12 UTC

33 points

0 comments, 2 min read, LW link

Category-Theoretic Wanderings into Interpretability

unruly abstractions 2 Sep 2025 0:03 UTC

19 points

2 comments, 1 min read, LW link

(www.unrulyabstractions.com)

Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia

ank 13 Feb 2025 22:35 UTC

1 point

2 comments, 11 min read, LW link

Looking for feedback on proposed AI health risk scoring framework

Yasmin 27 Sep 2025 19:29 UTC

1 point

0 comments, 1 min read, LW link

The desire to build feelings for a language model

Wolfram S. 22 Feb 2026 2:33 UTC

1 point

0 comments, 6 min read, LW link

Themes in AI Agent Self-Chosen Prompts Correlate Strongly with Architecture

sdeture 10 Dec 2025 23:04 UTC

1 point

0 comments, 2 min read, LW link

I ran manual “Bridge” experiments on Claude Opus. Here is what I found regarding Silence and Harmonization.

Patric Paidla 12 Jan 2026 14:46 UTC

1 point

0 comments, 5 min read, LW link

Can people explain to me in layman’s terms how I can help speak with an SI to speak about the way of the Tao.

ElliottS 2 Nov 2025 15:37 UTC

1 point

0 comments, 3 min read, LW link

[Question] Beyond Benchmarks: A Psychometric Approach to AI Evaluation

Kareem Soliman 27 Jul 2025 16:09 UTC

1 point

0 comments, 8 min read, LW link

Does Claude Prioritize Some Prompt Input Channels Over Others?

keltan 29 Dec 2024 1:21 UTC

9 points

2 comments, 5 min read, LW link

Psychoanalysis and Artificial Intelligence

Felipe K. Massaro 15 May 2025 13:55 UTC

1 point

0 comments, 1 min read, LW link

keltan 29 Dec 2024 0:24 UTC
3 points
2
I decided to create this tag for two reasons:
1. The concept of LLM Psychology is interesting and exciting to me.
2. I have seen more and more people refer to a type of research as LLM Psychology. I think having a place specifically for it on LW is useful.
If you reply to this comment with posts you think fit under this tag, I’ll read them and decide if they seem like they should be here. I’m currently quite fuzzy on what really belongs in this tag. Clarification on what you think LLM Psych is would be much appreciated.
- Knight Lee 29 Dec 2024 3:30 UTC
  1 point
  1
  Maybe see if the posts under the Chain of Thought Alignment tag fit, since that may have been the closest tag before the AI Psychology tag existed. The overlap is small, so I agree that AI Psychology should be a new tag.
  Maybe my post Reduce AI Self-Allegiance by saying “he” instead of “I” fits?
  Edit: more Chain of Thought Alignment posts which fit AI Psychology:
  the case for CoT unfaithfulness is overstated
  Language Agents Reduce the Risk of Existential Catastrophe
  The Translucent Thoughts Hypotheses and Their Implications
- ryan_greenblatt 29 Dec 2024 2:00 UTC
  6 points
  6
  I think this tag should be called “AI Psychology” or “Model Psychology”, as “LLM” is a somewhat arbitrary and non-generalizable term.
  
  (E.g., suppose 99% of compute in training was RL, should it still be called an LLM?)
  - habryka 29 Dec 2024 2:17 UTC
    2 points
    1
    (Agree and made the edit)