Jan_Kulveit

Karma: 8,595

My current research interests:

1. Alignment in systems which are complex and messy, composed of both humans and AIs
Recommended texts: Gradual Disempowerment, Cyborg Periods

2. Actually good mathematized theories of cooperation and coordination
Recommended texts: Hierarchical Agency: A Missing Piece in AI Alignment, The self-unalignment problem or Towards a scale-free theory of intelligent agency (by Richard Ngo)

3. Active inference & Bounded rationality
Recommended texts: Why Simulator AIs want to be Active Inference AIs, Free-Energy Equilibria: Toward a Theory of Interactions Between Boundedly-Rational Agents, Multi-agent predictive minds and AI alignment (old but still mostly holds)

4. LLM psychology and sociology: A Three-Layer Model of LLM Psychology, The Pando Problem: Rethinking AI Individuality, The Cave Allegory Revisited: Understanding GPT’s Worldview

5. Macrostrategy & macrotactics & deconfusion: Hinges and crises, Cyborg Periods again, Box inversion revisited, The space of systems and the space of maps, Lessons from Convergent Evolution for AI Alignment, Continuity Assumptions

Also I occasionally write about epistemics: Limits to Legibility, Conceptual Rounding Errors

Researcher at the Alignment of Complex Systems Research Group (acsresearch.org), Centre for Theoretical Studies, Charles University in Prague. Formerly a research fellow at the Future of Humanity Institute, Oxford University.

Previously I was a researcher in physics, studying phase transitions, network science and complex systems.

Role-playing vs Self-modelling

Jan_Kulveit · 7 Apr 2026 20:41 UTC
63 points
3 comments · 4 min read · LW link

Persona Self-replication experiment

2 Apr 2026 18:18 UTC
39 points
0 comments · 8 min read · LW link
(theartificialself.ai)

Persona self-replication experiment

2 Apr 2026 18:10 UTC
8 points
0 comments · 8 min read · LW link

Latent Introspection (and other open-source introspection papers)

24 Mar 2026 21:23 UTC
96 points
3 comments · 9 min read · LW link
(arxiv.org)

Models differ in identity propensities

16 Mar 2026 10:45 UTC
58 points
0 comments · 14 min read · LW link

The Artificial Self

15 Mar 2026 1:37 UTC
117 points
13 comments · 29 min read · LW link

An Alignment Journal: Coming Soon

3 Mar 2026 20:27 UTC
252 points
29 comments · 6 min read · LW link
(blog.alignmentjournal.org)