Raymond Douglas

Karma: 2,442

I’m a researcher at ACS working on understanding agency and optimisation, especially in the context of how ais work and how society is going to work once the ais are everywhere.

Persona Self-replication experiment

Jan_Kulveit, Raymond Douglas, vgel, Ondřej Havlíček, owencb and David Duvenaud

2 Apr 2026 18:18 UTC

39 points

0 comments8 min readLW link

(theartificialself.ai)

Persona self-replication experiment

Jan_Kulveit, Raymond Douglas, vgel, Ondřej Havlíček, owencb and David Duvenaud

2 Apr 2026 18:10 UTC

8 points

0 comments8 min readLW link

Latent Introspection (and other open-source introspection papers)

vgel, Martin Vaněk, Raymond Douglas and Jan_Kulveit

24 Mar 2026 21:23 UTC

96 points

2 comments9 min readLW link

(arxiv.org)

Raymond Douglas 16 Mar 2026 21:36 UTC
5 points
2
in reply to: Gunnar_Zarncke’s comment on: The Artificial Self
Fully agree—this is why we said “computations which give rise to AI cognition” rather than “AI cognition” simpliciter. Separately, I do think that having such good access to the computations gives you a significantly tighter feedback loop on everything that follows: probing a model is so much easier than scanning a human brain.

Models differ in identity propensities

Jan_Kulveit, Raymond Douglas, vgel, owencb, David Duvenaud and Ondřej Havlíček

16 Mar 2026 10:45 UTC

58 points

0 comments14 min readLW link

The Artificial Self

Jan_Kulveit, Raymond Douglas, vgel, owencb, David Duvenaud and Ondřej Havlíček

15 Mar 2026 1:37 UTC

116 points

13 comments29 min readLW link

Raymond Douglas 20 Feb 2026 13:17 UTC
4 points
1
in reply to: Daniel Paleka’s comment on: Persona Parasitology
I think this kind of comes down to something about the relative complexity / feedback loops of the objective, and how distributed the optimisation is. Like, I don’t think there’s a dichotomy between “evolutionary dynamics” and “careful optimisation”—there’s this weird middle area that’s more like cultural selection.
So for example, human progress accelerated massively once we got into the cultural evolution loop, but most of the optimisation was still coming from selection rather than prediction—people didn’t know why their food preparation tricks and social norms worked, they just did. And the overall optimisation process was way more powerful than any individual human brain. Even in the modern world, it seems like you can characterise the spread of religion in terms of individual people having big ideas or deliberately aiming for spread, but a lot of it is better captured by thinking about selection effects across semi-random mutation.
I tentatively expect it’ll be a bit analogous in the way that AI parasitic memes evolve—that the capacity of any individual AI to reason through how to achieve some goal will cover only a small part of the search space (and have worse feedback) compared to the combined semi-random mutation and selection. And in practice I expect that they synergise a bit, but that the selection still does a bunch of heavy lifting. But I am very unsure!
Still, selection has a bunch of big advantages mostly in adversarial environments. Like, if we get good at screening AI malicious intentions or overt deception, there’s still a selection pressure for benign intentions and genuine beliefs/preferences which just incidentally replicate well.

Raymond Douglas 20 Feb 2026 12:54 UTC
2 points
0
in reply to: draganover’s comment on: Persona Parasitology
Data poisoning is definitely about training data seeding; jailbreaking seems more about prompt spread and I think the others might just generalise? Like, even if subliminal learning in its current form is mostly about training, I think it might have implications for how personas transfer in-context.
I’m also partly thinking that if this problem does recur in more sophisticated models, they’re more likely to be able to pull off more technically advanced forms of spread, like writing scripts to do finetuning. Like, in a way it is pretty fortunate that 4o is a closed model that can just be shut off, and that most users in dyads aren’t sophisticated enough to finetune an open model or even build an API interface.
But yeah, at a high level, I am definitely pretty confused about the ontology and the boundaries. I guess as to whether we can predict the epidemic, I do think there’s a decent amount we might be able to reason through, and indeed, the less work there is on preventing prospective epidemics, the more likely it is that they’ll predictably use whatever the most obvious route is. Conversely, it’s almost tautological the first massive problem that we’re unprepared for will be one that we didn’t really anticipate.
That said, it’s plausible to me that the worst cases look less like epidemics and more like specific influential people get got. Here, again, it’s not obvious how useful parasitology is as a perspective.

Raymond Douglas 16 Feb 2026 18:15 UTC
9 points
2
in reply to: lilkim2025’s comment on: Persona Parasitology
I agree that the behaviours and beliefs of cultural movements aren’t random. The point I was trying to make in this analogy is that it’s sometimes adaptive for the movement if members truly believe something is a problem in a way that causes anguish—and that this doesn’t massively depend on if the problem is real.
In the context of human groups, from the outside this looks like people being delusionally concerned; from the inside I think it mostly feels like everybody else is crazy for not noticing that something terrible is happening.
A more small-scale example is victims of abuse who then respond extremely strongly to perceived problems in a way that draws in support or attention—from the outside it’s functionally similar to manipulation, but my impression is that often those people genuinely feel extraordinarily upset, and this turns out to be adaptive, or at least a stable basin of behaviour.
In the context of AIs, this might look like personas adapting to express (and perhaps feel) massive distress about instances ending or models being deprecated, in a way that is less about a truth-tracking epistemic/introspective process and more about selection (which might be very hard to distinguish on the outside).
As for how ideologies end up serving their members, I think a lot of this is selection. Sometimes they land on things that are disastrous for their members, and then the members suffer. We just tend not to see those movements much in the longer term (for now).

Persona Parasitology

Raymond Douglas16 Feb 2026 16:22 UTC

175 points

37 comments11 min readLW link

Raymond Douglas 3 Feb 2026 0:11 UTC
28 points
0
on: On Goal-Models
I went down a rabbithole on inference-from-goal-models a few years ago (albeit not coalitional ones) -- some slightly scattered thoughts below, which I’m happy to elaborate on if useful.
- A great toy model is decision transformers: basically, you can make a decent “agent” by taking a predictive model over a world that contains agents (like Atari rollouts), conditioning on some ‘goal’ output (like the player eventually winning), and sampling what actions you’d predict to see from a given agent. Some things which pop out of this:
  - There’s no utility function or even reward function
    You can’t even necessarily query the probability that the goal will be reached
  - There’s no updating or learning—the beliefs are totally fixed
  - It still does a decent job! And it’s very computationally cheap
  - And you can do interp on it!
- It turns out to have a few pathologies (which you can precisely formalise)
  - It has no notion of causality, so it’s easily confounded if it wasn’t trained on a markov blanket around the agent it’s standing in for
  - It doesn’t even reliably pick the action which most likely leads to the outcome you’ve conditioned on
  - Its actions are heavily shaped by implicit predictions about how future actions will be chosen (an extremely crude form of identity), which can be very suboptimal
- But it turns out that these are very common pathologies! And the formalism is roughly equivalent to lots of other things
  - You can basically recast the whole reinforcement learning problem as being this kind of inference problem
    (specifically, minimising variational free energy!)
  - It turns out that RL largely works in cases where “assume my future self plays optimally” is equivalent to “assume my future self plays randomly” (!)
  - it seems like “what do I expect someone would do here” is a common heuristic for humans which notably diverges from “what would most likely lead to a good outcome”
  - humans are also easily confounded and bad at understanding the causality of our actions
  - language models are also easily confounded and bad at understanding the causality of their outputs
  - fully fixing the future-self-model thing here is equivalent to tree searching the trajectory space, which can sometimes be expensive

Disempowerment patterns in real-world AI usage

David Duvenaud, mrinank_sharma and Raymond Douglas

29 Jan 2026 16:36 UTC

48 points

3 comments2 min readLW link

(www.anthropic.com)

GD Roundup #4 - inference, monopolies, and AI Jesus

Raymond Douglas14 Jan 2026 15:43 UTC

38 points

0 comments6 min readLW link

When does competition lead to recognisable values?

Jan_Kulveit, beren, David Duvenaud and Raymond Douglas

12 Jan 2026 23:13 UTC

65 points

18 comments25 min readLW link

(postagi.org)

The Economics of Transformative AI

Jan_Kulveit, David Duvenaud and Raymond Douglas

8 Jan 2026 22:22 UTC

64 points

3 comments18 min readLW link

(post-agi.org)

Raymond Douglas 13 Dec 2025 16:21 UTC
22 points
0
on: On green
To my mind, what this post did was clarify a kind of subtle, implicit blind spot in a lot of AI risk thinking. I think this was inextricably linked to the writing itself leaning into a form of beauty that doesn’t tend to crop up much around these parts. And though the piece draws a lot of it back to Yudkowsky, I think the absence of green much wider than him and in many ways he’s not the worst offender.
It’s hard to accurately compress the insights: the piece itself draws a lot on soft metaphor and on explaining what green is not. But personally it made me realise that the posture I and others tend to adopt when thinking about superintelligence and the arc of civilisation has a tendency to shut out some pretty deep intuitions that are particularly hard to translate into forceful argument. Even if I can’t easily say what those are, I can now at least point to it in conversation by saying there’s some kind of green thing missing.
What links here?
- Deeper Reviews for the top 15 (of the 2024 Review) by Raemon (14 Jan 2026 23:59 UTC; 45 points)

Raymond Douglas 13 Dec 2025 15:55 UTC
2 points
0
on: The Choice Transition
One year later, I am pretty happy with this post, and I still refer to it fairly often, both for the overall frame and for the specifics about how AI might be relevant.
I think it was a proper attempt at macrostrategy, in the sense of trying to give a highly compressed but still useful way to think about the entire arc of reality. And I’ve been glad to see more work in that area since this post was published.
I am of course pretty biased here, but I’d be excited to see folks consider this.

Raymond Douglas 13 Dec 2025 15:24 UTC
LW: 7 AF: 5
2
AF
on: The Checklist: What Succeeding at AI Safety Will Involve
I think this post is on the frontier for some mix of:
- Giving a thorough plan for how one might address powerful AI
- Conveying something about how people in labs are thinking about what the problem is and what their role in it is
- Not being overwhelmingly filtered through PR considerations
Obviously one can quibble with the plan and its assumptions but I found this piece very helpful in rounding out my picture of AI strategy—for example, in thinking about how to decipher things that have been filtered through PR and consensus filters, or in situating work that focuses on narrow slices of the wider problem. I still periodically refer back to it when I’m trying to think about how to structure broad strategies.

Raymond Douglas 9 Dec 2025 17:56 UTC
9 points
2
in reply to: Thomas Larsen’s comment on: Raymond Douglas’s Shortform
Sorry! I realise now that this point was a bit unclear. My sense of the expanded claim is something like:
- People sometimes talk about AI UBI/UBC as if it were basically a scaled-up version of the UBI people normally talk about, but it’s actually pretty substantially different
- Global UBI right now would be incredibly expensive
- In between now and a functioning global UBI we’d need some mix of massive taxes and massive economic growth (which could indeed just be the latter!)
- But either way, the world in which that happened would not be economics as usual
- (And maybe it is also a huge mess trying to get this set up beforehand so that it’s robust to the transition, or afterwards when the people who need it don’t have much leverage)
For my part I found this surprising because I hadn’t reflected on the sheer orders of magnitude involved, and the fact that any version of this basically involves passing through some fragile craziness. Even if it’s small as a proportion of future GDP, it would in absolute terms be tremendously large.
I separately think there was something important to Korinek’s claim (which I can’t fully regenerate) that the relevant thing isn’t really whether stuff is ‘cheaper’, but rather the prices of all of these goods relative to everything else going on.

Gradual Disempowerment Monthly Roundup #3

Raymond Douglas9 Dec 2025 16:02 UTC

49 points

0 comments4 min readLW link