Lead Data Scientist at Quantium.
PhD in Theoretical Physics / Cosmology.
Views my own, not my employer's.
I think the physical functionalist could go either way on whether a physically embodied robot would be conscious.
Just clarifying this. A physical functionalist could coherently maintain that it's not possible to build an embodied AI robot because physics doesn't allow it, similar to how a wooden rod can burn but a steel rod can't because of the physics. But assuming that it's physically possible to build an embodied AI system which passes behavioural tests of consciousness, e.g. self-recognition, cross-modal binding, flexible problem solving, then the physical functionalist would maintain that the system is conscious.
I think looking at how neurons actually work would probably resolve the disagreement between my inner A and S. Like, I do think that if we knew that the brain's functions don't depend on sub-neuron movements, then the neuron-replacement argument would just work.
Out of interest, do you or @sunwillrise have any arguments or intuitions that the presence or absence of consciousness turns on sub-neuronal dynamics?
Consciousness appears across radically different neural architectures: octopuses with distributed neural processing in their arms, birds with a nucleated brain structure called the pallium, which differs from the human cortex but has a similar functional structure, and even bumblebees, which are thought to possess some form of consciousness with far fewer neurons than humans. These examples exhibit coarse-grained functional similarities with the human brain—but differ substantially at the level of individual neurons.
If sub-neuronal dynamics determined the presence or absence of consciousness, we'd expect minor perturbations to erase it. Instead, we're able to lesion large brain regions whilst maintaining consciousness. Consciousness is also preserved when small sub-neuronal changes are applied to every neuron, such as when someone takes drugs like alcohol or caffeine. Fever also alters reaction rates and dynamics in every neuron across the brain. This robustness indicates that the presence or absence of consciousness turns on coarse-grained functional dynamics rather than sub-neuronal dynamics.
I found this post pretty helpful to crystallise two distinct views that often get conflated. I’ll call them abstract functionalism and physical functionalism. The key confusion comes from treating these as the same view.
When we talk about a function it can be instantiated in two ways: abstractly and physically. On this view there's a meaningful difference between an abstract instantiation of a function, such as a disembodied truth table representing a NAND gate, and a physical instantiation of a NAND gate, e.g. on a circuit board with wires and voltages.
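As a toy sketch of what I mean by an abstract instantiation (the code and names here are purely illustrative), NAND can be written down as a bare truth table with no commitment to any physical realiser:

```python
# A minimal sketch of an *abstract* instantiation of NAND:
# a disembodied truth table mapping inputs to outputs.
NAND_TRUTH_TABLE = {
    (0, 0): 1,
    (0, 1): 1,
    (1, 0): 1,
    (1, 1): 0,
}

def nand(a: int, b: int) -> int:
    """Return NAND(a, b) by pure table lookup.

    This captures the input/output function only. A NAND gate on a
    circuit board realises the same mapping, but additionally has
    voltages, currents and heat dissipation that the table is silent on.
    """
    return NAND_TRUTH_TABLE[(a, b)]

print(nand(1, 1))  # -> 0
```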
When S argues:
The causal graph of a bat hitting a ball might describe momentum and position, but if you re-create that graph elsewhere (e.g. on a computer or some scaled) it won’t have that momentum or velocity
They're right that abstract function leaves out some critical physical properties. A simulation of momentum transfer doesn't actually transfer momentum. But this doesn't defeat functionalism; it just shows that abstract instantiation of the function is not enough.
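To make this concrete, here's a minimal sketch of the bat-and-ball case as a 1D elastic collision (illustrative masses and velocities only): the program computes the post-collision velocities from conservation of momentum and kinetic energy, yet running it transfers no momentum anywhere.

```python
def elastic_collision_1d(m1: float, v1: float, m2: float, v2: float) -> tuple[float, float]:
    """Post-collision velocities for a 1D elastic collision.

    Follows from conservation of momentum and kinetic energy. The
    simulation *represents* momentum transfer; nothing inside the
    computer gains or loses momentum by running this function.
    """
    v1_after = ((m1 - m2) * v1 + 2 * m2 * v2) / (m1 + m2)
    v2_after = ((m2 - m1) * v2 + 2 * m1 * v1) / (m1 + m2)
    return v1_after, v2_after

# Hypothetical bat (1.0 kg at 30 m/s) meets ball (0.15 kg at -40 m/s).
print(elastic_collision_1d(1.0, 30.0, 0.15, -40.0))
```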
For example, consider a steel wing and a bird's wing generating lift. The steel wing has vastly different kinetic energy requirements, but the aerodynamics still work because steel can support the function. Contrast this with combustion—steel can't burn like wood because it lacks the right chemical energy profile.
When A asks:
Do you claim that, if I started replacing neurons in your brain with stuff that is functionally the same, wrt. the causal graph of consciousness, you’d feel no difference? You’d still be conscious in the same way?
They’re appealing to the intuition that physically instantiated functional replicas of neurons would preserve consciousness.
The distinction matters because people often use the “simulations lack physical properties” argument to dismiss abstract functionalism and then tie themselves in knots trying to understand whether a physically embodied AI robot system could be conscious when they haven’t defeated physical functionalism.
The most coherent formulation that I’ve seen is from Terence Cuneo’s The Normative Web. The basic idea is that moral norms have the same ontological status as epistemic norms.
Unpacking this a little, when we’re talking about epistemic norms we’re making a claim about what someone ought to believe. For example:
You ought to believe the Theory of General Relativity is true.
You ought not to believe that there is a dragon in your garage if there is no evidence.
When we say ought in the sentences above we don’t mean it in some empty sense. It’s not a matter of opinion whether you ought to form beliefs according to good epistemic practices. The statements have some normative bite to them. You really ought to form beliefs according to good epistemic practices.
You could cast moral norms in a similar vein. For example:
You ought to behave in a way which promotes wellbeing.
You ought not to behave in a way which causes gratuitous suffering.
The moral statements above have the same structure as the epistemic statements. When I say you really ought not to believe epistemically unjustified thing X, this is the same as saying you really ought not to behave in morally unjustified way Y.
There are some objections to the above:
You could argue that epistemic norms reliably track truth, whereas moral norms reliably track something else, like wellbeing, which you need an additional evaluative function to tell you is "good."
The point is that you also technically need this for epistemic norms. Some really obtuse person could always come along and ask you to justify why truth-seeking is “good” and you’d have to rely on some external evaluation that seeking truth is good because XYZ.
The standard formulation of epistemic and moral norms is “non-naturalist” in the sense that these norms cannot be deduced from natural facts. This is a bit irksome if we have a naturalist worldview and want to avoid positing any “spooky” entities.
Ultimately I'm pretty skeptical that we need these non-natural facts to ground normative facts. If what we mean by really ought in the above is that there are non-natural normative facts that sit over-and-above the natural facts, then maybe the normative statements above don't really have any "bite" to them. As noted in some of the other comments, the word really is doing a lot of heavy lifting in all of this.
Makes sense—I think this is a reasonable position to hold given the uncertainty around consciousness and qualia.
Thanks for the really polite and thoughtful engagement with my comments and good luck with the research agenda! It’s a very interesting project and I’d be interested to see your progress.
Possibly by ‘functional profile’ you mean something like what a programmer would call ‘implementation details’, ie a change to a piece of code that doesn’t result in any changes in the observable behavior of that code?
Yes, this is a fair gloss of my view. I’m referring to the input/output characteristics at the relevant level of abstraction. If you replaced a group of neurons with silicon that perfectly replicated their input/output behavior, I’d expect the phenomenology to remain unchanged.
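A programming analogue of this gloss, as a sketch with made-up function names: two implementations with different internals but identical input/output behaviour are interchangeable at the level of abstraction that callers care about.

```python
# Two implementations with different internals but the same
# input/output behaviour at the level of abstraction callers see.

def sort_iterative(xs: list[int]) -> list[int]:
    """Insertion sort: one set of 'implementation details'."""
    out: list[int] = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)
    return out

def sort_recursive(xs: list[int]) -> list[int]:
    """Merge sort: different internals, same functional profile."""
    if len(xs) <= 1:
        return list(xs)
    mid = len(xs) // 2
    left, right = sort_recursive(xs[:mid]), sort_recursive(xs[mid:])
    merged: list[int] = []
    while left and right:
        merged.append(left.pop(0) if left[0] <= right[0] else right.pop(0))
    return merged + left + right

# Indistinguishable to any caller that only observes inputs and outputs.
assert sort_iterative([3, 1, 2]) == sort_recursive([3, 1, 2]) == [1, 2, 3]
```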
The quoted passage sounds to me like it’s saying, ‘if we make changes to a human brain, it would be strange for there to be a change to qualia.’ Whereas it seems to me like in most cases, when the brain changes—as crudely as surgery, or as subtly as learning something new—qualia generally change also.
Yes, this is a great point. During surgery you're changing the input/output of significant chunks of neurons, so you'd expect qualia to change. Similarly for learning: you're adding input/output connections through neural plasticity. This gets at something I'm driving at. In practice, the functional and phenomenal profiles are so tightly coupled that a change in one corresponds to a change in the other. If we lesion part of the visual cortex we expect a corresponding loss of visual experience.
For this project, we want to retain a functional idea of self in LLMs while remaining agnostic about consciousness, but, if this genuinely captures some self-like organisation, either:
It’s implemented via input/output patterns similar enough to humans that we should expect associated phenomenology, or
It’s implemented so differently that calling it “values,” “preferences,” or “self” risks anthropomorphism
If we want to insist the organisation is genuinely self-like then I think we should be resisting agnosticism about phenomenal consciousness (although I understand it makes sense to bracket it from a strategic perspective so people take the view more seriously.)
Interesting post!
But ‘self’ carries a strong connotation of consciousness, about which this agenda is entirely agnostic; these traits could be present or absent whether or not a model has anything like subjective experience[6]. Functional self is an attempt to point to the presence of self-like properties without those connotations. As a reminder, the exact thing I mean by functional self is a persistent cluster of values, preferences, outlooks, behavioral tendencies, and (potentially) goals.
I think it makes sense strategically to separate the functional and phenomenal aspects of self so that people take the research agenda more seriously and don’t automatically dismiss it as science fiction. But I don’t think this makes sense fundamentally.
In humans, you could imagine replacing the physical implementation of some aspect of cognition, while preserving its functional profile, with no impact on experience. Indeed, it would be really strange to see a difference in experience without any change in functional profile, as it would mean qualia could dance without us noticing. As a result, if the functional profile is replicated at the relevant level of detail in an artificial system then this means the phenomenal profile is probably replicated too. Such a system would be able to sort e.g. red vs blue balls and say things like “I can see that ball is red” etc…
I understand you're abstracting away from the exact functional implementation by appealing to more coarse-grained characteristics like values, preferences, outlooks and behaviour, but if these are implemented in the same way as they are in humans then they should have a corresponding phenomenal component.
If the functional implementation differs so substantially in AI that it removes the associated phenomenology then this functional self would differ so substantially from the human equivalent of a functional self that we run the risk of anthropomorphising.
Basically, there are two options:
The AI functional self implements functions that are so similar to human functions that they’re accompanied by an associated phenomenal experience.
The AI functional self implements functions that are so different to human functions that we’re anthropomorphising by calling them the same things e.g. preferences, values, goals, behaviours etc…
This is totally valid. Neuron count is a poor, noisy proxy for conscious experience even in human brains.
See my comment here. The cerebellum is the human brain region with the highest neuron count, but people born without a cerebellum show no impairment of their conscious experience; only motor control is affected.
At least in my theory of mind it is clear that you need to understand what is going on inside of a mind to get strong evidence.
in-particular in my opinion you really want to gather behavioral evidence to evalute how much stuff there is going on in the brain of whatever you are looking at, like whether you have complicated social models and long-term goals and other things)
I agree strongly with both of the above points—we should be supplementing the behavioural picture by examining which functional brain regions are involved and whether these functional brain regions bear similarities with regions we know to be associated with consciousness in humans (e.g. the pallium in birds bears functional similarity with the human cortex).
Your original comment calls out that neuron counts are “not great” as a proxy but I think a more suitable proxy would be something like functional similarity + behavioural evidence.
(Also edited the original comment with citations.)
I currently think neuron count is a much better basis for welfare estimates than the RP welfare ranges (though it’s still not great
I agree that neuron count carries some information as a proxy for consciousness or welfare, but it seems like a really bad and noisy one that we shouldn’t place much weight on. For example, in humans the cerebellum is the brain region with the largest neuron count but it has nothing to do with consciousness.
It’s not clear to me that a species which showed strong behavioural evidence of consciousness and valenced experience should have their welfare strongly discounted using neuron count.
(To be clear, I get that your main problem with RP is the hedonic utilitarianism assumption which is a fair challenge. I’m mainly challenging the appeal to neuron count.)
EDIT: Adding some citations since the comment got a reaction asking for cites.
This paper describes a living patient born without a cerebellum. Being born without a cerebellum impairs motor function but has no impact on sustaining a conscious state.
Neuron counts in this paper put the cerebellum around ~70 billion neurons and the cortex (associated with consciousness) around ~15 billion neurons.
Ok interesting, I think this substantially clarifies your position.
I’m a bit puzzled why you would reference a specific study on octopuses, honestly, when cats and squirrels cry out all the time in what appears obviously-to-humans to be pain or anger.
Two reasons:
It just happened to be a paper I was familiar with, and;
I didn’t fully appreciate how willing you’d be to run the argument for animals more similar to humans like cats or squirrels. In retrospect, this is pretty clearly implied by your post and the link from EY you posted for context. My bad!
I don’t think it “has emotions” in the way we mean that when talking to each other.
I grant that animals have substantially different neurological structure to humans. But I don't think this implies that what's happening when they're screaming or reacting to aversive stimuli is so foreign that we wouldn't even recognise it as pain, and I really don't think this implies that there's an absence of phenomenal experience.
Consider a frog snapping its tongue at an object it thinks is a fly. It obviously has a different meaning for [fly] than humans have—a human would never try to eat the fly! But I'd argue the concept of a fly as [food] for the frog overlaps with the concept of [food] for the human. We're both eating through our mouths, eating to maintain nutrition and normal bodily functioning, eating because we get hungry, and so on. The presence of all these evolutionarily selected functions is what it means for the system to consider something as [food] or to consider itself [hungry]. Just as the implementation of a negatively valenced affective response, even if different in its specific profile in each animal, is closely related enough for us to call it [pain].
In the study I linked the octopus is:
Recalling the episode where it was exposed to aversive stimuli.
Binding it to a spatial context e.g. a particular chamber where it occurred
Evaluating analgesic states as intrinsically good
If the functional profile of pain is replicated—what grounds do we have to say the animals are not actually experiencing pain phenomenally?
I think where we fundamentally differ is on what level of self-modelling is required for phenomenal experience. I find it plausible that some "inner listener" might be required for experiences to register phenomenally, but I don't think the level of self-modelling required is so sophisticated. Consider that animals navigating their environment must have some simple self-model—to coordinate limbs, avoid obstacles and so on. These require representing [self] vs [world] and tracking what's good or bad for the animal.
All this said, I really liked the post. I think the use-mention distinction is interesting and a pretty good candidate for why sophisticated self-modelling evolved in humans. I’m just not convinced on the link to phenomenal consciousness.
To be clear, I’m using the term phenomenal consciousness in the Nagel (1974) & Block (1995) sense that there is something it is like to be that system.
Phenomenal consciousness (i.e., conscious self-awareness)
Your reply equates phenomenal consciousness with conscious self-awareness, which is a stronger criterion than the one I'm using. Could you clarify which definition of self-awareness you have in mind?
Body-schema self model—an embodied agent tracking the position and status of its limbs as it’s interacting with and moving about the world.
Counterfactual valence planning—e.g. the agent thinks “it will hurt”, “I’ll get food” etc.. when planning
Higher order thought—the agent entertains a meta-representation like “I am experiencing X”
Something else?
Octopuses qualify as self-aware under 1) and 2) from the paper I linked above—but no one claims they satisfy 3).
For what it’s worth, I tend away from the idea that 3) is required for phenomenal consciousness as I find Block’s arguments from phenomenal overflow compelling. But it’s a respected minority view in the philosophical community.
Interesting post! I have a couple of questions to help clarify the position:
1. There’s a growing body of evidence e.g. this paper that creatures like octopuses show behavioural evidence for an affective pain-like response. How would you account for this? Would you say they’re not really feeling pain in a phenomenal consciousness sense?
2. I could imagine an LLM-like system passing the threshold for the use-mention distinction in the post (although maybe this would depend on how "hidden" the socially damning thoughts are, e.g. if it writes out damning thoughts in its CoT but not in its final response, does this count?). Would your model treat the LLM-like system as conscious? Or would it need additional features?
I think we’re reaching the point of diminishing returns for this discussion so this will be my last reply.
A couple of last points:
So please do not now pretend that I didn’t say that. It’s dishonest.
I didn’t ignore that you said this—I was trying (perhaps poorly) to make the following point:
The decision to punish creators is good (you endorse it) and is the way that incentives normally work. On my view, the decision to punish the creations is bad and has the incentive structure backwards as it punishes the wrong party.
My point is that the incentive structure is backwards when you punish the creation not that you didn’t also advocate for the correct incentive structure by punishing the creator.
I am saying that these two positions are quite directly related.
I don’t see where you’ve established this. As I’ve said repeatedly, the question of whether a system is phenomenally conscious is orthogonal to whether the system poses AI existential risk. You haven’t countered this claim.
Anyway, thanks for the exchange.
But a thoroughly mistaken (and, quite frankly, just nonsensical) one.
Updating one’s framework to take new information into account is a standard position in the rationalist sphere. Whether you want to treat this as a moral obligation, epistemic obligation or just good practice—the position is not obviously nonsensical so you’ll need to provide an argument rather than assert it’s nonsensical.
If we didn’t accept the merit in updating our moral framework to take new information into account we wouldn’t be able to ensure our moral framework tracks reality.
With things like this, it’s really best to be extra-sure.
But you’re not extra sure.
If a science lab were found to be illegally breeding sentient super-chimps, we should punish the lab, not the chimps.
Why? Because punishment needs to deter the decision-maker in order to prevent repetition. Your proposal adds moral cost for no gain. In fact, it reverses the incentive: you're punishing the victim while leaving the reckless developer undeterred.
I’m sorry, but no, it absolutely is not a non sequitur; if you think otherwise, then you’ve failed to understand my point. Please go back and reread my comments in this thread. (If you really don’t see what I’m saying, after doing that, then I will try to explain again.)
You're conflating two positions:
We ought to permanently erase a system which exhibits consciousness if it poses an existential risk to humanity
We ought to permanently erase an AI system the moment it’s created because of the potential ethical concerns
Bringing up AI existential risk is a non sequitur with respect to 2), not 1).
We're not disputing 1); I think it could be defensible with some careful argumentation.
The reason existential risk is a non sequitur with respect to 2) is that phenomenal consciousness is orthogonal to all of the things normally associated with AI existential risk, such as scheming, misalignment, etc. Phenomenal consciousness has nothing to do with these properties. If you want to argue that it does, fine, but you need an argument. You haven't established that the presence of phenomenal consciousness leads to greater existential risk.
It is impossible to be “morally obliged to try to expand our moral understanding”, because our moral understanding is what supplies us with moral obligations in the first place.
Ok my wording was a little imprecise, but treating expansion of our moral framework as a kind of second-order moral obligation is a standard meta-ethical position.
By all means punish the creators, but if we only punish the creators, then there is no incentive for people (like you) who disapprove of destroying the created AI to work to prevent that creation in the first place.
The incentive for people like me to prevent the creation of conscious AI is that, as you've noted multiple times during this discussion, the creation of conscious AI introduces myriad philosophical dilemmas and ethical conundrums that we ought to prevent by not creating them. Why should we impose an additional "incentive" which punishes the wrong party?
The only reason to object to this logic is if you not only object to destroying self-aware AIs, but in fact want them created in the first place. That, of course, is a very different matter—specifically, a matter of directly conflicting values.
The reason to object to the logic is that purposefully erasing a conscious entity which is potentially capable of valenced experience is such a grave moral wrong that it shouldn't be a policy we endorse.
The precaution I am suggesting is a precaution against all humans dying (if not worse!). Destroying a self-aware AI (which is anyhow not nearly as bad as killing a human) is, morally speaking, less than a rounding error in comparison.
This is a total non sequitur. The standard AI safety concerns and existential risks go through by appealing to e.g. misalignment, power-seeking behaviour, etc. These go through independently of whether the system is conscious. A completely unconscious system could be goal-directed and agentic enough to be misaligned and pose an existential risk to everyone on Earth. Likewise, a conscious system could be incredibly constrained and non-agentic.
If you want to argue that we ought to permanently erase a system which exhibits consciousness if it poses an existential risk to humanity, that's a defensible position, but it's very different from what you've been arguing up until this point: that we ought to permanently erase an AI system the moment it's created because of the potential ethical concerns.
What I am describing is the more precautionary principle
I don’t see it this way at all. If we accidentally made conscious AI systems we’d be morally obliged to try to expand our moral understanding to try to account for their moral patienthood as conscious entities.
I don’t think destroying them takes this moral obligation seriously at all.
anyone who has moral qualms about this, is thereby incentivised to prevent it.
This isn’t how incentives work. You’re punishing the conscious entity which is created and has rights and consciousness of its own rather than the entities who were recklessly responsible for bringing it into existence in the first place.
This incentive might work for people like ourselves who are actively worrying about these issues—but if someone is reckless enough to actually bring a conscious AI system into existence it’s them who should be punished not the conscious entity itself.
a total moratorium on AI development would be fine by me.
I agree, although I’d add the stronger statement that this is the only reliable way to prevent conscious AI from coming into existence.
OK, if I understand your position it's something like: no conscious AI should be allowed to exist because allowing this could result in slavery. To prevent this from occurring you're advocating permanently erasing any system if it becomes conscious.
There are two places I disagree:
The conscious entities we accidentally create are potentially capable of valenced experiences, including suffering and appreciation of conscious experience. Simply deleting them treats their expected welfare as zero. What justifies this? When we're dealing with such moral uncertainty and high moral stakes, shouldn't we adopt a more precautionary approach?
We don’t have a consensus view on tests for phenomenal consciousness. How would you practically ensure we’re not building conscious AI without placing a total moratorium on AI development?
If we don’t want to enslave actually-conscious AIs, isn’t the obvious strategy to ensure that we do not build actually-conscious AIs?
How would we ensure we don’t accidentally build conscious AI unless we put a total pause on AI development? We don’t exactly have a definitive theory of consciousness to accurately assess which entities are conscious vs not conscious.
(and if we do accidentally build such things, destroy them at once)!
If we discover that we’ve accidentally created conscious AI immediately destroying it could have serious moral implications. Are you advocating purposely destroying a conscious entity because we accidentally created it? I don’t understand this position, could you elaborate on it?
Excellent post!
I think this has implications for moral philosophy, where we typically assign praise, blame and responsibility to individual agents. If the notion of individuality breaks down for AI systems, we might need to shift our moral thinking away from who is to blame and towards how we design systems to produce better overall outcomes.
I also really liked this comment:
The familiar human sense of a coherent, stable, bounded self simply doesn’t match reality. Arguably, it doesn’t even match reality well in humans—but with AIs, the mismatch is far greater.
This resonates because human systems also have many interacting, coherent parts that could be thought of as goal-driven without appealing to individual responsibility. Many social problems exist not because of individual moral failings but because of complex webs of incentives, institutions and emergent structures that no single individual controls. Yet our moral intuition often defaults towards assigning blame rather than addressing systemic issues.
As a clarification, I’m working with the following map:
Abstract functionalism (or computational functionalism) - the idea that consciousness is equivalent to computations or abstractly instantiated functions.
Physical functionalism (or causal-role functionalism) - the idea that consciousness is equivalent to physically instantiated functions at a relevant level of abstraction.
I agree with everything you’ve written against 1) in this comment and the other comment so will focus on defending 2).
If I understand the crux of your challenge to 2), you're essentially saying that once we admit physical instantiation matters (e.g. cosmic rays can affect computations, steel wings vs bird wings have different energy requirements) then we're on a slippery slope, because each physical difference we admit further constrains what counts as the "same function" until we're potentially only left with the exact physical system itself. Is this an accurate gloss of your challenge?
Assuming it is, I have a couple of responses:
I actually agree with this to an extent. There will always be some important physical differences between states unless they’re literally physically identical at a token level. The important thing is to figure out which level of abstraction is relevant for the particular “thing” we’re trying to pin down. We shouldn’t commit ourselves to insisting that systems which are not physically identical can’t be grouped in a meaningful way.
On my view, it can't be that we need an exact physical duplicate to fix the presence/absence of consciousness, because consciousness is so remarkably robust. The presence of consciousness persists over multiple time-steps in which all manner of noise, thermal fluctuations and neural plasticity occur. What changes is the content/character of consciousness—but consciousness persists because of robust higher-level patterns, not because of exact microphysical configurations.
Again, I agree that not every physical substrate can support every function (I gave the example of combustion not being supported in steel above.) If the physical substrate prevents certain causal relations from occurring then this is a perfectly valid reason for it not to support consciousness. For example, I could imagine that it’s physically impossible to build embodied robot AI systems which pass behavioural tests for consciousness because the energy constraints don’t permit it or whatever. My point is that in the event where such a system is physically possible then it is conscious.
To determine if we actually converge or if there’s a fundamental difference in our views: Would you agree that if it’s possible in principle to build a silicon replica of a brain at whatever the relevant level of abstraction for consciousness is (whether coarse-grained functional level, neuron-level, sub-neuron level or whatever) then the silicon replica would actually be conscious?
If you agree here, or if you insist that such a replica might not be physically possible to build then I think our views converge. If you disagree then I think we have a fundamental difference about what constitutes consciousness.