because many “goals” I have in mind involve existence and not optimality conditions
Could you give some examples? This seems quite important. Re: existence, this is possibly related to what you’re thinking?
While writing, I conceptualised the external observer as part of a thought experiment in which non-embeddedness is allowed as a hermeneutic device (sorry that this wasn’t clear!). I think this can be a useful idea even if no possible external observer actually has this property, but it breaks down when the observer itself has to interact with the agent (i.e. in a game-theoretic setup).
I’m naturally (intuitively) suspicious of god’s-eye-views––thinking about observers that stand completely outside observed-systems––and much more inclined to thinking in interactive/embedded terms. I’m curious why/how you find this perspective fruitful and/or interesting.
Let’s say you’re a selection process designing an agent, choosing between “giving it” an architecture that is likely to encode abstraction A versus abstraction B[3]. Your choice between A and B might depend on questions like what internal model and goals are fit for the agent’s environment. This means there’s a dependency between the way in which the agent will be intelligent (i.e. whether it learns abstraction A or B) and what its internal goals will be.
I see why this is true for e.g. a hawk and a bat, where the abstraction-capabilities in question are visual ability vs. echolocation. I don’t see why this is true once the selection process cranks up the “general intelligence” knob (at least, assuming natural abstractions).
Information about one gives information about the other and vice-versa.
I’d expect two ASIs with different goals to have similar abstractions about the world, but different abstractions wherever those abstractions involve the agents themselves/some level of recursive modeling, since that is where internal goals are represented; i.e. imo the mutual info between them is low in this case.
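(To be explicit about what I mean by mutual info here: the standard information-theoretic quantity, applied loosely to the two ASIs’ self-involving abstractions, call them $A_1$ and $A_2$; the symbols are just for illustration.)

$$I(A_1; A_2) \;=\; H(A_1) - H(A_1 \mid A_2)$$

“Low” meaning that knowing one ASI’s self-/goal-related abstractions tells you little about the other’s.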
I like this perspective a lot and I think it is indeed more informative than the optimize-y perspective wrt agents-that-we-currently-observe-exist. But I don’t expect this perspective to be informative if we build something that is very consequentialist/optimize-y (e.g. ASI).
Imo the best formal grounding for this intuition of agents as exist-y/satisfice-y is the FEP. And I do think ASI will be an active inference agent, but that doesn’t really preclude the possibility that it’s also optimize-y; active inference agents behave more and more like EU maximizers under some conditions (namely low ambiguity), and I (tentatively) expect these conditions to be met for ASI.
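To gesture at why low ambiguity pushes active inference toward EU maximization, here’s the standard expected-free-energy decomposition from the active inference literature (a sketch from that literature, not anything specific to the setup above):

$$G(\pi) \;=\; \underbrace{D_{\mathrm{KL}}\!\left[q(o \mid \pi)\,\|\,p(o)\right]}_{\text{risk}} \;+\; \underbrace{\mathbb{E}_{q(s \mid \pi)}\!\left[\mathrm{H}\!\left[p(o \mid s)\right]\right]}_{\text{ambiguity}}$$

When the ambiguity term is near zero (i.e. the agent’s observation model is close to deterministic), minimizing $G$ reduces to minimizing risk; and since $D_{\mathrm{KL}}[q(o \mid \pi)\,\|\,p(o)] = -\mathbb{E}_{q(o \mid \pi)}[\ln p(o)] - \mathrm{H}[q(o \mid \pi)]$, that is just expected-utility maximization with $u(o) = \ln p(o)$, up to an entropy bonus on predicted outcomes.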
Some of my uncertainties around this:
Maybe it’s possible to construct an agent that has high optimization power/capability, but is uncertain about what to optimize for. This would probably lead to it acting in less scary ways. Whether this is possible probably depends on the particular selection process the agent went through.
I’m not sure how to think about parts and wholes in relation to this. For example, can you have an optimize-y thing that is itself made up of exist-y/satisfice-y things? Can you have an exist-y/satisfice-y thing that is itself made up of optimize-y things? I’m not sure how these kinds of agents compose/interact across scales.
Yes, I think it’s cruxy. Could you elaborate on your uncertainty? Even if you’re just sketching out very feathery intuitions.
And I’m curious—if you think this knob doesn’t really meaningfully exist, what do you think current frontier labs are doing/selecting for, and what do you think they’re trying to do/select for? (Like, for example, do you think they’re trying to crank up the general intelligence knob, and that this is a futile task––really, they’re cranking up some different, adjacent knob?)
Thanks for probing on this! I’m not sure I endorse that strong of a claim anymore. Refining into something I’d endorse more:
The two ASIs exist in the same world, and this world has underlying laws; these laws can be inferred at a sufficiently high intelligence level, with sufficient data.
But not every single law will be inferred by the two ASIs, because of boundedness. The laws that are inferred will probably depend on their goals. There’s a problem of relevance here.
But some laws are convergently useful to infer no matter what goal you have, because a small set of bottlenecks (e.g. resource constraints) stands in the way of a wide variety of goals.
Predicting those convergently useful domains well will lead agents to a similar set of abstractions.
The last point seems to be the most important one, but I’m not sure why I buy it. But I do buy it.