Ashe Vazquez Nuñez

Karma: 885

Currently (May ’26) working on agent foundations as part of the MATS 9.1 extension program. I’m interested in self-models for embedded agents as a way to understand goals and beliefs. I otherwise occasionally write about math or its philosophy and sociology.

All writing is entirely my own unless explicitly stated otherwise.

Ashe Vazquez Nuñez 4 May 2026 6:30 UTC
4 points
0
in reply to: Denis Weber’s comment on: How Go Players Disempower Themselves to AI
While the deep dive into Carlo Metta felt a bit like a personal crusade at times
I hesitated to publish this for exactly that reason. I was drawn to using this case as an example because I do actually think it affected the way AI use is perceived and handled in European/American Go culture for the worse. However, the topic is pretty sensitive among Go players and this makes it hard to discuss without eliciting monkey-politics-brain sentiments from everyone (including me as the writer). I ended up addressing the piece mostly to the AI crowd and chose not to widely publicise it among Go players.
I would be curious to know how I could have brought up the example in a more tasteful way that wouldn’t have given the impression you describe.
On outsourcing of autonomy: I think there is a meaningful difference between the other examples you gave and Go AI. I agree that humans outsource their cognition to things all the time. I would call artefacts like my personal notes and my anki deck part of my extended mind. Extending minds is great, despite the perpetual risk of self-disempowerment it entails. However, most delegation used to happen between humans. AI has reached near-superhuman (or higher) level at many tasks that are a key part of how we share culture and resources (e.g. writing, code). This seems unusually dangerous because cultural, economic, and political power is slowly being transferred to increasingly intelligent entities that are unlikely to be aligned with human interests.

Ashe Vazquez Nuñez 4 May 2026 5:57 UTC
5 points
0
in reply to: Vladimir_Nesov’s comment on: How Go Players Disempower Themselves to AI
This is generally agreed upon to be the “right” way to study with AI and Go players often pay lip service to it. In practice people’s boundaries for what counts as on-policy distillation are not well-defined and that dilutes the impact. The mechanism whereby boundaries get weakened goes as follows:

A player finishes a game and usually has a narrative of what happened, which mistakes were pivotal, and what they need to improve. The AI’s evaluation will introduce prediction error into the person’s model. Since the AI is understood to be unquestionably correct, the person often ‘has’ to update to the AI’s view. However, people can hack their prediction-error sensors by retroactively updating what they think they believed in the past as well. This makes it extremely difficult to get useful feedback from the AI because all of it is ‘obvious’ or ‘natural’ after it is pointed out to you.
The above pattern seems extremely common and I personally struggle to overcome it. The best method I have found is to make a concrete, written narrative of my games. This includes recording the exact moves I think are mistakes, why they are mistakes, and how much I expect the AI to dislike them. This creates a more well-defined boundary of what your opinions actually were that helps you maintain some epistemic integrity. I would do this regularly if I were still playing Go (semi)-professionally.
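A minimal sketch of what such a written record could look like in code, assuming the AI's per-move point losses are typed in by hand after the review. The names `MovePrediction`, `compare_with_ai`, and the 1-point `threshold` are hypothetical and chosen only for illustration; they do not come from any Go engine or review tool.

```python
from dataclasses import dataclass


@dataclass
class MovePrediction:
    """One entry written down BEFORE looking at the AI review."""
    move_number: int
    reason: str           # why I think the move is a mistake
    expected_loss: float  # points I expect the AI to say the move lost


def compare_with_ai(predictions, ai_losses, threshold=1.0):
    """Compare pre-review predictions with the AI's evaluation.

    ai_losses maps move number -> point loss reported by the AI
    (entered by hand; an assumption of this sketch).
    Returns three lists: confirmed, overestimated, missed.
    """
    confirmed, overestimated = [], []
    predicted_moves = set()
    for p in predictions:
        predicted_moves.add(p.move_number)
        actual = ai_losses.get(p.move_number, 0.0)
        entry = (p.move_number, p.expected_loss, actual)
        if actual >= threshold:
            confirmed.append(entry)      # the AI also dislikes the move
        else:
            overestimated.append(entry)  # the AI thinks the move was fine
    # Mistakes the AI flags that were never written down beforehand.
    missed = sorted(
        move for move, loss in ai_losses.items()
        if loss >= threshold and move not in predicted_moves
    )
    return confirmed, overestimated, missed


# Made-up example data, purely illustrative.
predictions = [
    MovePrediction(34, "overplay on the right side", 3.0),
    MovePrediction(78, "too slow", 1.5),
]
ai_losses = {34: 4.2, 78: 0.3, 112: 6.0}
confirmed, overestimated, missed = compare_with_ai(predictions, ai_losses)
```

The `missed` list is the part that is hardest to rationalise away afterwards: those are mistakes that the pre-review narrative did not contain at all.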

Ashe Vazquez Nuñez 4 May 2026 5:33 UTC
3 points
0
in reply to: MC_Escherichia’s comment on: How Go Players Disempower Themselves to AI
I was indeed referring to Fox. I would have sworn there used to be a paid feature called “eagle-eye” (or something to that effect) that let you sneak a look at the AI evaluation of your live games. On the other hand, I can’t find it anymore. I’m wondering if I hallucinated/misunderstood it or if it was discontinued.

Ashe Vazquez Nuñez 2 May 2026 18:11 UTC
28 points
3
in reply to: LawrenceC’s comment on: How Go Players Disempower Themselves to AI
Go is an interesting model organism for disempowerment because its practice has both technical and artistic/cultural components. Go AI is indeed disanalogous to coding LLMs from a technical perspective: the Go AI has vastly superhuman competence and the LLM does not. However, as you pointed out, Leela Zero and other engines don’t communicate off the board; moreover, their incredible strength actually makes them worse as game and review partners. This results in human-AI Go interactions that are usually hollow and vapid compared with human-human ones, even if you factor out any cheating. AI is therefore bad at the cultural practice of Go and this shortcoming manifests in similar ways to those of old (maybe even current) coding assistants and LLMs as writers. People gravitate towards using the AI in all of these settings because it does a superficially “good enough” job at replacing them. However, in reality this offloading of cognition to the AI is illegitimate. In your example of coding assistants, the AI is not actually “good enough” on a technical level. At my school, the problem was that the valuable social and cultural exchange between players could not be replaced by their GPUs. Delegating one’s writing to LLMs suffers a bit from both of these problems.
As you pointed out, one antidote to this tendency of cognitive delegation involves being willing to languish in confusion: to continue to think even when it is uncomfortable or frustrating. Sufficiently responsible/thoughtful/robust people can thus benefit from AI usage, which is indeed likely easier with LLMs than with Go AI due to better communication. Relatedly, I think a path to an empowered society would involve system designs that counterbalance the “cognitive offloading” tendency.
The patterns of disempowerment described above apply less strongly to professional players because they are selected to enjoy thinking about the game^[1]. That’s why I’m a little bit skeptical of how well your example of Chess players floundering straight out of prep applies^[2]. Chess players outsource much of their cognition to their memory (this was true before AI), which is a sensible competitive move. I suspect the occasional floundering comes from them dancing on the Pareto frontier between “amount of stuff memorised” and “ability to execute on stuff you memorised” which probably trade off against each other.
though it seems the case for chess play improving is stronger than for go, perhaps?
This is probably downstream of memorisation being relevant in Chess but not in Go.
Perhaps a more important question is, do you plan on writing more history of X posts?
Could you say more on what archetype of post you have in mind? I don’t think I can write a post quite like this one on many other topics because the narrative relies on lived experience. I am slowly cooking a “history of behavioural decision theory”, but that feels rather distinct.
1. ^
  Many AI users at my Go school were stronger (1 Dan to 4 Dan EGF) amateurs who were struggling to keep improving but who were still emotionally invested in watching their rank increase.
2. ^
  amateur Chess players were always disempowered with respect to their own prep anyway, though AI probably exacerbates this

Ashe Vazquez Nuñez 2 May 2026 17:47 UTC
17 points
1
on: How Go Players Disempower Themselves to AI
Short comment addressing some people remarking on similarities in my story to Chess culture:

I agree that Chess shows many similar patterns of disempowerment, but they seem to be better off along one relevant axis. Chess websites have automated algorithms for detecting cheating and are not scared to punish people based on their outputs. This significantly dampens AI use in online Chess. This contrasts with Go, where cheating is rarely punished and occasionally can even be enabled. One story I didn’t share in the main post is that the biggest Chinese Go server actually has a paid option to use AI in games inside the client. I have also seen students at Chinese Go schools being allowed to cheat during online practice games by teachers.

How Go Players Disempower Themselves to AI

Ashe Vazquez Nuñez 1 May 2026 23:24 UTC

525 points

43 comments · 8 min read · LW link

Ashe Vazquez Nuñez 22 Apr 2026 7:25 UTC
1 point
0
on: Ashe Vazquez Nuñez’s Shortform
LW react suggestion: “good paraphrase” (of something I said at a previous point in the thread)

Ashe Vazquez Nuñez 21 Apr 2026 21:19 UTC
1 point
0
in reply to: Ashe Vazquez Nuñez’s comment on: Beliefs are Chosen to Serve Goals
quick addendum: my point feels spiritually related to the idea that “convergent evolution” is an incomplete concept without a specification of the attractor basin.

Ashe Vazquez Nuñez 21 Apr 2026 21:16 UTC
1 point
0
in reply to: Ariel Cheng’s comment on: Beliefs are Chosen to Serve Goals
I agree that FEP-shaped intuitions are very good for satisfice-ey agents. I’m unconvinced by the concrete mathematical modelling (notably not a fan of Bayesian generative models) but I find the ideas conceptually useful if you abstract away the implementation.

My scepticism of general intelligence is closely related to your point that ASIs won’t infer every single law. Any given level of complexity in an organism can only accommodate a limited ontology. Of course, you can always “juice up” the agent and give it more resources so it learns a more textured world model. One pseudo-mathematical way to put this is that for every set of abstractions, there exists an abstraction that oblates all of them at once; for a fixed level of complexity however, there exist two sets of abstractions such that neither one clearly dominates.
Our crux might start at “some laws are convergently useful to infer”. One corollary of my last pseudo-mathematical claim is that any bounded agent has to “choose” between incomparable ontologies. The claim in my original post is that the revealed goals an agent is endowed with affects this choice. This amounts to advocating that a focus on the effect of selection pressures on learned abstractions will yield better predictions than a focus on finding “convergent” or “natural” abstractions.
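One hedged way to write the pseudo-mathematical claim down, reading it as a statement about a preorder. All notation here is an assumption introduced for illustration: $\mathcal{A}$ is a collection of ontologies (sets of abstractions), $A \preceq B$ means that $B$ captures everything $A$ does, and $c$ is some complexity cost.

```latex
% Assumed notation, not taken from the comment itself:
%   (\mathcal{A}, \preceq)                 a preorder of ontologies
%   c : \mathcal{A} \to \mathbb{R}_{\ge 0}  a complexity cost
%   \mathcal{A}_k                          the ontologies affordable at budget k
\begin{align*}
&\text{(unbounded)} && \forall\, S \subseteq \mathcal{A},\ \exists\, C \in \mathcal{A}
   \text{ such that } A \preceq C \text{ for all } A \in S, \\
&\text{(bounded)}   && \exists\, A, B \in \mathcal{A}_k := \{\, X \in \mathcal{A} : c(X) \le k \,\}
   \text{ such that } A \not\preceq B \text{ and } B \not\preceq A.
\end{align*}
```

Under this reading, the corollary is that a bounded agent working inside $\mathcal{A}_k$ cannot always fall back on a common upper bound $C$, since $C$ may cost more than $k$, so it has to pick between incomparable elements.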

Ashe Vazquez Nuñez 21 Apr 2026 20:54 UTC
1 point
0
in reply to: Samuel Ratnam’s comment on: Beliefs are Chosen to Serve Goals
why are the goals of other agents more likely to have natural convergent representations of them than other things in the world?

Ashe Vazquez Nuñez 20 Apr 2026 10:49 UTC
4 points
0
on: Ashe Vazquez Nuñez’s Shortform
Epistemic status: cold take that I found edifying to write up explicitly.
Abstraction allows you to efficiently talk about many things at once in exchange for it being harder to say things that are accurate. In category theory, for instance, one abstracts away the concept of elements of sets by only referring to “objects” that may or may not contain things. One then often ascends another step by defining morphisms between categories (functors) and then forgetting the categories to consider the functors themselves as objects of interest. Climbing these two layers of abstraction buys you an extraordinary amount of semantic efficiency but the resulting ontology is excruciating to reason about adequately.
I’ve been wondering why it is so much easier to climb and descend levels of abstraction using natural^[1] language instead of mathematical reasoning and would like to discuss two unoriginal theories that explain this. Firstly, one might conjecture that math is semantically limited by the restrictions imposed on its syntax; natural language is by contrast flexible which makes it meaningfully more expressive. As an example, consider how almost every fruitful approach to mathematising anything relies on the machinery of functions or relations. This limitation handicaps our ability to talk about embedded agents that lack well-defined i/o channels, and indeed we are currently stuck with a bunch of problematic mathematical frameworks (AIXI, decision/game theory, …).
A second explanation suggests that natural language allows seemingly frictionless transition across levels of abstraction through the abuse of underspecified syntax. In this view, natural language can only achieve the appearance of expressivity by obfuscating the lossiness of its communication. The logical ambiguity of natural language is broadly what has pushed mathematics culture towards heavier use of mathematical symbols, logical proof systems, and so on...
The above perspectives both have some merit and they seem to co-exist in a hermeneutic symbiosis. Natural language is used to generate an idea, then mathematical formalism is employed to sanity-check that idea and examine any implications that natural language may have failed to reveal. Unfortunately, this process can break down when a concept is either too semantically rich or too syntactically abusive to be easily mathematisable. Moreover, it’s not always clear which of the two explanations applies (maybe both) when a framework proves hostile to being formalised. One meaningful axis along which researchers vary describes how much benefit of the doubt they are willing to give to these kinds of ambiguous theories.
1. ^
  no pun intended

Ashe Vazquez Nuñez 15 Apr 2026 17:30 UTC
2 points
0
in reply to: Ariel Cheng’s comment on: Beliefs are Chosen to Serve Goals
On existence: I don’t see why agents should be seen as optimisers rather than as achieving some minimal conditions they are satisfied with. The second view seems more consistent both with actual human behaviour and with the concept of bounded rationality as a whole. The minimal conditions seem intuitively related to ensuring existence/propagation (e.g. drinking “enough” water, acquiring “enough” shelter, etc.), but I don’t have a more complete way to put it than that for the moment. I’ll check your rec out, thanks!
I’m also suspicious of god’s-eye-views. I think they can be conceptually clear and helpful on the one hand, but it’s unclear how much useful “realness” you trade for that clarity. I see them as training wheels that dominate in the interim as I seek a better, embedded perspective.
----
hawk vs. bat is exactly the type of example I had in mind. I don’t necessarily assume natural abstractions, and maybe relatedly I’m not sure if there’s a meaningful “general intelligence” knob that a process can choose to crank up — even one that nominally exists for exactly that purpose (like a project that tries to build an ASI). This could be cruxy? It might also relate to me not seeing why you’d expect two ASIs to develop similar emergent abstractions for their world-models in the following quote:
I’d expect two ASIs with different goals to have similar abstractions about the world, but different abstractions where those abstractions involve themselves/some level of recursive modeling, since those are how internal goals are represented; i.e. imo mutual info between them is low in this case.

Ashe Vazquez Nuñez 14 Apr 2026 17:45 UTC
2 points
0
in reply to: Ariel Cheng’s comment on: Beliefs are Chosen to Serve Goals
Thanks for your comments. I reacted to some parts where I straightforwardly agree and address some more nuanced points below.
I agree that my argument feels mesa-optimiser-shaped; I avoided the terminology because many “goals” I have in mind involve existence and not optimality conditions. But indeed, the relationship between the base and mesa optimisers is roughly analogous to the one I have in mind between “revealed” and “internal” goals^[1]. “fundamentally subservient” is therefore a linguistic overreach that I retrospectively don’t endorse. As you pointed out, the mesa optimiser is spun up to achieve the base optimiser’s goals, but the power structure between them once the mesa optimiser exists can be complex and rich; I’m broadly interested in studying precisely this complexity.
I find the point you make about the conflation technically correct but don’t yet know how I feel about the four-goal typology it induces. While writing, I conceptualised the external observer as part of a thought experiment in which non-embeddedness is allowed as a hermeneutic device (sorry that this wasn’t clear!). I think this can be a useful idea even if no possible external observer actually has this property, but it breaks down when the observer itself has to interact with the agent (i.e. in a game-theoretical setup). Your distinction could definitely be useful for such cases at least.
Finally:
Just because internal goals, along with an agent’s intelligence/capabilities, are selected for together in the sense that they exist in the same agent (and the selection process is over the “whole” agent), does not mean they won’t be independent. Or at least, I don’t see how this necessarily follows.
I’m pointing out something significantly stronger than that the two things exist in the same agent; they are selected “to”^[2] achieve the same revealed goal. The intelligence develops to fit the internal goals that are useful for the revealed goal, and the internal goals are in turn limited by the shape of the intelligence. I think these limitations are much more informationally rich than can be chalked up to “just” being about compute/complexity bounds, though I failed to illustrate this in my original piece. Let’s say you’re a selection process designing an agent and choose between “giving it” an architecture that is likely to encode abstraction A versus abstraction B^[3]. Your choice between A and B might depend on questions like what internal model and goals are fit for the agent’s environment. This means there’s a dependency between the way in which the agent will be intelligent (i.e. whether it learns abstraction A or B) and what its internal goals will be. Information about one gives information about the other and vice-versa.
1. ^
  I have occasionally used the term “mesa-satisficer” in the past.
2. ^
  asterisk for base/mesa distinction again!
3. ^
  You could, at a higher complexity cost, endow it with an abstraction C that contains the ability to understand both A and B (which are themselves mutually incomparable), but this might not be the best way to allocate your resources.

Beliefs are Chosen to Serve Goals

Ashe Vazquez Nuñez 7 Apr 2026 16:43 UTC

28 points

14 comments · 4 min read · LW link

(tuesdaybornwhale.substack.com)

A Tale of Two Rigours

Ashe Vazquez Nuñez 3 Apr 2026 21:28 UTC

22 points

0 comments · 2 min read · LW link

(tuesdaybornwhale.substack.com)

Why natural transformations?

Ashe Vazquez Nuñez 1 Apr 2026 23:59 UTC

17 points

0 comments · 2 min read · LW link

Ashe Vazquez Nuñez 25 Mar 2026 21:39 UTC
3 points
0
in reply to: Jonas Hallgren’s comment on: Agents Can Get Stuck in Self-distrusting Equilibria
My aim is to work towards a paradigm where allying under a coherent identity is a demonstrably “reasonable” development for sub-agents or TIs subject to selection pressures. I agree that one of these pressures is likely predictability or, almost equivalently, a compressible and simple self-model.
I hint at what this could look like in the section “Robust equilibria and updatelessness”. The shape of identity (e.g. how updateless the agent allows itself to be) may dictate not only how TIs interact with each other, but also how they will navigate strategic interactions with TIs belonging to other agents. This would ideally mean that a theory of intrapersonal games in an agent determines what solution concepts are valid for that agent in usual game theory.
This addresses your point in the specific case where the rest of the world consists of other agents; I’m not as clear on the relationship intrapersonal games have to “world models”. Food for thought, thanks.

Agents Can Get Stuck in Self-distrusting Equilibria

Ashe Vazquez Nuñez 24 Mar 2026 22:05 UTC

32 points

2 comments · 12 min read · LW link

An Informal Definition of Goals for Embedded Agents

Ashe Vazquez Nuñez 24 Mar 2026 18:36 UTC

17 points

0 comments · 1 min read · LW link

Ashe Vazquez Nuñez 22 Mar 2026 5:36 UTC
1 point
0
on: Terrified Comments on Corrigibility in Claude’s Constitution
Unnatural, because we expect the AI to resist having its mind changed: rational agents should want to preserve their current preferences, because letting their preferences be modified would result in their current preferences being less fulfilled
This is plausible for some idealistic rational agent, but seems unlikely to hold for any embedded one. Embedded agents are subject to exogenous changes that they can’t model or affect; for instance, some actions may become available that the agent didn’t even consider possible when originally selecting a policy. This leads to natural preference drifts over time.
In order to maintain some sense of coherence or consistency, then, agents likely need some coordination mechanisms for future and counterfactual selves to cooperate around. This could be something like a set of virtues, an “identity” (such as an idealised version of the agent that all instances try to embody, not dissimilar to FDT or UDT) or some other kind of self-model. This means that any given version of the agent may be willing to suffer modification in service of a higher-level conception of itself that is common to its different instances.
Suppose indeed that such self-coordination mechanisms are a relevant component for even approximating some notion of reflective consistency or coherence. Then, the concept of corrigibility might not be that unnatural due to the affordances it would give an AI in achieving this end.
(to be clear, I’m advocating that corrigibility may not be that unnatural or fraught of a concept, not that Claude’s constitution makes meaningful progress towards corrigibility).
