All writing is my own unless explicitly stated otherwise
Ashe Vazquez Nuñez
Nice post!
When a property is discussed in any real analysis-shaped context, its relevance is usually implicitly understood by mathematicians to be in the limit (i.e. for very large numbers or at values very close to some fixed point). For example, the statement “f(x) is a polynomial of degree 26” means that for every ε > 0 there exists some large enough R > 0 such that for every scale L larger than R, the L-degree of f (in the sense of the definition you gave) is within ε of 26.
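To spell that out with quantifiers (writing $\deg_L(f)$ for the $L$-degree, which is my own shorthand rather than notation from your post), the claim is just convergence in the limit:

$$\lim_{L \to \infty} \deg_L(f) = 26, \qquad \text{i.e.} \qquad \forall \varepsilon > 0\ \exists R > 0 : L > R \implies |\deg_L(f) - 26| < \varepsilon.$$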
This type of “asymptotic” analysis has its limits in all kinds of applied fields (galactic algorithms are an example of this), because we do tend to use math at specific, finite scales : )
Here are two different ways you can engage with a stock/prediction market:
You form your own opinion and feed it into the market’s aggregate. This requires forming an inside view.
You observe other people’s opinions and notice an opportunity for arbitrage. This requires you to be good at aggregating and noticing the implications of other people’s beliefs, such that you can make risk-free profit off of the market’s lack of logical omniscience (sketched below).
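To make method 2 concrete, here’s a minimal sketch of the simplest such opportunity (my own illustration; the arbitrage_profit helper and the prices are invented):

```python
# A minimal sketch of method 2 (my own illustration; the helper and
# prices are made up). Contracts cover mutually exclusive, exhaustive
# outcomes and each pay $1 if their outcome occurs. If the prices sum
# to less than $1, the market's implied probabilities are logically
# inconsistent, and buying one share of everything is risk-free profit.

def arbitrage_profit(prices: list[float], payout: float = 1.0) -> float:
    """Guaranteed profit from buying one share of every outcome,
    or 0.0 if no such arbitrage exists at these prices."""
    cost = sum(prices)  # cost of buying the full basket of outcomes
    # Exactly one outcome occurs, so the basket always pays `payout`.
    return max(payout - cost, 0.0)

# Three exhaustive candidates priced at 30c, 35c, and 25c: the implied
# probabilities sum to 0.90, so the basket locks in ~10c risk-free.
print(round(arbitrage_profit([0.30, 0.35, 0.25]), 2))  # 0.1
```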
The following is guess/speculation:
These methods live on opposite ends of a continuous scale. The scale indicates how “inside-ey” or self-generated your view is.
Predictions that are closer to method 2 are how you get good returns, whereas those that lean closer to method 1 serve to calibrate your own opinion, potentially at the expense of your money.
This suggests that I was majorly confused about the role that competitively successful forecasters and market performers play in sense-making. My guess is now that top forecasters have shown evidence that they excel at arbitraging other people’s opinions, but not much evidence of their ability to form inside views. I previously thought they had shown proof of both qualities.
This is a distinction that I haven’t seen explicitly made yet. Is there something I’m missing? Has someone else already thought of this? Let me know :)
Does “multi-modality” include features like having a physical world model, such that it could issue sensible commands to a robot body, for instance?
More conscious: deciding what move to make in a chess game.
Less conscious: the physical act of playing the move. You can move the piece in a conscious, deliberate way, but in practice the movement usually follows “automatically” from the high-level decision of what move to play.
Not conscious: reflex reactions.
The nature of doing interdisciplinary research is that you have to know a little about a lot of things. Unfortunately, it’s hard to tell whether you know a little about something or are just misinformed about it.
Much of my agent-ey thinking is inspired by and seeks to adequately model human cognition, but I realised I have no solid understanding of the relationship between consciousness and cognition. They’re definitely not the same, since most processes I could describe as cognitive don’t materialise in my consciousness. However, almost all the examples I find insightful feature conscious decision-making. This suggests that what cognition is without consciousness is opaque to me.
I’d like to clarify that most people do reward-hack themselves all the time; I think reward hacking is the default. What I’m grappling with here is rather why people don’t reward-hack themselves literally to death.
In terms of “doing the real thing”, could you elaborate on what “the real thing” means in the frame I’m using of external vs. internal states, or in some other frame you resonate with?
I only have cursory knowledge of hierarchical active inference, but from squinting at it from afar, it seems to afford some types of flexibility about preferences that I would value in a model. For instance, it seems to include mechanisms for making different preferences salient to the model at varying times. I’m also interested in your point that hierarchical structures can describe preferences that are increasingly “locked in” for the model. Thanks for pointing me to that, and thanks for the resources!
There are two questions that currently guide my intuitions on how interesting a (speculative) model of agentic behavior might be.
1. How well does this model map onto real agentic behavior?
2. Is the described behavior “naturally” emergent from environmental conditions?
In a recent post I argued, in slightly different words, that seeing agents as confused, uncertain and illogical about their own preferences is fruitful because it answers the second question in a satisfactory way. Internal inconsistency is not only a commonly observable behavior (cf. behavioral economics); it is additionally an emergent strategy to protect against agents finding adversarial internal representations of external goals. I called this internal reward-hacking.
What I think I failed to communicate is that my final thoughts on preferences competing with each other also come from reflecting on question 2. It makes sense that agents might weigh different memes representing possible preferences for improved decision-making, but this doesn’t explain how the memes behave with respect to that agent.
Memes are subject to selection pressures much as animals are. Moreover, memes can co-adapt or compete with each other, and they can arguably engage in active transmission, such as influencing their host’s behavior to enable their spread. It therefore feels reasonable for models of human cognition to imbue memes with some agency, enabling them to perceive and interact with each other and their hosts (Claude’s literature review indicates that at least some memeticists disagree with me on this point).
A model that aspires to respect these agentic properties of memes would thus be incomplete without describing what incentives memes have to participate in a cognitive system, and how those incentives shape the system’s emergent properties. That’s why I think it’s insufficient to see potential preferences as impartial, passive sub-modules that agents deploy to enable their own rationality. Instead, we should always attempt to answer “what’s in it for the memes?”
Writing applied math: “I should take care to adequately explain all the intuitions behind my mathematical objects so the reader can see whether my assumptions are justified, whether the math reflects the intuitions, etc...”
Reading applied math: “why do the authors spend three million pages on arbitrarily specific, cherry-picked intuitive examples instead of just showing me real math I can parse?”