‘Theories of Values’ and ‘Theories of Agents’: confusions, musings and desiderata

Meta:

  • Content signposts: we talk about limits to expected utility theory; what values are (and ways in which we’re confused about what values are); the need for a “generative”/​developmental logic of agents (and their values); types of constraints on the “shape” of agents; relationships to FEP/​active inference; and (ir)rational/​(il)legitimate value change.

  • Context: we’re basically just chatting about topics of mutual interest, so the conversation is relatively free-wheeling and includes a decent amount of “creative speculation”.

  • Epistemic status: involves a bunch of “creative speculation” that we don’t think is true at face value and which may or may not turn out to be useful for making progress on deconfusing our understanding of the respective territory.


Mateusz Bagiński

Expected utility theory (stated in terms of the VNM axioms or something equivalent) thinks of rational agents as composed of two “parts”, i.e., beliefs and preferences. Beliefs are expressed in terms of probabilities that are updated in the process of learning (e.g., Bayesian updating). Preferences can be expressed as an ordering over alternative states of the world or outcomes or something similar. If we assume an agent’s set of preferences satisfies the four VNM axioms (or some equivalent desiderata), then those preferences can be expressed with some real-valued utility function, and the agent will behave as if they were maximizing its expected value.
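To spell the last step out with the standard formalism (this is just the textbook VNM statement, nothing beyond what the paragraph above says in words): if a preference relation $\succeq$ over lotteries satisfies completeness, transitivity, continuity, and independence, then there is a real-valued utility function $u$ over outcomes, unique up to positive affine transformation, such that

$$L \succeq M \;\iff\; \mathbb{E}_L[u] \ge \mathbb{E}_M[u], \qquad \text{where } \mathbb{E}_L[u] = \sum_i p_i\, u(x_i)$$

for a lottery $L$ that yields outcome $x_i$ with probability $p_i$. “Maximizing expected utility” then just means choosing the option whose associated lottery has the highest $\mathbb{E}[u]$.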

On this account, beliefs change in response to evidence, whereas values/​preferences in most cases don’t. Rational behavior comes down to (behaving as if one is) ~maximizing one’s preference satisfaction/​expected utility. Most changes to one’s preferences are detrimental to their satisfaction, so rational agents should want to keep their preferences unchanged (i.e., utility function preservation is an instrumentally convergent goal).

Thus, for a preference modification to be rational, it would have to result in higher expected utility than leaving the preferences unchanged. My impression is that the most often discussed setup where this is the case involves interactions between two or more agents. For example, if you and some other agent have somewhat conflicting preferences, you may strike a compromise where each of you makes their preferences somewhat more similar to the preferences of the other. This costs both of you a bit of (expected subjective) utility, but less than you would lose (in expectation) if you engaged in destructive conflict.
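To make that comparison explicit (a toy sketch of the reasoning above, not something the dialogue itself formalizes): with $u$ your current utility function, the modification is rational by your own current lights exactly when

$$\mathbb{E}\big[u \mid \text{adopt the compromise preferences}\big] \;>\; \mathbb{E}\big[u \mid \text{keep your preferences and risk conflict}\big],$$

e.g. giving up 5 units of expected utility in the compromise beats holding firm if holding firm means, say, a 50% chance of a destructive conflict that costs you 20 (an expected loss of 10).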

Another scenario justifying modification of one’s preferences is when you realize the world is different than you expected on your priors, such that you need to abandon the old ontology and/​or readjust it. If your preferences were defined in terms of (or strongly entangled with) concepts from the previous ontology, then you will also need to refactor your preferences.


You think that this is a confused way to think about rationality. For example, you see self-induced/​voluntary value change as something that in some cases is legitimate/​rational.

I’d like to elicit some of your thoughts about value change in humans. What makes a specific case of value change (il)legitimate? How is that tied to the concepts of rationality, agency, etc? Once we’re done with that, we can talk more generally about arguments for why the values of an agent/​system should not be fixed.

Sounds good?

Mateusz Bagiński

On a meta note: I’ve been using the words “preference” and “value” more or less interchangeably, without giving much thought to it. Do you view them as interchangeable or would you rather first make some conceptual/​terminological clarification?

Nora_Ammann

Sounds great!

(And I’m happy to use “preferences” and “values” interchangeably for now; we might at some point run into problems with this, but we can figure that out when we get there.)

Where to start...?

First, do I think the first part of your intro is “a confused way to think about rationality”? Sort of, but it’s a bit tricky to get our language to allow us to make precise statements here. I’m perfectly happy to say that under certain notions of rationality, your description is right/makes sense. But I definitely don’t think it’s a particularly useful/relevant one for the purposes I’m interested in. There are a few different aspects to this:

  • First, EUT makes/​relies on idealizing assumptions that fall short when trying to reason about real-world agents that are, e.g. bounded, embedded, enactive, nested.

    • (Note that there is a bunch of important nuance here, IMO. While I do think it’s correct and important to remind ourselves that we are interested in real-world/​realized agents (not ideal ones), I also believe that “rationality” puts important constraints on the space of minds.)

  • Second, I would claim that EUT only really gives me a “static picture” (for lack of a better word) of agents rather than a “generative” one, one that captures the “logic of functioning” of the (actual/​realized) agent? Another way of saying this: I am interested in understanding how values (and beliefs) are implemented in real-world agents. EUT is not the sort of theory that even tries to answer this question.

    • In fact, one of my pet peeves here is something like: So, ok, EUT isn’t even trying to give you a story about how an agent’s practical reasoning is implemented. However, it sometimes (by mistake) ended up being used to serve this function, and the result of this is that the “objects” that EUT uses, preferences/values, have become reified into objects with ~ontological status. Now, it feels like you can “explain” an agent’s practical reasoning by saying “the agent did X because they have an X-shaped value”. But like.. somewhere along the way we forgot that we actually only got to back out “preferences/values” from first observing the agent’s actions/choices, and now we’re invoking them as an explanation for those actions/choices. I think it makes people end up having this notion that there are these things called values that somehow/sort of exist and must have certain properties. But personally, I think we’re actually more confused than even being able to posit that there is this singular explanandum (“values”) that we need to understand in order to understand an agent’s practical reasoning.

Ok… maybe I leave it there for now? I haven’t really gotten to your two leading questions yet (though maybe started gesturing at some pieces of the bigger picture that I think are relevant), so happy for you to just check whether you want to clarify or follow up on something I’ve said so far and otherwise ask me to address those two questions directly.

Mateusz Bagiński

While we’re at it, I have some thoughts and would be curious to hear your counterthoughts.

So your points are (1) the idealizing assumptions of EUT don’t apply to real-world agents and (2) EUT gives only a static/​snapshot picture of an agent. Both seem to have parallels in the context of Bayesian epistemology (probably formalized epistemology more broadly but I’m most familiar with the Bayesian kind).

I’ll focus on (1) for now. Bayesian epistemology thinks of rational reasoners/agents as logically omniscient, with perfectly coherent probabilistic beliefs (e.g., no contradictions, probabilities of mutually exclusive and exhaustive events summing to 1), updating on observations consistently with the ratio formula, and so on. This obviously raises the question of to what extent this formalism is applicable/helpful for guiding the real-world process of forming and updating beliefs. Standard responses seem to fall along the lines of (SEP):

(a) Even though unattainable, idealized Bayesian epistemology is a useful ideal to aspire towards. Keeping our sight on the ideal reminds us that “the math exists, even though we can’t do the math precisely”. This can guide us in our imperfect attempts to refine our reasoning so that it approximates that ideal as much as possible (or rather, as much as profitable on the margin because there obviously are diminishing returns to investing in better cognition).

(b) Idealized Bayesian epistemology is akin to a spherical cow in a vacuum or an ideal gas. It’s a formalism meant to capture the commonalities of many real-world phenomena with a varying degree of inaccuracy. The reason for its partial success is probably that they share some common abstract property that arises in each case via a sufficiently similar process. This ideal can then be de-idealized by adding some additional constraints, details, and specifications that make it closer to how some specific real-world system (or a class of systems) functions.

(Perhaps related to the distinction between counting-down coherence and counting-up coherence?)
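For reference, the idealized package being described here is just standard probabilism plus conditionalization (the “ratio formula”): a credence function $P$ is required to satisfy, e.g.,

$$P(\Omega) = 1, \qquad P(A \cup B) = P(A) + P(B)\ \text{for disjoint } A, B, \qquad P(H \mid E) = \frac{P(H \cap E)}{P(E)},$$

with updating on evidence $E$ meaning a move from the prior $P(\cdot)$ to the posterior $P(\cdot \mid E)$, and with logical omniscience additionally demanding $P(T) = 1$ for every logical truth $T$. Real reasoners violate all of these to some degree, which is what responses (a) and (b) are reacting to.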

It seems to me that analogous responses could be given to allegations of EUT being a theory of idealized agents that are unbounded, unembedded and so on. Maybe EUT is an unattainable ideal but is nevertheless useful as an ideal to aspire towards? And/​or maybe it can be used as a platonic template to be filled out with real-world contingencies of cognitive boundedness, value/​preference-generating processes and so on? What do you think of that?

You mentioned that “under certain conditions/​notions of rationality [EUT prescriptions] make sense”. Does that mean you view EUT as a special (perhaps very narrow and unrealistic in practice) case of some broader theory of rationality of which we currently have an incomplete grasp?

Regarding (2), the problem of lack of specification of how values arise in a system seems similar to the problem of priors, i.e., how should an agent assign their initial credences on propositions on which they lack evidence (SEP). Maybe the very way this question/​problem is formulated seems to presume an idealized (form of an) agent that gets embedded in the world, rather than something that arises from the world via some continuous process, adapting, gaining autonomy, knowledge, competence, intelligence, etc.


Let me rephrase your pet peeve to check my understanding of it.

When observing an agent doing certain things, we’re trying to infer their preferences/values/utility function from their behaviour (plus maybe our knowledge of their cognitive boundedness and so on). These are just useful abstractions to conceptualize and predict their behaviour and are not meant to correspond to any reality-at-joints-carving thing in their brain/mind. In particular, they abstract away from implementational details. But then the preferences/values/utility function are used as if they did correspond to such a thing, and the agent is assumed to be oriented towards maximizing their utility function or the satisfaction/fulfillment of their values/preferences?

Nora_Ammann

Maybe the very way this question/​problem is formulated seems to presume an idealized (form of an) agent that gets embedded in the world, rather than something that arises from the world via some continuous process, adapting, gaining autonomy, knowledge, competence, intelligence, etc.


Yes, this is definitely one of the puzzle pieces that I care a lot about. But I also want to emphasize that there is a weaker interpretation of this critique and a stronger one, and really I am most interested in the stronger one.

The weak version is roughly: there is this “growing up” period during which EUT does not apply, but once the agent has grown up to be a “proper” agent, EUT is an adequate theory.

The stronger version is: EUT is inadequate as a theory of agents (for the same reasons, and in the same ways) not just during an agent’s “growing up” period but all the time.

I think the latter is the case for several reasons, for example:

  • agents get exposed to novel “ontological entities” continuously (that e.g. they haven’t yet formed evaluative stances with respect to), and not just while “growing up”

  • there is a (generative) logic that governs how an agent “grows up” (develops into a “proper agent”), and that same logic continues to apply throughout an agent’s lifespan

......

Now, the tricky bit (and maybe the really interesting/meaty bit to figure out how to combine) is that, at some point in evolutionary history, our agent has accessed (what I like to call) the space of reason. In other words, our agent “goes computational”. And now I think an odd thing happens: while earlier our agent was shaped and constrained by “material causes” (and natural selection), now our agent is additionally also shaped and constrained by “rational causes”/“causes of reason”.* These latter types of constraints are the ones formal epistemology (including EUT etc.) is very familiar with, e.g. constraints from rational coherence etc. And I think it is correct (and interesting and curious) that these constraints come to have a significant effect on our agent, in a sort of retro-causal way. It’s the (again sort of) downward causal ‘force’ of abstraction.

(* I “secretly” think there is a third type of constraint we need to understand in order to understand agent foundations properly, but this one is weirder and I haven’t quite figured out how to talk about it best, so I will skip this for now.)

Mateusz Bagiński

Seems like we’ve converged on exactly the thing that interests me the most. Let’s focus on this strong EUT-insufficiency thesis:

agents get exposed to novel “ontological entities” continuously (that e.g. they haven’t yet formed evaluative stances with respect to), and not just while “growing up”

This seems to imply that (at least within our universe) the agent can never become “ontologically mature”, i.e., regardless of how much and for how long it has “grown”, it will continue experiencing something like ontological crises or perhaps their smaller “siblings”, like belief updates that are bound to influence the agent’s desires by acting on its “normative part”, rather than merely on the “epistemic part”.

I suspect the latter case is related to your second point

there is a (generative) logic that governs how an agent “grows up” (develops into a “proper agent”), and that same logic continues to apply throughout an agent’s lifespan

Do you have some more fleshed-out (even if very rough/​provisional) thoughts on what constitutes this logic and the space of reason? Reminds me of the “cosmopolitan leviathan” model of the mind Tsvi considers in this essay and I wonder whether your proto-model has a roughly similar structure.

Nora_Ammann

Ok, neat! So.. first a few clarifying notes (or maybe nitpicks):

1)

regardless of how much and for how long it has “grown”, it will continue experiencing something like ontological crises or perhaps their smaller “siblings”

So I think this is true in principle, but it seems worth flagging that this will not always be true in practice. In other words, we can imagine concrete agents which have at some point reached an ontology that they will not change further before their death. This is not because they have reached the “right” or “complete” ontology with respect to the world, but simply a sufficient one with respect to what they have encountered or will encounter.

A few things that follow from this I want to highlight:

  • As such, the question of whether or not, or how frequently, a given agent is yet to experience ontological crises (or their smaller siblings) is an empirical question. E.g., for a human past the age of 25/50/75 (etc.), how many more ontological crises are they likely to experience before they die? Does the frequency of experienced ontological crises differ between humans who lived in the 18th century and humans living in the 21st century? Etc.

  • Depending on what we think the empirical answer to the above question is, we might conclude that actual agents are surprisingly robust to/able to handle ontological crises (and we could then investigate why/how that is), or we might conclude that, even if in principle possible, ontological crises are rare, which again would suggest something about the fundamental nature/functioning of agents and open up further avenues of investigation.

  • I think answering the empirical question for good is pretty hard (due to some lack/difficulty of epistemic access), but from what we can observe, my bet currently is on ontological crises (or their smaller siblings) being pretty frequent, and thus that an adequate theory of agents (or values) needs to acknowledge this open-endedness as fundamental to what it means to be an agent, rather than as a “special case”.

  • That said, if we think ontological crises are quite often demanded from the agent, this does raise a question about whether we should expect to see agents doing a lot of work in order to avoid being forced to make ontological updates, and if so what that would look like, or whether we already see that. (I suspect we can apply a reasoning similar to the one that comes out of the “dark room problem” in active inference, where the answer to why minimizing expected free energy does not lead to agents hiding in dark rooms (thereby minimizing surprise) involves recognizing the trade-off between accuracy and complexity.)
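To spell out the decomposition being alluded to (the standard identity for variational free energy in the usual active-inference notation, with $q(s)$ the agent’s approximate posterior over hidden states, $p$ its generative model, and $o$ its observations; nothing here is specific to this conversation):

$$F \;=\; \underbrace{D_{\mathrm{KL}}\big[q(s)\,\|\,p(s)\big]}_{\text{complexity}} \;-\; \underbrace{\mathbb{E}_{q(s)}\big[\ln p(o \mid s)\big]}_{\text{accuracy}}.$$

A dark room keeps complexity low, but relative to a generative model that, like ours, expects rich and varied input, it scores badly on accuracy, which is (roughly) why free-energy minimization doesn’t push agents into dark rooms. (Strictly, this is the decomposition of variational free energy; the expected free energy used for planning decomposes analogously, into risk and ambiguity terms.)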

Nora_Ammann

2)

perhaps their smaller “siblings”

I like this investigation! I am not sure/​haven’t thought much about what the smaller sibling might be (or whether we really need it), but I seem to have a similar experience to you in that saying “ontological crises” seems sometimes right in type but bigger than what I suspect is going on.

[Insert from Mateusz: I later realized that the thing we’re talking about is concept extrapolation/​model splintering.]

Nora_Ammann

3)

like belief updates that are bound to influence the agent’s desires by acting on its “normative part”, rather than merely on the “epistemic part”.

My guess (including from other conversations we had) is that here is a place where our background models slightly disagree (but I might be wrong/am not actually entirely confident in the details of what your model here is). What I’m hearing when I read this is still some type difference/dualism between belief updates and value updates, and I think my models suggest a more radical version of the idea that “values and beliefs are the same type”. As such, I think every belief update is a value update, though it can be small enough to not “show” in the agent’s practical reasoning/behavior (similar to how belief updates may not immediately/always translate into choosing different actions).

Nora_Ammann

Ok, now to the generative logic bit!

Ah gosh, mostly I don’t know. (And I haven’t read Tsvi’s piece yet, but I appreciate the pointer and will try to look at it soon, and maybe comment on its relationship to my current guesses later.) But here are some pieces that I’m musing over. I think my main lens/methodology here is to be looking for what constraints act on agents/the generative logic of agents:

1) “Thinghood” / constraints from thinghood

  • It seems to me like one piece, maybe the “first” piece, is what I have logged under the notion of “thinghood” (which I inherit here from the Free Energy Principle (FEP)/​Active Inference). Initially it sounds like a “mere” tautology, but I have increasingly come to see that the notion of thinghood is able to do a bunch of useful work. I am not sure I will be able to point very clearly at what I think is the core move here that’s interesting, but let’s try with just a handful of words/​pointers/​gesturing:

    • FEP says, roughly, what it means to be a thing is to minimize expected free energy. It’s a bit like saying “in order to be a thing, you have to minimize expected free energy” but that’s not quite right, and instead it’s closer to “in virtue of being a thing/​once you are a thing, this means you must be minimizing expected free energy”.

      • “Minimizing expected free energy” is a more compressed (and fancy) way to say that the “thing” comes to track properties of the environment (or “systems”, which is the term used in the Active Inference literature) to which it is (sparsely) coupled.

      • “thing” here is meant in a slightly specific way; I think what we want to do here is describe what it means to be a thing, similar to how we might want to describe what it means to be “life”, but where “thing” picks out a slightly larger concept than “life”

    • “In virtue of being a thing” can be used as a basis for inference. In other words, “thinghood” is the “first” place where my hypothesis space for interpreting my sensory data starts to be constrained in some ways. (Not very much yet, but the idea is that this is the first, most fundamental constraint/layer.)

    • So roughly the upshot of this initial speculative investigation here is something like: whatever the generative logic, it has to be within/​comply with the constraints of what it means to exist as a thing (over time).

    • [To read more about it, I would point to this paper on Bayesian mechanics, or this paper on Path integrals, particular kinds and strange things]

(for a reason that might only become clear later on, I am playing with calling this “constraints from unnatural selection”.)
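One way to see why “minimizing free energy” cashes out as “coming to track the environment you are coupled to” is to rearrange the same quantity as in the earlier sketch (again, this is standard FEP material rather than anything specific to this dialogue):

$$F \;=\; D_{\mathrm{KL}}\big[q(s)\,\|\,p(s \mid o)\big] \;-\; \ln p(o) \;\ge\; -\ln p(o),$$

so driving $F$ down pushes the internal density $q(s)$ toward the posterior $p(s \mid o)$ over the external causes of the thing’s sensory states (the “tracking” part) and, at the same time, bounds the surprise $-\ln p(o)$ of those sensory states, which is the formal stand-in for “continuing to exist as that kind of thing”.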

Nora_Ammann

2) Natural selection /​ constraints from natural selection

  • Here, we want to ask: what constraints apply to something in virtue of being subject to the forces of natural selection?

  • We’re overall pretty familiar with this one. We know a bunch about how natural evolution works, and roughly what features something has to have in order for (paradigmatic) Darwinian evolution to apply (heredity, sources of variation; see e.g. Godfrey-Smith’s Darwinian Populations).

  • I think the study of history/​historic path dependencies also goes here.

3) Rationality/​Reason /​ constraints from rationality/​reason

  • Finally, and this is picking up on a bunch of things that came up above already, once something enters the realm of the computational/​realm of reason, further constraints come to act on it—constraints from reason/​rationality.

  • Here is where a bunch of classical rationality/​decision theory etc comes in, and forcefully so, at times, but importantly this perspective also allows us to see that it’s not the only or primary set of constraints.

Nora_Ammann

Ok, this was a whole bunch of something. Let me finish with just a few more loose thoughts/​spitballing related to the above:

  • Part of me wants to say “telos”/purpose comes in at the level of the rational; but another part thinks it already comes in at the level of natural selection. Roughly, my overall take is something like: there exist free-floating “reasons”, and the mechanism of natural selection is able to discover and “pick up on” those “reasons”. In the case of natural selection, the selection happens in the external/material, while with rational selection, albeit at the core the same mechanism, the selection happens in the internal/cognitive/hypothetical. As such, we might want to say that natural selection does already give us a ~weak notion of telos/purpose, and that rational selection gives us more of a stronger version; the one we more typically mean to point at with the terms telos/purpose. (Also see: On the nature of purpose)

  • Part of me wants to say that constraints from thinghood is related to (at least some versions of) anthropic reasoning.

  • I think learning is good in virtue of our embeddedness/​historicity/​natural selection. I think updatelessness is good in virtue of rational constraints.

Nora_Ammann

[Epistemic status: very speculative/​loosely held. Roughly half of it is poetry (for now).]

Mateusz Bagiński

My guess (including form other conversations we had) is that here is a place where our background models slightly disagree

I also think that they are two aspects of the same kind of thing. It’s just me slipping back into old ways of thinking about this.

EDIT: I think, though, that there is something meaningful I was trying to say, and stating it in a less confused/dualistic way would be something like one of these two:

(1) The agent acquires new understanding which makes them “rethink”/reflect on their values in virtue of those values themselves (FWIW), rather than their new state of belief implying that certain desirable-seeming things are out of reach, or that actions that seemed promising now seem hopelessly “low utility”, or something.

or

(2) Even if we acknowledge that beliefs and values are fundamentally (two aspects of) the same kind, I bet there is still a meaningful way to talk about beliefs and values on some level of coarse-graining or for certain purposes. Then, I’m thinking about something like:

An update that changes both the belief-aspect of the belief-value-thing and its value-aspect, but where the value-aspect-update is of greater magnitude (in some measure) than the belief-aspect-update, in a way that is not downstream from the belief-aspect-update, but rather both are locally independently downstream from the same new observation (or whatever triggered the update).

Nora_Ammann

(Noticed there is a fairly different angle/​level at which the questions about the generative logic could be addressed too. At that level, we’d for example want to more concretely talk about the “epistemic dimension” of values & the “normative or axiological” dimensions of beliefs. Flagging in case you are interested to go down that road instead. For example, we could start by listing some things we have noticed/​observed about the epistemic dimension of values and vice versa, and then after looking at a number of examples zoom out and check whether there are more general things to be said about this.)

Mateusz Bagiński

In case you missed it, Tsvi has a post (AFAICT) exactly about thinghood/thingness.

Can what you wrote be summarized as saying that “being a free energy-minimizing system” and “thinghood” should be definitionally equivalent?

“In virtue of being a thing” can be used as a basis for inference. In other words, “thinghood” is the “first” place where my hypothesis space for interpreting my sensory data starts to be constrained in some ways.

Does it mean that in order to infer anything from some input, that input must be parseable (/​thinkable-of) in terms of things? (maybe not necessarily things it represents/​refers-to[whatever that means]/​is-caused-by but spark some associations with a thing in the observer)

Or do you mean that one needs to “be a thing” in order to do any kind of inference?

Is it fair to summarize this as “thinghood”/​”unnatural selection” is a necessary prerequisite for natural selection/​Darwinian evolution? This reminds me of PGS’s insistence on discrete individuals with clear-ish parent-offspring relationships (operationalized in terms of inherited “fitness”-relevant variance or something) being a sine qua non of natural selection (and what distinguishes biological evolution from e.g., cultural “evolution”). It felt intuitive to me but I don’t think he gave specific reasons for why that must be the case.

I think you could say that natural selection has been a prerequisite for agents capable of being constrained by the space of reason. This has been true of humans (and to some extent other animals). Not sure about autonomous/agenty AIs (once[/if] they arise), since if they develop in a way that is a straightforward extrapolation of the current trends, then (at least from PGS’s/DPNS perspective) they would qualify as at best marginal cases of Darwinian evolution (for the same reasons he doesn’t see most memes as paradigmatic evolutionary entities, and because at some point they will likely become capable of steering their own trajectory/not-quite-evolution).

Noticed there is a fairly different angle/level at which the questions about the generative logic could be addressed too

I think the current thread is interesting enough.

Nora_Ammann

(quick remark on your edit re the (non-)dualistic way of talking about values/beliefs: here is a guess for where some of the difficulty in talking about this comes from:

We typically think/talk about values and beliefs as if they were objects, and then we think/talk about what properties these objects have.

How I think we should instead think about this: there is some structure to an agent*, and that structure unravels into “actions” when it comes into contact with the environment. As such “beliefs” and “values” are actually just “expressions” of the relation between the agent’s “structure/​morphology” and the environment’s “structure/​morphology”.

Based on this “relational” picture, we can then refer to the “directionality of fit” picture to understand what it means for this “relation” to be more or less belief/​value like—namely depending on what the expressed direction of fit is between agent and world.

(*I think we’d typically say that the relevant structure is located in the agent’s “mind”—I think this is right insofar as we use a broad notion of mind, acknowledging the role of the “body”/​the agent’s physical makeup/​manifestation.)

Nora_Ammann

---

Does it mean that in order to infer anything from some input, that input must be parseable (/​thinkable-of) in terms of things? (maybe not necessarily things it represents/​refers-to[whatever that means]/​is-caused-by but spark some associations with a thing in the observer)

Or do you mean that one needs to “be a thing” in order to do any kind of inference?

More like the latter. But more precisely: assume you just have some sensory input (pre any filter/ontology you have specific reason to trust that would help you organize/make sense of that sensory input). There is a question of how you could, from this place, make any valid inference. What I’m trying to say with the “thinghood” constraint is that the fact that you’re experiencing any sensory input at all implies you must have some “sustained existence”: you must endure for more than just a single moment. In other words, you must be a “thing” (according to the minimal definition from above/from FEP). But that fact allows you to back out something: it becomes your “initial ground to stand on” from which you can “bootstrap” up. It’s a bit like Descartes’s “I think, therefore I am”, but more like “I am [a thing], therefore… a certain relationship must hold between different bits of sensory input I am receiving (in terms of their spatial and temporal relationship)”, and this now forms the ground from which I am able to do my first bits of inference.

Nora_Ammann

---

Is it fair to summarize this as “thinghood”/​”unnatural selection” is a necessary prerequisite for natural selection/​Darwinian evolution?

Depends on what sort of “prerequisite” you mean. Yes in physical/material time (yes, you need something that can be selected). (FWIW I think what is true for Darwinian evolution is also true for “history” more generally: once you have material substrate, you enter the realm of history (of which Darwinian evolution is a specific sub-type). This is similar to how (in this wild/funny picture I have been painting here) “once you have computational substrate, you enter the realm of rationality/rational constraints start to have a hold on you”.)

There is another sense in which I would not want to say that there is any particular hierarchy between natural/​unnatural/​rational constraints.

Nora_Ammann

I find Godfrey-Smith’s picture here useful in that it reminds us that we are able to say both a) what the paradigmatic (“pure”) case looks like, and also b) that most (/all?) actual examples will not match the fully paradigmatic case (and yet be shaped to different extents by the logic that is illustrated in the paradigmatic case). So in our picture here, an actual agent will likely be shaped by rational/natural/unnatural constraints, but by none of them in a maximalist/pure sense.

Mateusz Bagiński

assume you just have some sensory input (pre any filter/ontology you have specific reason to trust that would help you organize/make sense of that sensory input). There is a question of how you could, from this place, make any valid inference. What I’m trying to say with the “thinghood” constraint is that the fact that you’re experiencing any sensory input at all implies you must have some “sustained existence”: you must endure for more than just a single moment. In other words, you must be a “thing” [...]. But that fact allows you to back out something: it becomes your “initial ground to stand on” from which you can “bootstrap” up.

Hm, this kind of proto-cognitive inference ([I get any input] → [I am a stable “thing”] → [I stand in a specific kind of relationship to the rest of the world]) feels a bit too cerebral to expect from a very simple… thing that only recently acquired stable thinghood.

The way I see it is:

A proto-thing* implements some simple algorithm that makes it persist and/​or produce more of itself (which can also be viewed as a form of persistence). Thinghood is just the necessary foundation that makes any kind of adaptive process possible. I don’t see the need to invoke ontologies at this point. I haven’t thought about it much but the concept of ontology feels to me like implying a somewhat high-ish level of complexity and while you can appropriate ontology-talk for very simple systems, it’s not very useful and adds more noise than clarity to the description.

---
* By “proto-thing”, I mean “a thing that did not evolve from other things but rather arose ~spontaneously from whatever”. I suspect there is some degree of continuity-with-phase-transitions in thinghood but let’s put that aside for now.

Nora_Ammann

While I agree it sounds cerebral when we talk about it, I don’t think it has to be. I think there is some not-unfounded hope that FEP is mathematizing exactly that: thinghood implies that, and puts some constraints on how, “the internal states (or the trajectories of internal states) of a particular system encode the parameters of beliefs about external states (or their trajectories).”
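A hedged sketch of what that quoted claim amounts to, in the Bayesian-mechanics framing pointed to earlier (the notation is illustrative rather than quoted from those papers): partition a system’s states into external states $\eta$, blanket states $b$ (sensory and active), and internal states $\mu$. The claim is then that there is a map $\sigma$ taking (expected) internal states to the parameters of a density over external states such that

$$q_{\sigma(\mu)}(\eta) \;\approx\; p(\eta \mid b),$$

i.e. the internal states, given the blanket, parameterize an approximate posterior over what is on the other side of the blanket. That is the precise sense in which “being a thing” is supposed to already buy a minimal, non-cerebral form of inference.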

Also, it’s worth reminding ourselves: it’s not really MUCH that we’re getting here. The FEP literature often sounds quite fancy and complex, but the math alone (even if correct!) doesn’t constrain the world very much.

(I think the comparison to evolutionary theory here is useful, which I believe I have talked to you about in person before: we generally agree that evolutionary theory is ~true. At the same time, evolutionary theory on its own is just not very informative/constraining on my predictions if you ask me “given evo theory, what will the offspring of this mouse here in front of us look like?”. It’s not that evo theory is wrong, it just doesn’t, on its own, have that much to say about this question.)

Mateusz Bagiński

While I agree it sounds cerebral when we talk about it, I don’t think it has to be. I think there is some not-unfounded hope that FEP is mathematizing exactly that: thinghood implies that, and puts some constraints on how, “the internal states (or the trajectories of internal states) of a particular system encode the parameters of beliefs about external states (or their trajectories).”

Hmm, IDK, maybe. I’ll think about it.

Mateusz Bagiński

Moving back to the beginning of the dialogue, the kick-off questions were:

I’d like to elicit some of your thoughts about value change in humans. What makes a specific case of value change (il)legitimate? How is that tied to the concepts of rationality, agency, etc? Once we’re done with that, we can talk more generally about arguments for why the values of an agent/​system should not be fixed.

The topics we’ve covered so far give some background/context but don’t answer these questions. Can you elaborate on how you see them relating to value change (il)legitimacy and value-malleable rationality?

Nora_Ammann

Some thoughts/​intuitions/​generators: [though note that I think a lot of this is rehashing in a condensed way arguments I make in the value change problem sequence]

  • Why should values not be fixed?

    • It seems pretty obvious to me that humans (and other intelligent agents) aren’t born with a “complete ontology”—instead, their “ontology” needs to grow and adapt as they learn more about the world and e.g. run into “new” ontological entities they need to make sense of.

    • At least two important things follow from that according to me:

    • (1) Understanding ontological open-endedness is a necessary part of understanding real agents. Rather than some sort of edge case or tail event, it seems like ontological open-endedness/​development is a fundamental part of how real agents are built/​function. I want agent foundations work that takes this observation seriously.

    • (2) My current best guess on values is the non-dualist picture discussed above, such that values are inherently tied to the agent’s beliefs/world models, and thus the open-endedness “problem” pointed out above also has direct/fundamental ramifications for the agent’s values/the functional logic of “how values work in (real) agents”.

      • In other words, I think that accounts of values that model them as fixed by default are in some relevant sense misguided in that they don’t take the open-endedness point seriously. This also holds for many familiar moral/ethical theories. In short, I think the problem of value change is fundamental, not peripheral, and that models/theories/accounts that don’t seem to grapple with value malleability have as a matter of fact mis-conceptualized the very basic properties of what values are, in such a way that I have a hard time having much confidence in what such models/theories/accounts output.

    • NB I took your question to roughly mean “why should values not be modelled as fixed?”. A different way to interpret this question would be: “if I as an agent have the choice between fixing my values and letting them be malleable, which one should I choose and how?”. My answer to the latter question, in short, is that, as a real agent, you simply don’t get the choice. As such, the question is moot.

      • (There is a follow-up question of whether sufficiently powerful AI agents could approach the “idealized” agent model sufficiently well such that they do in fact, practically speaking, get to have a choice over whether their values should be fixed or malleable, and from there a bunch of arguments from decision theory/rational choice theory suggesting that agents will converge to keeping their values fixed. As described above, I think these arguments (“constraints from reason/rationality”) have some force, but are not all of what shapes agents, and as such, my best guess position remains that even highly advanced AI systems will be enactively embedded in their environment such as to continue to face value malleability to relevant extents.)

  • What makes specific cases of value change (il)legitimate?

    • Yeah so I guess that’s the big question if you take the arguments from the “value change problem” seriously. Mostly, I think this should be considered an open research question/program. That said, I think there exists a range of valuable trailheads from existing scholarship.

    • A partial answer is my proto-notion of legitimacy described here.

      • The core generator takes notions of autonomy/self-determination/freedom from ~political philosophy and applies them to the situation with advanced AI systems.

        • My high-level take is that some traditions in political philosophy (and adjacent areas of scholarship) are actually pretty good at noticing very important phenomena (including ones that moral philosophy fails to capture due to what I refer to as an “individualist” bias, similar to how I think single-single intent alignment misses out on a bunch of important things due to relying on leaky abstractions), but they suck at formalizing those phenomena, such that you can’t really use them as they are to let powerful optimizers run on these concepts without misgeneralisation. As such, my overall aspiration for this type of work is to both do careful scholarship that is able to appreciate and capture the “thick” notions of these phenomena, and then be more ambitious in pushing for formalization.

      • A different generator that jams well with this is “boundary”-based approaches to formalizing safe AI interactions.

      • That said, as I mention briefly in the sequence too, I think this is definitely not the end of the story.

        • There is a range of edge cases/real-world examples I want to hold my current notion of legitimacy against to see how it breaks. Parenting & education seem like particularly interesting examples. For strong versions of my legitimacy criteria, ~most parenting methods would count as inducing illegitimate value change. I think there is an important argument from common sense that this seems too strong a result. In other words, a sensible desideratum to have for criteria of legitimacy is that there must be some realistic approaches to parenting that would count as legitimate.

        • Furthermore, autonomy/self-determination/freedom and the like rely on some notions of individuality and agent-boundaries that are somewhat fuzzy. We don’t want to go so far as to start pretending agents are or could ever be completely separate and un-influenced by their environment; agent boundaries are necessarily leaky. At the same time, there are also strong reasons to not consider all forms of leakiness/influence-taking to be morally the same. There is a bunch of work to be done to clarify where “the line” (which might not be a line) is.

          • This is also in part why I think discussion about the value change problem must consider not only the problem of causing illegitimate value change, but also the problem of undermining legitimate value change (though naturally more attention tends to be paid to the former). In other words, I want a notion of legitimacy that is useful in talking about both of these risks.

    • I think a bunch of people have the intuition that at least part of what makes a value change (il)legitimate has to be evaluated with reference to the object-level values adopted. I remain so far skeptical of that (although I recognize it’s a somewhat radical position). The main reason for this is that I think there is just no “neutral ground” at the limit from which to do such an evaluation. So while pragmatically we might be forced to adopt notions of legitimacy that also refer to object-level beliefs (and this might be better than the practically available alternatives), I simultaneously think this is conceptually very dissatisfying, and I am skeptical that such an approach will be principled enough to solve the ambitious version of the alignment problem (i.e. generalize well to sufficiently powerful AI systems).

      • FWIW this is, according to me, related to why political philosophy and philosophy of science can provide relevant insights into this sort of question. Both of these domains are fundamentally concerned with (according to me) processes that reliably error-correct more or less no matter where you start from/while aspiring to be agnostic about your (initial) object level. Also, they don’t fall prey to the “leaky abstraction of the individual” (as much), as I alluded to before. (In contrast, the weakness/pitfalls of their “counterparts”, moral philosophy and epistemology, are often that they are concerned with particulars without being able to avoid falling prey to status quo biases (where status quo biases are subject to exploitation by power dynamics, which also causes decision-theoretic problems).)

  • Relationship to concepts like rationality, agency, etc.?

    • This is a very open-ended question so I will just give a short and spicy take: I think that the “future textbook on the foundations of agency” will also address normative/value-related questions (albeit in a naturalized way). In other words, if we had a complete/appropriate understanding of “what is an agent”, this would imply we also had an appropriate understanding of e.g. the functional logic of value change. And my guess is that “rationality” is part of what is involved in bridging from the descriptive to the prescriptive, and back again.

Mateusz Bagiński

Thanks! I think it’s a good summary/​closing statement/​list of future directions for investigation, so I would suggest wrapping it right there, as we’ve been talking for quite a while.

Nora_Ammann

Sounds good, yes! Thanks for engaging :)