I wanted to write a long, detailed, analytic post about this, somewhat like my Radical Probabilism post (to me, this is a similarly large update). However, I haven’t gotten around to it for a long while. And perhaps it is better as a short, informal post in any case.

I think my biggest update over the past year has been a conversion to teleosemantics. Teleosemantics is a theory of semantics—that is, “meaning” or “aboutness” or “reference”.[1]

To briefly state the punchline: Teleosemantics identifies the semantics of a symbolic construct as what the symbolic construct has been optimized to accurately reflect.

Previously, something seemed mysterious about the map/territory relationship. What could possibly imbue ‘symbols’ with ‘meaning’? The map/territory analogy seems inadequate to answer this question. Indeed, to analogize “belief” with “map” and “the subject of belief” with “territory” commits a homunculus fallacy! The meaning-makers are the map-readers and map-writers; but they can only make meaning by virtue of the beliefs within their own heads. So the map/territory analogy seems to suggest that an infinite regress of meaning-makers would be required.

You probably won’t believe me at first. Perhaps you’ll say that the lesson of the map/territory analogy is the correspondence between the map and the territory, which exists independently of the map-reader who uses the correspondence to evaluate the map.

I have several objections.

  1. If it’s a probabilistic correspondence, where the map carries information about the territory, then probability and information are subjective notions, which require some viewpoint.

  2. If it’s a correspondence based on some sort of ontology, where pieces of the map line up with “pieces of reality”, I would also say the ontology is in itself a subjective perspective.

  3. You might think you can define the map/territory correspondence without invoking a map-maker or map-reader by objectively defining the “fit” of a correspondence (so that the meaning of a symbol is based on the best-fitting correspondence, or perhaps, the cloud of well-fitting correspondences). But well-fitting correspondences will include many examples of accidental correspondence, which seem to have little to do with aboutness. Moreover, I think theories like this will fail to adequately account for false belief, which screws up the fit.[2]

But my point here isn’t to denounce the map/territory picture! I still think it is a good framework. Rather, I wanted to gesture at how I still felt confused, despite having the map/territory picture.

I needed a different analogy, something more like a self-drawing map, to get rid of the homunculus. A picture which included the meaning-maker, rather than having meaning come from nowhere.

Teleosemantics reduces meaning-making to optimization. Aboutness becomes a type of purpose a thing can have.

One advantage of this over map-territory correspondence is that it explains the asymmetry between map and territory. Mutual information is symmetric. So why is the map about the territory, but not the other way around? Because the map has been optimized to fit the territory, not the other way around. (“Fit” in the sense of carrying high mutual information, which can be decoded via some specific intended correspondence—a symbolic language.)
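
The symmetry claim can be checked directly. A minimal sketch (Python, with an invented joint distribution over territory states and map symbols): computing the mutual information in both directions gives the same number, so information alone cannot supply the map-about-territory asymmetry.

```python
import math

# Invented joint distribution over (territory state, map symbol) pairs.
joint = {
    ("sunny", "sun"): 0.4, ("sunny", "cloud"): 0.1,
    ("rainy", "sun"): 0.1, ("rainy", "cloud"): 0.4,
}

def mutual_information(joint):
    """I(X;Y) = sum over (x,y) of p(x,y) * log2(p(x,y) / (p(x)p(y)))."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Swap the roles of territory and map: the number is unchanged.
swapped = {(y, x): p for (x, y), p in joint.items()}
i_forward = mutual_information(joint)
i_backward = mutual_information(swapped)
assert abs(i_forward - i_backward) < 1e-12  # symmetric
```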

What does it mean to optimize for the map to fit the territory, but not the other way around? (After all: we can improve fit between map and territory by changing either map or territory.) Maybe it’s complicated, but primarily what it means is that the map is the part that’s being selected in the optimization. When communicating, I’m not using my full agency to make my claims true; rather, I’m specifically selecting the claims to be true.
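
One way to picture “the map is the part that’s being selected” is a tiny hill-climbing loop (a sketch with an invented bit-string territory): mutations and selection act only on the map, while the territory is held fixed, and the map comes to carry information about it.

```python
import random

random.seed(0)  # fixed seed so the run is reproducible

# The "territory" is held fixed; the optimizer never touches it.
territory = [1, 0, 1, 1, 0, 0, 1, 0]

def fit(map_):
    """How many positions of the map match the territory."""
    return sum(m == t for m, t in zip(map_, territory))

# Selection acts on the MAP only: flip one cell at a time and keep
# the change whenever it does not worsen the fit.
map_ = [random.randint(0, 1) for _ in territory]
steps = 0
while map_ != territory and steps < 10_000:
    candidate = map_[:]
    candidate[random.randrange(len(candidate))] ^= 1
    if fit(candidate) >= fit(map_):
        map_ = candidate
    steps += 1

assert map_ == territory  # the map was selected to fit the territory
```

Reversing the roles, mutating the territory until it matched a fixed map, would produce the same final correspondence, which is why the direction of selection, not the correspondence itself, carries the aboutness.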

I take teleosemantics to be the same idea as ‘reference maintenance’, and in general, highly compatible with the ideas laid out in On the Origin of Objects by Brian Cantwell Smith.

I think a further good feature of a language is that claims are individually optimized to be true. To get an accurate answer to a specific question, I want that answer to be optimized to be accurate; I don’t want the whole set of possible answers to have been jointly optimized. Unfortunately, in realistic communication, we do somewhat optimize to present a simple, coherent view, rather than only optimizing each individual statement to be accurate—doing so helps our perspective to be understood easily by the listener/reader. But my intuition is that this does violate some ideal form of honesty. (This is one of the ways I think the concept of optimizing fit may be complicated, as I mentioned earlier.)

Connotation vs Denotation

I’ve previously argued that the standard Bayesian world-view lacks a sufficient distinction between connotation (the probabilistic implications of a communication) and denotation (the literal meaning). Teleosemantics provides something close, since we can distinguish between what a communication probabilistically implies vs what the communication was optimized to correspond to.

For example, I might notice that someone’s hair is a mess and spontaneously tell them so, out of a simple drive to comment on notable-seeming facts. I chose my words in an attempt to accurately reflect the state of affairs, based on my own observations. So the teleosemantic-meaning of my words is simply the literal: “your hair is a mess”.

However, the listener will try to work out the probabilistic implications of the utterance. Why might I tell them that their hair is a mess? Perhaps I dislike them and took the opportunity to insult them. Or perhaps I consider them a close enough friend that I think a gentle insult will be taken in a sporting way. These are possible conversational implications.

In this example, teleosemantics vs probabilistic implication matches fairly well with literal meaning vs connotation. However, there are some cases which present more difficulties:

  • A more socially savvy communicator will understand the connotations of their speech, and optimize for these as well.

  • One of my main criticisms of the probabilistic-implication account of meaning was its inability to properly define lying. However, this also appears to be a problem for the current account!

I think the best way to handle these issues is to give up on a single account of the “meaning” of an utterance, and instead invent some useful distinctions.

Obviously, often what we care about most is the raw informational content. For example, it seems plausible that in the case of ELK (Eliciting Latent Knowledge), that’s what we care about.

Another thing we often care about is what something has been optimized for. Understanding the “purpose” of something is generally very useful for understanding and manipulating our environment, even though it’s not a “physical” fact. Intended meaning is a subspecies of purpose—a symbol is supposed to represent something. This notion of meaning can include both denotation and connotation, depending on the author’s intent.

But, we can further split up authorial intent:

  • If the author was optimizing for a specific belief to be engendered in the audience, then we can talk about “what the author wants us to believe”. But this form of communication can be manipulative and dishonest.

  • If the author was trying to accurately represent information, we can talk about what information the author was trying to represent. This is a purer form of communication.

This accounts for lying, more or less. Lying means you’re optimizing for a different belief in the audience than the belief you have. But we still haven’t completely pinned down what “denotation” could mean.

Another important type of intent is the intended meaning of a word in a broader, societal context. A language is meaningful in the context of a linguistic community. To a large extent, a linguistic community is setting about the business of creating a shared map of reality, and the language is the medium for the map.

This makes a linguistic community into a sort of super-agent. The common subgoal of accurate beliefs is being pooled into a group resource, which can be collectively optimized.

Obviously, the “intended meaning” of a word in this collective sense will always be somewhat vague. However, I think humans very often concern ourselves with this sort of “meaning”. A linguistic community has to police its intended map-territory correspondence. This includes rooting out lies, but it also includes pedantry—policing word-meanings to keep the broader language coherent, even when there’s no local intelligibility problem (so pedantry seems pointless in the moment).

One way of looking at the goal of ELK, and AI transparency more generally, is that we need an answer to the question: how can we integrate AIs into our linguistic community?

  • To police AI honesty, we need to be able to tell whether AIs are being honest or deceptive.

  • In order to make research progress on this, we need a suitable understanding of what it means for AI systems to be honest or deceptive. (IE, when do we say that the AI system possesses latent knowledge which we want to solicit?)

The book Communicative Action and Rational Choice discusses how the behavior of a linguistic community is hard to analyze in a traditional rational-agent framework (particularly the selfish rationality of economics). Within a consequentialist framework, it seems as if communicative acts would always be optimized for their consequences, so, never be optimized for accuracy (hence, would lack meaning in the teleosemantic sense).[3] This mirrors many of the concerns for AI—why wouldn’t a highly capable AI be deceptive when it suited the AI’s goals?

Even humans are often deceptive when we can get away with it. (So, eg, “raise the AI like a human” solutions don’t seem very reassuring.) But humans are also honest much more often than naive consequentialism would suggest. Indeed, I think humans often communicate in the teleosemantic sense, IE optimizing accuracy.

A linguistic community also tends to become a super-agent in a stronger sense (discussed in Communicative Action and Rational Choice): coordinating actions. A member of a linguistic community is able to give and receive reasons for taking specific actions (eg, following and enforcing specific norms), rather than only swapping reasons for beliefs.

Allowing AIs to participate fully in a linguistic community in this broader sense could also be an interesting framework for thinking about alignment.

  1. ^

    Thanks to Steve Petersen for telling me about it.

  2. ^

    The proponent of a goodness-of-fit theory would, I think, have to argue that false beliefs harm the correspondence only a little. If you imagine holding a semi-transparent map over the territory, and rotating/sliding it into the best-fit location, the “false beliefs” would be the features which still don’t fit even after we’ve found the best fit.

    This theory implies that beliefs lose meaning at the point where accumulated errors stop us from locating a satisfying best-fit.

    I think this is not quite true. For example, a blindfolded person who believes they are in London when they are in fact in Paris could have a very detailed mental map of their surroundings which is entirely wrong. You might reasonably insist that the best-fit interpretation of those beliefs is nailed down by years of more accurate beliefs about their surroundings. I’m skeptical that the balance needs to work out that way.

    Moreover, it seems unfortunate for the analysis to be so dependent on global facts. Because a goodness-of-fit theory ascribes semantics based on an overall best fit, interpreting the semantics of one corner of the map depends on all corners. Perhaps some of this is unavoidable; but I think teleosemantics provides a somewhat more local theory, in which the meaning of an individual symbol depends only on what that symbol was optimized to reflect.

    (For example, imagine someone writing a love letter on a map, for lack of blank paper. I think this creates somewhat more difficulty for a goodness-of-fit theory than for teleosemantics.)

  3. ^

    To be clear, I’m not currently on board with this conclusion—I think consequentialist agents can engage in cooperative behavior, and coordinate to collectively optimize common subgoals. Just because you are, in some sense, “optimizing your utility” at all times, doesn’t mean you aren’t optimizing statements for accuracy (as a robust subgoal in specific circumstances).

    However, it would be nice to have a more detailed picture of how this works.