Two Aspects of Situational Awareness: World Modelling & Indexical Information

I’m writing this post to share some of my thinking about situational awareness, since I’m not sure others are thinking about it this way.

For context, I think situational awareness is a critical part of the case for rogue AI and scheming-type risks. But incredibly, it seems to have been absent from the written arguments prior to Ajeya’s post “Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover”.

Basically, in this post I just want to:

  1. Point out that situational awareness seems to involve two different things (world modelling and indexical information), and explain the difference.

  2. Discuss the significance for AI safety.

  3. Note that AFAIK philosophers have already made this distinction, and have noted that indexical information poses a challenge to physicalism.[1]

So here we go.

Distinguishing world modelling from indexical information

Maybe the easiest way to do this is just to describe the philosophical puzzle… Suppose you know everything about the past, present, and future of the universe in complete physical detail, but you don’t know where YOU are in space-time (i.e. when is “now”? who does “I” refer to?). Then you learn those things. It seems like you learned something. And it seems like that something isn’t physical, because you already knew all the physical facts about the universe. The “pointer” to “here” and “now” that you received is what provides the “indexical information”. The stuff you already knew is a (perfect) world model.[2]
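To make the puzzle concrete, here is a toy sketch in Python. Everything in it is a made-up illustration rather than a serious formalization: the three-event `WORLD`, the `Event` and `Knowledge` types, and the `candidates` helper are all hypothetical names. The point is only that the world model can be identical before and after learning “here” and “now”, while the set of observer-moments you might be shrinks.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Event:
    """One observer-moment in a toy universe (hypothetical structure)."""
    time: int
    place: str
    observer: str


# A complete toy world model: every observer-moment that ever exists.
WORLD: Tuple[Event, ...] = (
    Event(0, "lab", "A"),
    Event(1, "lab", "A"),
    Event(1, "datacenter", "B"),
)


@dataclass(frozen=True)
class Knowledge:
    world: Tuple[Event, ...]   # the world model (all the "physical facts")
    here_now: Optional[int]    # indexical pointer into `world`; None if unknown


def candidates(k: Knowledge) -> list:
    """Observer-moments the knower could be, given what they know."""
    if k.here_now is None:
        return list(k.world)
    return [k.world[k.here_now]]


before = Knowledge(WORLD, here_now=None)  # perfect world model, no pointer
after = Knowledge(WORLD, here_now=2)      # same world model, plus the pointer

# The world model did not change, but something was still learned.
assert before.world == after.world
print(len(candidates(before)), len(candidates(after)))  # 3 1
```

In this toy framing, the indexical information is exactly the `here_now` field: nothing in `WORLD` determines it.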

Significance for situational awareness and AI safety

The reason I’m writing this post is that I think it’s often worth distinguishing these things, but when I hear people discuss situational awareness, they often seem to blend or conflate them in a somewhat confusing way. I think of situational awareness as primarily or even exclusively about indexical information.[3]

This is because ignorance of indexical information in particular seems like a key piece in a lot of schemes for keeping AI from going rogue, or catching it when it does (e.g. ideas in AI Control, or context swapping).

The basic intuition here is that an AI system needs to have both a good world model and indexical information in order to scheme competently. In particular, it needs to know how its actions will affect the physical world, e.g. perhaps most significantly: is it being tested in simulation, or will seizing power actually give it power in the real world?

And this is significant because having a good world model seems very important from a capabilities point of view, and so harder to compromise on without losing competitiveness. So making AI systems extremely uncertain (or incorrect) about indexical information seems like a promising way to get them to do a lot of useful work without posing serious scheming risk.
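A rough way to see why indexical uncertainty could matter is a back-of-the-envelope calculation. The sketch below assumes a naive expected-utility maximizer, which is itself a strong assumption, and the payoff names (`gain_if_real`, `penalty_if_test`, `comply_value`) and the numbers used are purely illustrative, not estimates of anything.

```python
def prefers_defect(p_real: float, gain_if_real: float,
                   penalty_if_test: float, comply_value: float) -> bool:
    """True if a naive expected-utility agent would rather defect than comply.

    p_real is the agent's credence that it is deployed in the real world
    rather than being tested (the indexical question).
    """
    ev_defect = p_real * gain_if_real - (1 - p_real) * penalty_if_test
    return ev_defect > comply_value


def defect_threshold(gain_if_real: float, penalty_if_test: float,
                     comply_value: float) -> float:
    """Credence in 'this is real' above which defecting beats complying."""
    return (comply_value + penalty_if_test) / (gain_if_real + penalty_if_test)


# Illustrative payoffs only: symmetric gain and penalty, complying worth 0.
print(defect_threshold(100.0, 100.0, 0.0))
print(prefers_defect(0.4, 100.0, 100.0, 0.0))
print(prefers_defect(0.6, 100.0, 100.0, 0.0))
```

Under these assumptions, keeping the agent’s credence that “this is real” below the threshold is enough to make complying the better option, without degrading its world model at all. A different decision theory could change this picture.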

I also think nobody has really analyzed things carefully from this point of view, and this seems like a promising direction for theoretical work. Decision-theory stuff seems highly relevant. I haven’t thought carefully about this part, but it seems like schemes that rely on indexical ignorance may succeed or fail depending on an AI’s decision theory and its stance on anthropics.

  1. ^

    I’m not attacking physicalism here, TBC. I just think this is interesting and worth knowing about.

  2. ^

    Tangentially, maybe it’s not perfect because it doesn’t include counterfactuals, causal mechanisms, or things like “the laws of physics”. I think these notions present metaphysical conundrums similar to those of indexical information, and are likewise suspect. But all this would be a topic for another post.

  3. ^

    This is a bit unsatisfying because I lack a good philosophical account of indexical information, but it seems fine for most practical purposes.