A few quick thoughts on measuring disempowerment

People want to measure and track gradual disempowerment. One issue with a lot of the proposals I’ve seen is that they don’t distinguish between empowering and disempowering uses of AI. If everyone is using AI to write all of their code, that doesn’t necessarily mean they are disempowered (in an important sense). And many people will look at this as a good thing—the AI is doing so much valuable work for us!

It generally seems hard to find metrics for AI adoption that clearly track disempowerment; I think we may need to work a bit harder to interpret them. One idea is to augment such metrics with other sources of evidence, e.g. social science studies (such as interviews) of people using AI in a given way, sector, or application.

We can definitely try using formal notions of empowerment/POWER (cf https://arxiv.org/abs/1912.01683). Note that these notions are not necessarily appropriate as an optimization target for an AI agent. If an AI hands you a remote control to the universe but doesn’t tell you what the buttons do, you aren’t particularly empowered.
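To make the empowerment half of that concrete, here is a minimal sketch of the standard information-theoretic notion (channel capacity from an agent's action sequences to its resulting state), not the POWER definition from the linked paper, assuming a tiny deterministic gridworld; the grid size, action set, and two-step horizon are stand-ins chosen for illustration. In the deterministic case the quantity reduces to the log of the number of distinct states reachable within n steps, which also makes the remote-control point concrete: options only count if you can actually steer toward them.

```python
import math
from itertools import product

# Toy deterministic gridworld: states are (x, y) cells and actions move one step
# (or stay put), clipped at the boundary. Grid size, actions, and horizon are all
# stand-ins chosen for illustration.
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0), "stay": (0, 0)}
GRID = 5

def step(state, action):
    """Apply one action, clipping at the grid boundary."""
    dx, dy = ACTIONS[action]
    x, y = state
    return (min(max(x + dx, 0), GRID - 1), min(max(y + dy, 0), GRID - 1))

def empowerment(state, n):
    """n-step empowerment (in bits): in a deterministic environment this is just
    log2 of the number of distinct states reachable by some length-n action sequence."""
    reachable = set()
    for seq in product(ACTIONS, repeat=n):
        s = state
        for a in seq:
            s = step(s, a)
        reachable.add(s)
    return math.log2(len(reachable))

# The corner cell is less empowered than the center: it can steer to fewer outcomes.
print(empowerment((0, 0), n=2), empowerment((2, 2), n=2))
```

On this 5x5 grid the corner cell scores lower than the center, matching the intuition that it has fewer reachable options.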

People could also be considered more disempowered if:

  • They are less able to predict important attributes of a system’s behavior (cf “simulatability” in interpretability research)

    • Example: An artist is empowered if they can realize a particular vision for a piece, whereas generating art by prompting current AIs is much more like a process of curation.

    • This is similar to empowerment but more focused on prediction than on control; a rough sketch of such a prediction score appears after this list.

  • They self-report it.

  • They are “rubber-stamping” their approval of AI decisions

    • This can be revealed by red-teaming the human overseer, i.e. sending them examples they should flag and seeing whether they fail to flag them; a rough catch-rate sketch is included just below.
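For the rubber-stamping bullet, a minimal sketch of that catch-rate check: plant examples the overseer should flag among routine AI decisions, then compare the catch rate on the planted set to the baseline flag rate on routine items. The `(should_flag, was_flagged)` record format is an assumption made for illustration.

```python
def rubber_stamp_check(reviews):
    """Estimate whether an overseer is rubber-stamping AI decisions.

    `reviews` is a list of (should_flag, was_flagged) pairs, one per decision shown
    to the overseer, where should_flag marks planted red-team examples.
    """
    planted = [flagged for should, flagged in reviews if should]
    routine = [flagged for should, flagged in reviews if not should]
    catch_rate = sum(planted) / len(planted) if planted else float("nan")
    base_rate = sum(routine) / len(routine) if routine else float("nan")
    # A catch rate close to the routine flag rate suggests approvals are rubber-stamps.
    return catch_rate, base_rate

# Example: the overseer flags 1 of 3 planted problems and 0 of 2 routine decisions.
print(rubber_stamp_check([(True, True), (True, False), (True, False), (False, False), (False, False)]))
```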
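And for the prediction/simulatability bullet, a minimal sketch of a simulatability-style score, assuming you can collect a person's predictions of the system's output on a set of test inputs; `inputs`, `model`, and `human_predictions` are placeholder names, and exact-match agreement is a simplification.

```python
def simulatability_score(inputs, model, human_predictions):
    """Fraction of inputs where a person correctly predicted the system's output.

    `model` maps an input to the system's actual output and `human_predictions`
    maps an input to the person's guess; exact-match comparison is a simplification.
    """
    correct = sum(1 for x in inputs if human_predictions[x] == model(x))
    return correct / len(inputs)

# Example with a toy "system" that uppercases text.
model = str.upper
inputs = ["abc", "déjà vu"]
human_predictions = {"abc": "ABC", "déjà vu": "DEJA VU"}  # misses the accents
print(simulatability_score(inputs, model, human_predictions))  # 0.5
```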