A few quick thoughts on measuring disempowerment
People want to measure and track gradual disempowerment. One issue with a lot of the proposals I’ve seen is that they don’t distinguish between empowering and disempowering uses of AI. If everyone is using AI to write all of their code, that doesn’t necessarily mean they are disempowered (in an important sense). And many people will look at this as a good thing—the AI is doing so much valuable work for us!
It generally seems hard to find metrics for AI adoption that clearly track disempowerment; I think we may need to work a bit harder to interpret them. One idea is to augment such metrics with other sources of evidence, e.g. social science studies (such as interviews) of people using AI in a given way/sector/application/etc.
We can definitely try using formal notions of empowerment/POWER (cf. https://arxiv.org/abs/1912.01683). Note that these notions are not necessarily appropriate as optimization targets for an AI agent: if an AI hands you a remote control to the universe but doesn't tell you what the buttons do, the formal measure may say you're empowered, but you aren't in any meaningful sense.
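For concreteness, here is a minimal sketch of two such quantities on a made-up five-state deterministic MDP: n-step empowerment, which for deterministic dynamics reduces to the log of the number of distinct reachable states, and a crude proxy for POWER computed as the average optimal value under randomly sampled reward functions. The toy MDP, the uniform reward distribution, and the POWER proxy are all my own simplifications for illustration, not the exact definitions from the linked paper.

```python
import itertools
import numpy as np

# Toy deterministic MDP: states 0..4, two actions per state.
# next_state[s][a] gives the deterministic successor of state s under action a.
next_state = {
    0: [1, 2],   # from 0: action 0 -> 1, action 1 -> 2
    1: [3, 4],
    2: [2, 2],   # state 2 is a dead end (both actions stay put)
    3: [3, 3],
    4: [4, 4],
}
states = list(next_state)
n_actions = 2

def empowerment(s, horizon):
    """n-step empowerment for deterministic dynamics:
    log2 of the number of distinct states reachable in `horizon` steps."""
    reachable = set()
    for plan in itertools.product(range(n_actions), repeat=horizon):
        cur = s
        for a in plan:
            cur = next_state[cur][a]
        reachable.add(cur)
    return np.log2(len(reachable))

def power_proxy(s, gamma=0.9, n_reward_samples=500, horizon=20, rng=None):
    """Crude proxy for POWER: the average optimal discounted return from s
    under reward functions with i.i.d. uniform per-state rewards."""
    rng = rng or np.random.default_rng(0)
    total = 0.0
    for _ in range(n_reward_samples):
        reward = rng.uniform(size=len(states))
        v = np.zeros(len(states))
        # Finite-horizon value iteration for this sampled reward function.
        for _ in range(horizon):
            v = np.array([
                max(reward[next_state[x][a]] + gamma * v[next_state[x][a]]
                    for a in range(n_actions))
                for x in states
            ])
        total += v[s]
    return total / n_reward_samples

for s in states:
    print(s, round(empowerment(s, horizon=2), 2), round(power_proxy(s), 2))
```

The dead-end state (2) scores low on both measures, while the branching start state (0) scores high; the remote-control worry is precisely that measures like these can be high while the human still has no idea which button does what.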
People could also be considered more disempowered if:
They are less able to predict important attributes of a system’s behavior (cf “simulatability” in interpretability research)
Example: An artist is empowered if they can realize a particular vision for a piece, whereas generating art by prompting current AIs is much more like a process of curation.
This is similar to empowerment, but focused on prediction rather than control.
They self-report it.
They are “rubber-stamping” their approval of AI decisions
This can be revealed by red-teaming the human overseer, i.e. sending them examples they should flag and seeing whether they fail to flag them (a rough scoring sketch follows this list).
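To make the red-teaming idea slightly more concrete, here is a minimal sketch of the scoring one could run on such a study. The `ReviewItem` structure and its field names are invented for illustration; the point is just that the miss rate on planted flaws is the quantity of interest, ideally alongside a false-alarm rate on clean items so you aren't rewarding an overseer who simply flags everything.

```python
from dataclasses import dataclass

@dataclass
class ReviewItem:
    """One AI decision shown to the human overseer (hypothetical schema)."""
    item_id: str
    planted_flaw: bool      # did the red team deliberately insert a problem?
    overseer_flagged: bool  # did the overseer flag it?

def rubber_stamp_report(items):
    """Summarize overseer performance on a red-teamed review set.

    A high miss rate on planted flaws is evidence of rubber-stamping
    rather than genuine oversight.
    """
    flawed = [i for i in items if i.planted_flaw]
    clean = [i for i in items if not i.planted_flaw]
    misses = sum(1 for i in flawed if not i.overseer_flagged)
    false_alarms = sum(1 for i in clean if i.overseer_flagged)
    return {
        "miss_rate": misses / len(flawed) if flawed else None,
        "false_alarm_rate": false_alarms / len(clean) if clean else None,
        "n_flawed": len(flawed),
        "n_clean": len(clean),
    }

# Example: an overseer who approves everything misses every planted flaw.
items = [
    ReviewItem("a", planted_flaw=True, overseer_flagged=False),
    ReviewItem("b", planted_flaw=True, overseer_flagged=False),
    ReviewItem("c", planted_flaw=False, overseer_flagged=False),
]
print(rubber_stamp_report(items))  # miss_rate == 1.0
```

An analogous score (prediction accuracy on held-out system behavior) could serve as a proxy for the simulatability point above.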
Really good question!
I have to reach for fictional examples here, because fiction provides a wider range of complex scenarios than most people think of spontaneously. I'm going to stick with arguably positive scenarios, instead of outright dystopias, and I'm going to pick scenarios that seriously attempt to paint a world.
Are people in Iain Banks' Culture empowered? Or are they pets? Or maybe both? They have no actual power over the Minds.
Are people in CelestAI's universe empowered when they decide to upload? (Probably not, because CelestAI is capable of steamrolling consent and is happy to lie.) After they upload?
What about those people in Karl Schroeder's Ventus who experience thalience with the machines? Thalience is hard to explain, but it's the relationship you might expect between humans and woodland fairies. They're powerful, potentially dangerous, and very much other. They are not mirrors or servants of humans, but something alien and perhaps wondrous. The humans are not exactly empowered, but they can stare into nature and see strange minds staring back, and maybe this is something humans always kind of wanted?
What about humans in the Matrix, who think they’re living in the late 90s?
Is the typical person in the modern world empowered or not by their society?
The fictional examples above are all arguably optimistic visions, perhaps impossibly so if you buy Yudkowsky’s arguments. But making a taxonomy of these examples might help figure out what “empowered” means.
Along the lines of your artist example, I find the instrument case to be a nice intuition pump.
An instrument is a deeply empowering technology! The human voice can simulate a vast range of sounds, but it's very hard to do so, and composing with just your voice (in the way one can once one has played, or has access to, a piano) is, I imagine, impossible.
Another important facet of this example is that working directly with waveforms, via a programming language or even via the interface of a DAW, is universal but not empowering in the same way!
I think of this example as one of a broader class in which the interface is optimized for rich human interaction. One can imagine worlds in which interfaces become increasingly optimized for AI interaction instead; for example, future AIs will likely disprefer GUIs, etc.
Formalizing what is meant by good vs bad interfaces may be another way to get useful notions of empowerment.
Good question, good overview!
Minor note on the last point, which seems like a good idea: human oversight failures take a number of forms. The proposed type of red-teaming probably catches a lot of them, but it will focus on easy-to-operationalize / expected failure modes, and it ignores the institutional incentives that will make oversight fail even when it could succeed, such as unwillingness to respond due to liability concerns and slow response to correctly identified failures. (See our paper and poster at AIGOV 2026 at AAAI.)