[Question] Optimizing for Agency?

It’s commonly accepted that pretty much every optimization target results in death. If you optimize for paperclips, humans die. If you optimize for curing cancer, humans die.

What about optimizing for agency?

The way I visualize this is after a superintelligence takeover, and the superintelligence is optimizing for intelligent agency, all intelligent beings (including animals) have a sort of ‘domain’ of agency around them. This domain extends as far as possible but ends when it comes into contact with another agent’s domain.

For example, let’s say you’re hungry. You have maximum agency over this area, so you request a burrito and the AI summons a drone, which speeds over to you and drops you a Chipotle burrito.

The superintelligence is constantly balancing agency between people.

One person might want to build a house. They’re walking across a field and decide “There should be a house here.” The AI then begins rapidly moving construction robots and building a house. Agency for the person is maximized.

What if someone else wants that field to remain a field?

Then, the agency domains clash. The AI attempts to preserve the agency of both individuals (without manipulating them, since manipulation would reduce agency). This would result in some kind of negotiation between the two individuals, where both parties end negotiations with their agency still maximized. Maybe another area is used to build a house. Maybe the house is built, but some land is given to the other person.

The core idea is that this is an AI that will help you do whatever you want but will prevent you from reducing the agency of others.

You could train an AI like this to get familiar maximizing the agency of humanlike LLMs in a simulation. At each point, the simulated human (which an LLM) has a set of options open to them. The AI tries to maximize those options without reducing the options of other agents. An AI that became superintelligent at this should hopefully allow us to cure cancer or create beautiful new worlds without also wiping everyone out.

The outcome of this type of AI might be weird, but it hopefully won’t be as bad as extinction.

-Essentially, no one would ever be able to kill or harm others without full consent of both parties, since being killed/​harmed reduces agency.

-Conscious beings would be essentially immortal and invincible, even creatures like fish.

-Everyone might end up in a simulation, as this would give the AI perfect fine-grained control so that everyone could live in their own agency-maxed world. Ideally, having agency would allow the people in the simulation to have the agency to exit the simulation and do things in reality if they desired. They could also always go back into the simulation if they wanted. This is hopefully what would happen when you maximize agency.

-People should also be free from addictive stimuli, since addiction reduces agency. People should always have the agency to ‘brainwash’ themselves and remove addiction. Additionally, the AI should be able to intervene to ensure maximum agency if it deems that the person has lost agency to addiction.


-People might not be able to die, or sleep, as these technically reduce their agency. Hopefully, people should be able to voluntarily reduce their agency if they wanted to sleep for a while, or permanently reduce their agency by passing on. Not being allowed to sleep is a reduction of agency, just as not being allowed to die is a reduction of agency.

-This is roughly similar to the ‘optimize for all human utility functions’ as discussed in this post: https://​​www.lesswrong.com/​​posts/​​wnkGXcAq4DCgY8HqA/​​a-case-for-ai-alignment-being-difficult#It_is_hard_to_specify_optimization_of_a_different_agent_s_utility_function

-I’m not sure how agency should scale with intelligence, especially if people want to become superintelligent.

-People in this AI-optimized world might become extremely self-centered and lose some of the meaning that we find in humanity.

-Defining agency is tough in its own right.

I think the form for a solution to AI Alignment is found somewhere around this problem. Something about balancing many optimizations to create an equilibrium, while retaining a world where consciousness can flourish, is a key. Our world is currently balanced because everyone has some amount of agency, so we use systems of cooperation to reach for better ends.

Part of a thought experiment I’ve been doing where I consider a superintelligence optimizing for different targets: