Free to Optimize

Stare decisis is the legal principle which binds courts to follow precedent, retrace the footsteps of other judges’ decisions. As someone previously condemned to an Orthodox Jewish education, where I gritted my teeth at the idea that medieval rabbis would always be wiser than modern rabbis, I completely missed the rationale for stare decisis. I thought it was about respect for the past.

But shouldn’t we presume that, in the presence of science, judges closer to the future will know more—have new facts at their fingertips—which enable them to make better decisions? Imagine if engineers respected the decisions of past engineers, not as a source of good suggestions, but as a binding precedent!—That was my original reaction. The standard rationale behind stare decisis came as a shock of revelation to me; it considerably increased my respect for the whole legal system.

This rationale is jurisprudence constante: The legal system must above all be predictable, so that people can execute contracts or choose behaviors knowing the legal implications.

Judges are not necessarily there to optimize, like an engineer. The purpose of law is not to make the world perfect. The law is there to provide a predictable environment in which people can optimize their ownfutures.

I was amazed at how a principle that at first glance seemed so completely Luddite, could have such an Enlightenment rationale. It was a “shock of creativity”—a solution that ranked high in my preference ordering and low in my search ordering, a solution that violated my previous surface generalizations. “Respect the past just because it’s the past” would not have easily occurred to me as a good solution for anything.

There’s a peer commentary in Evolutionary Origins of Morality which notes in passing that “other things being equal, organisms will choose to reward themselves over being rewarded by caretaking organisms”. It’s cited as the Premack principle, but the actual Premack principle looks to be something quite different, so I don’t know if this is a bogus result, a misremembered citation, or a nonobvious derivation. If true, it’s definitely interesting from a fun-theoretic perspective.

Optimization is the ability to squeeze the future into regions high in your preference ordering. Living by my own strength, means squeezing my own future—not perfectly, but still being able to grasp some of the relation between my actions and their consequences. This is the strength of a human.

If I’m being helped, then some other agent is also squeezing my future—optimizing me—in the same rough direction that I try to squeeze myself. This is “help”.

A human helper is unlikely to steer every part of my future that I could have steered myself. They’re not likely to have already exploited every connection between action and outcome that I can myself understand. They won’t be able to squeeze the future that tightly; there will be slack left over, that I can squeeze for myself.

We have little experience with being “caretaken” across any substantial gap in intelligence; the closest thing that human experience provides us with is the idiom of parents and children. Human parents are still human; they may be smarter than their children, but they can’t predict the future or manipulate the kids in any fine-grained way.

Even so, it’s an empirical observation that some human parents dohelp their children so much that their children don’t become strong. It’s not that there’s nothing left for their children to do, but with a hundred million dollars in a trust fund, they don’t need to do much—their remaining motivations aren’t strong enough. Something like that depends on genes, not just environment —not every overhelped child shrivels—but conversely it depends on environment too, not just genes.

So, in considering the kind of “help” that can flow from relatively stronger agents to relatively weaker agents, we have two potential problems to track:

  1. Help so strong that it optimizes away the links between the desirable outcome and your own choices.

  2. Help that is believedto be so reliable, that it takes off the psychological pressure to use your own strength.

Since (2) revolves around belief, could you just lie about how reliable the help was? Pretend that you’re not going to help when things get bad—but then if things do get bad, you help anyway? That trick didn’t work too well for Alan Greenspan and Ben Bernanke.

A superintelligence might be able to pull off a better deception. But in terms of moral theory and eudaimonia—we areallowed to have preferences over external states of affairs, not just psychological states. This applies to “I want to really steer my own life, not just believe that I do”, just as it applies to “I want to have a love affair with a fellow sentient, not just a puppet that I am deceived into thinking sentient”. So if we can state firmly from a value standpoint that we don’t want to be fooled this way, then buildingan agent which respects that preference is a mere matter of Friendly AI.

Modify people so that they don’t relax when they believe they’ll be helped? I usually try to think of how to modify environments before I imagine modifying any people. It’s not that I want to stay the same person forever; but the issues are rather more fraught, and one might wish to take it slowly, at some eudaimonic rate of personal improvement.

(1), though, is the most interesting issue from a philosophicalish standpoint. It impinges on the confusion named “free will”. Of which I have already untangled; see the posts referenced at top, if you’re recently joining OB.

Let’s say that I’m an ultrapowerful AI, and I use my knowledge of your mind and your environment to forecast that, if left to your own devices, you will make $999,750. But this does not satisfice me; it so happens that I want you to make at least $1,000,000. So I hand you $250, and then you go on to make $999,750 as you ordinarily would have.

How much of your own strength have you just lived by?

The first view would say, “I made 99.975% of the money; the AI only helped 0.025% worth.”

The second view would say, “Suppose I had entirely slacked off and done nothing. Then the AI would have handed me $1,000,000. So my attempt to steer my own future was an illusion; my future was already determined to contain $1,000,000.”

Someone might reply, “Physics is deterministic, so your future is already determined no matter what you or the AI does—”

But the second view interrupts and says, “No, you’re not confusing me that easily. I am within physics, so in order for my future to be determined by me, it must be determined by physics. The Past does not reach around the Present and determine the Future before the Present gets a chance—that is mixing up a timeful view with a timeless one. But if there’s an AI that really does look over the alternatives before I do, and really does choose the outcome before I get a chance, then I’m really not steering my own future. The future is no longer counterfactually dependent on my decisions.”

At which point the first view butts in and says, “But of course the future is counterfactually dependent on your actions. The AI gives you $250 and then leaves. As a physical fact, if you didn’t work hard, you would end up with only $250 instead of $1,000,000.”

To which the second view replies, “I one-box on Newcomb’s Problem, so my counterfactual reads ‘if my decision were to not work hard, the AI would have given me $1,000,000 instead of $250’.”

“So you’re saying,” says the first view, heavy with sarcasm, “that if the AI had wanted me to make at least $1,000,000 and it had ensured this through the general policy of handing me $1,000,000 flat on a silver platter, leaving me to earn $999,750 through my own actions, for a total of $1,999,750—that this AI would have interfered lesswith my life than the one who just gave me $250.”

The second view thinks for a second and says “Yeah, actually. Because then there’s a stronger counterfactual dependency of the final outcome on your own decisions. Every dollar you earned was a real added dollar. The second AI helped you more, but it constrained your destiny less.”

“But if the AI had done exactly the same thing, because it wantedme to make exactly $1,999,750—”

The second view nods.

“That sounds a bit scary,” the first view says, “for reasons which have nothing to do with the usual furious debates over Newcomb’s Problem. You’re making your utility function path-dependent on the detailed cognition of the Friendly AI trying to help you! You’d be okay with it if the AI only could give you $250. You’d be okay if the AI had decided to give you $250 through a decision process that had predicted the final outcome in less detail, even though you acknowledge that in principle your decisions may already be highly deterministic. How is a poor Friendly AI supposed to help you, when your utility function is dependent, not just on the outcome, not just on the Friendly AI’s actions, but dependent on differences of the exact algorithm the Friendly AI uses to arrive at the same decision? Isn’t your whole rationale of one-boxing on Newcomb’s Problem that you only care about what works?”

“Well, that’s a good point,” says the second view. “But sometimes we only care about what works, and yet sometimes we do care about the journey as well as the destination. If I was trying to cure cancer, I wouldn’t care how I cured cancer, or whether I or the AI cured cancer, just so long as it ended up cured. This isn’t that kind of problem. This is the problem of the eudaimonic journey—it’s the reason I care in the first place whether I get a million dollars through my own efforts or by having an outside AI hand it to me on a silver platter. My utility function is not up for grabs. If I desire not to be optimized too hard by an outside agent, the agent needs to respect that preference even if it depends on the details of how the outside agent arrives at its decisions. Though it’s also worth noting that decisions areproduced by algorithms— if the AI hadn’t been using the algorithm of doing just what it took to bring me up to $1,000,000, it probably wouldn’t have handed me exactly $250.”

The desire not to be optimized too hard by an outside agent is one of the structurally nontrivial aspects of human morality.

But I can think of a solution, which unless it contains some terrible flaw not obvious to me, sets a lower bound on the goodness of a solution: any alternative solution adopted, ought to be at least this good or better.

If there is anything in the world that resembles a god, people will try to pray to it. It’s human nature to such an extent that people will pray even if there aren’t any gods—so you can imagine what would happen if there were! But people don’t pray to gravity to ignore their airplanes, because it is understood how gravity works, and it is understood that gravity doesn’t adapt itself to the needs of individuals. Instead they understand gravity and try to turn it to their own purposes.

So one possible way of helping—which may or may not be the best way of helping—would be the gift of a world that works on improved rules, where the rules are stable and understandable enough that people can manipulate them and optimize their own futures together. A nicer place to live, but free of meddling gods beyond that. I have yet to think of a form of help that is less poisonous to human beings—but I am only human.

Added: Note that modern legal systems score a low Fail on this dimension—no single human mind can even know all the regulations any more, let alone optimize for them. Maybe a professional lawyer who did nothing else could memorize all the regulations applicable to them personally, but I doubt it. As Albert Einstein observed, any fool can make things more complicated; what takes intelligence is moving in the opposite direction.

Part of The Fun Theory Sequence

Next post: “Harmful Options

Previous post: “Living By Your Own Strength