Free to Optimize
Stare decisis is the legal principle that binds courts to follow precedent, to retrace the footsteps of other judges’ decisions. As someone previously condemned to an Orthodox Jewish education, where I gritted my teeth at the idea that medieval rabbis would always be wiser than modern rabbis, I completely missed the rationale for stare decisis. I thought it was about respect for the past.
But shouldn’t we presume that, in the presence of science, judges closer to the future will know more—have new facts at their fingertips—which enable them to make better decisions? Imagine if engineers respected the decisions of past engineers, not as a source of good suggestions, but as a binding precedent!—That was my original reaction. The standard rationale behind stare decisis came as a shock of revelation to me; it considerably increased my respect for the whole legal system.
This rationale is jurisprudence constante: The legal system must above all be predictable, so that people can execute contracts or choose behaviors knowing the legal implications.
Judges are not necessarily there to optimize, like an engineer. The purpose of law is not to make the world perfect. The law is there to provide a predictable environment in which people can optimize their own futures.
I was amazed at how a principle that at first glance seemed so completely Luddite, could have such an Enlightenment rationale. It was a “shock of creativity”—a solution that ranked high in my preference ordering and low in my search ordering, a solution that violated my previous surface generalizations. “Respect the past just because it’s the past” would not have easily occurred to me as a good solution for anything.
There’s a peer commentary in Evolutionary Origins of Morality which notes in passing that “other things being equal, organisms will choose to reward themselves over being rewarded by caretaking organisms”. It’s cited as the Premack principle, but the actual Premack principle looks to be something quite different, so I don’t know if this is a bogus result, a misremembered citation, or a nonobvious derivation. If true, it’s definitely interesting from a fun-theoretic perspective.
Optimization is the ability to squeeze the future into regions high in your preference ordering. Living by my own strength, means squeezing my own future—not perfectly, but still being able to grasp some of the relation between my actions and their consequences. This is the strength of a human.
If I’m being helped, then some other agent is also squeezing my future—optimizing me—in the same rough direction that I try to squeeze myself. This is “help”.
A human helper is unlikely to steer every part of my future that I could have steered myself. They’re not likely to have already exploited every connection between action and outcome that I can myself understand. They won’t be able to squeeze the future that tightly; there will be slack left over, that I can squeeze for myself.
We have little experience with being “caretaken” across any substantial gap in intelligence; the closest thing that human experience provides us with is the idiom of parents and children. Human parents are still human; they may be smarter than their children, but they can’t predict the future or manipulate the kids in any fine-grained way.
Even so, it’s an empirical observation that some human parents do help their children so much that their children don’t become strong. It’s not that there’s nothing left for their children to do, but with a hundred million dollars in a trust fund, they don’t need to do much—their remaining motivations aren’t strong enough. Something like that depends on genes, not just environment—not every overhelped child shrivels—but conversely it depends on environment too, not just genes.
So, in considering the kind of “help” that can flow from relatively stronger agents to relatively weaker agents, we have two potential problems to track:
1. Help so strong that it optimizes away the links between the desirable outcome and your own choices.
2. Help that is believed to be so reliable that it takes off the psychological pressure to use your own strength.
Since (2) revolves around belief, could you just lie about how reliable the help was? Pretend that you’re not going to help when things get bad—but then if things do get bad, you help anyway? That trick didn’t work too well for Alan Greenspan and Ben Bernanke.
A superintelligence might be able to pull off a better deception. But in terms of moral theory and eudaimonia—we are allowed to have preferences over external states of affairs, not just psychological states. This applies to “I want to really steer my own life, not just believe that I do”, just as it applies to “I want to have a love affair with a fellow sentient, not just a puppet that I am deceived into thinking sentient”. So if we can state firmly from a value standpoint that we don’t want to be fooled this way, then building an agent which respects that preference is a mere matter of Friendly AI.
Modify people so that they don’t relax when they believe they’ll be helped? I usually try to think of how to modify environments before I imagine modifying any people. It’s not that I want to stay the same person forever; but the issues are rather more fraught, and one might wish to take it slowly, at some eudaimonic rate of personal improvement.
(1), though, is the most interesting issue from a philosophicalish standpoint. It impinges on the confusion named “free will”, which I have already untangled; see the posts referenced at top, if you’re recently joining OB.
Let’s say that I’m an ultrapowerful AI, and I use my knowledge of your mind and your environment to forecast that, if left to your own devices, you will make $999,750. But this does not satisfice me; it so happens that I want you to make at least $1,000,000. So I hand you $250, and then you go on to make $999,750 as you ordinarily would have.
How much of your own strength have you just lived by?
The first view would say, “I made 99.975% of the money; the AI only helped 0.025% worth.”
The second view would say, “Suppose I had entirely slacked off and done nothing. Then the AI would have handed me $1,000,000. So my attempt to steer my own future was an illusion; my future was already determined to contain $1,000,000.”
Someone might reply, “Physics is deterministic, so your future is already determined no matter what you or the AI does—”
But the second view interrupts and says, “No, you’re not confusing me that easily. I am within physics, so in order for my future to be determined by me, it must be determined by physics. The Past does not reach around the Present and determine the Future before the Present gets a chance—that is mixing up a timeful view with a timeless one. But if there’s an AI that really does look over the alternatives before I do, and really does choose the outcome before I get a chance, then I’m really not steering my own future. The future is no longer counterfactually dependent on my decisions.”
At which point the first view butts in and says, “But of course the future is counterfactually dependent on your actions. The AI gives you $250 and then leaves. As a physical fact, if you didn’t work hard, you would end up with only $250 instead of $1,000,000.”
To which the second view replies, “I one-box on Newcomb’s Problem, so my counterfactual reads ‘if my decision were to not work hard, the AI would have given me $1,000,000 instead of $250’.”
“So you’re saying,” says the first view, heavy with sarcasm, “that if the AI had wanted me to make at least $1,000,000 and it had ensured this through the general policy of handing me $1,000,000 flat on a silver platter, leaving me to earn $999,750 through my own actions, for a total of $1,999,750—that this AI would have interfered less with my life than the one who just gave me $250.”
The second view thinks for a second and says “Yeah, actually. Because then there’s a stronger counterfactual dependency of the final outcome on your own decisions. Every dollar you earned was a real added dollar. The second AI helped you more, but it constrained your destiny less.”
“But if the AI had done exactly the same thing, because it wanted me to make exactly $1,999,750—”
The second view nods.
“That sounds a bit scary,” the first view says, “for reasons which have nothing to do with the usual furious debates over Newcomb’s Problem. You’re making your utility function path-dependent on the detailed cognition of the Friendly AI trying to help you! You’d be okay with it if the AI could only give you $250. You’d be okay if the AI had decided to give you $250 through a decision process that had predicted the final outcome in less detail, even though you acknowledge that in principle your decisions may already be highly deterministic. How is a poor Friendly AI supposed to help you, when your utility function is dependent, not just on the outcome, not just on the Friendly AI’s actions, but dependent on differences of the exact algorithm the Friendly AI uses to arrive at the same decision? Isn’t your whole rationale of one-boxing on Newcomb’s Problem that you only care about what works?”
“Well, that’s a good point,” says the second view. “But sometimes we only care about what works, and yet sometimes we do care about the journey as well as the destination. If I was trying to cure cancer, I wouldn’t care how I cured cancer, or whether I or the AI cured cancer, just so long as it ended up cured. This isn’t that kind of problem. This is the problem of the eudaimonic journey—it’s the reason I care in the first place whether I get a million dollars through my own efforts or by having an outside AI hand it to me on a silver platter. My utility function is not up for grabs. If I desire not to be optimized too hard by an outside agent, the agent needs to respect that preference even if it depends on the details of how the outside agent arrives at its decisions. Though it’s also worth noting that decisions are produced by algorithms—if the AI hadn’t been using the algorithm of doing just what it took to bring me up to $1,000,000, it probably wouldn’t have handed me exactly $250.”
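The two policies the dialogue argues over can be sketched as a toy model—my own illustration of the numbers above, not something from the original post:

```python
# Toy model of the dialogue above: how much does the final outcome
# counterfactually depend on your own effort under each AI policy?

def earned(work_hard):
    # What you make by your own strength in this toy example.
    return 999_750 if work_hard else 0

def topup_ai(work_hard):
    # AI that guarantees at least $1,000,000 by covering the shortfall.
    return earned(work_hard) + max(0, 1_000_000 - earned(work_hard))

def flat_gift_ai(work_hard):
    # AI that hands over a flat $1,000,000 regardless of effort.
    return earned(work_hard) + 1_000_000

# Under the top-up policy, working hard and slacking reach the same
# outcome, so the counterfactual dependence on your decision is zero.
assert topup_ai(True) == topup_ai(False) == 1_000_000

# Under the flat gift, every dollar you earn still moves the outcome,
# even though this AI "helped" four thousand times as much.
assert flat_gift_ai(True) - flat_gift_ai(False) == 999_750
```

The assertions capture the second view’s claim: the top-up AI hands over only $250 yet erases the link between effort and outcome, while the flat-gift AI hands over $1,000,000 yet leaves that link fully intact.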
The desire not to be optimized too hard by an outside agent is one of the structurally nontrivial aspects of human morality.
But I can think of a solution which, unless it contains some terrible flaw not obvious to me, sets a lower bound on the goodness of a solution: any alternative solution adopted ought to be at least this good or better.
If there is anything in the world that resembles a god, people will try to pray to it. It’s human nature to such an extent that people will pray even if there aren’t any gods—so you can imagine what would happen if there were! But people don’t pray to gravity to ignore their airplanes, because it is understood how gravity works, and it is understood that gravity doesn’t adapt itself to the needs of individuals. Instead they understand gravity and try to turn it to their own purposes.
So one possible way of helping—which may or may not be the best way of helping—would be the gift of a world that works on improved rules, where the rules are stable and understandable enough that people can manipulate them and optimize their own futures together. A nicer place to live, but free of meddling gods beyond that. I have yet to think of a form of help that is less poisonous to human beings—but I am only human.
Added: Note that modern legal systems score a low Fail on this dimension—no single human mind can even know all the regulations any more, let alone optimize for them. Maybe a professional lawyer who did nothing else could memorize all the regulations applicable to them personally, but I doubt it. As Albert Einstein observed, any fool can make things more complicated; what takes intelligence is moving in the opposite direction.
Part of The Fun Theory Sequence
Next post: “Harmful Options”
Previous post: “Living By Your Own Strength”
One good reason for the doctrine of stare decisis is that if judges know that their decision will bind future judges, they have an incentive to develop good rules, rather than just rules that favor a party to a particular case who may be sympathetic. If a good person driving negligently runs into someone loathsome who was not negligent at the time, rule-of-law notions require that the good person pay. It’s very hard for some people to accept that; stare decisis encourages judges to do it. Unfortunately, stare decisis in the US, and especially in the Supreme Court, is pretty much dead.
I think this idea somewhat resembles what I see as the best reason for tenure for academics: it forces those who decide whether to keep someone on to look at the merits more carefully than they might if the issue were only “shall we keep this person (whom we like, and who has cute children) on the payroll for another year even though he hasn’t written anything very good.” Academics not on the tenure track seem to have even more job security than those who have to go through tenure review.
Is it true that non-tenure-track faculty have higher job security? 
“So one possible way of helping—which may or may not be the best way of helping—would be the gift of a world that works on improved rules, where the rules are stable and understandable enough that people can manipulate them and optimize their own futures together.”
For some reason, I’m reminded of Dungeons & Dragons, World of Warcraft, and other games...
Wouldn’t you have to simplify the environment enough to make us all better optimizers than the FAI? Otherwise, we won’t feel like we are struggling because the FAI is still the determiner of our actions.
Wouldn’t it be a lot clearer to say that it’s dependent on, not the FAI’s algorithm, but the FAI’s actions in the counterfactual cases where you worked more or less hard?
The second one’s argument seems consistent with one-boxing, not two-boxing.
Better still, on whether the difference between the ultimate outcomes in those counterfactual cases is commensurate with the difference in my actions.
It’s interesting—raises a question of definition of counterfactual truth to a new level. The problem is that determining counterfactual truth is its own game, you can’t do that just by taking reality, changing it, and running it forward. You need to rebuild reality back from the combination of actual reality and the concept of reality existing in a mind. Counterfactuals of present set the past as well as the future, which makes facts inconsistent. Whose mind should the concepts of reality and of counterfactual change be taken from, how should their weight be evaluated against facts in actual reality?
It seems that singleton needs to optimize all of the counterfactual timelines evaluated according to cognitive algorithms running in people’s minds (with a nontrivial variety of counterfactual outcomes). This is also a way the strength of external help could be determined by the strength that people have in themselves.
Hrm… If you’re trying to optimize the external environment relative to present day humans, rather than what we may become, I’m not sure that will work.
What I mean is this: the types of improved “basic rules” we want are in a large part complicated criteria over “surface abstractions”, and lack lower level simplicity. In other words, the rules may end up being sufficiently complex that they effectively require intelligence.
Given that, if we DON’T make the interface in some sense personlike, we might end up with the horror of living in a world that’s effectively controlled by an alien mind, albeit one that’s a bit more friendly to us, for its own reasons. Sort of living in a “buddy cthulu” world, if you take my point.
You want to improve the basic rules, but would the improvements, taken as a whole, be sufficiently simple that we, as mostly (mentally) unmodified humans be able to as easily take those rules into account and optimize in that environment the way we do with, say, gravity, EM, etc?
If we want it to be intuitive and predictable, at least at the point where we’re still cognitively more or less the same as we are now, it might be better for it to at least seem like a person, since we’ve got all sorts of wiring in us that makes it easier for us to reason about people.
I understand why we may not want it to be an actual person, or to even seem like one. But let’s not go all happy death spiral on this. I think there may be a possible downside to keeping it too unpersonlike.
As for the thing about optimizing external environment before people’s minds, and tricky issues there. I simply, when thinking about that sort of thing, start with what kinds of changes I’d want to make in myself, given the opportunity (and a framework/knowledge/etc that helps me make sure the results would be what I really wanted, rather than basically slapping myself with a monkey’s paw or whatever.)
Judges go through pretty complicated cognitive algorithms in an absolute sense to make their decisions, but since we can predict them by running similar cognitive algorithms ourselves, the rules look simple—simpler than, say, Maxwell’s Equations which have much lower Kolmogorov complexity in an absolute sense. So this is the sense of “predictability” that we’re concerned with, but it’s noteworthy that a world containing meddling gods—in the sense of their being smarter than human—is less predictable on even this dimension.
Oh, and I should have added earlier that modern legal systems score a nearly complete FAIL on this attribute of Fun Theory—no one human mind can even know all the rules any more, let alone optimize for them. There should be some Constitutional rule to the effect that the complete sum of the Law must be readable by one human in one month with 8 hours of sleep every night and regular bathroom breaks.
In fact, I think that our laws are made precisely by people who don’t want us to go around optimizing our behavior to conform to the laws. Why? Because that prevents them from inserting hidden advantages for the people they like (or more specifically the people who pay them campaign contributions).
There’s simply no way to look at, say, the US tax code, or Dodd-Frank, and think, “These are laws designed to be sensible and consistently followed.” It’s much more obvious from trudging through their verbal muck that these are laws designed to be incomprehensible and strategically broken.
I mean, think about it; what’s the best possible tax code? It takes about a page:
A list of tax brackets, linked to the GDP deflator, and their (progressive) rates; e.g. “Under $5k: no tax; $5k-$10k: 10%; $10k-$20k: 15%; $20k-$40k: 20%; $40k-$80k: 25%; $80k-$160k: 30%; $160k-$320k: 35%; over $320k: 40%”
A list of important deductions, such as charitable donation and capital loss
And… that’s about it frankly. This would raise revenue in a simple and fair way, and effectively eliminate the tax-filing and tax-compliance industry. And we could do it tomorrow by an act of Congress. But we won’t.
I hope you mean taxing additional income at those rates, otherwise earning $40k instead of $1 less would make you pay $40k * (25% - 20%) = $2,000 more in taxes, which means people would have to start checking how much they’ve earned as year-end approaches, and sometimes working less (or asking for less pay) to avoid landing in the next bracket. Why not just use a function? Like, tax rate = the lesser of 0.25 * earnings and 40%, or something like that. My personal favourite is basic income plus a flat tax rate, though.
Note that * is a special character in Markdown syntax, and so if you want to use * without italicizing words, you need to escape it by typing \*.
(Spaces will also work at differentiating “I want an asterisk” and “I want to italicize this phrase.”)
That’s what tax brackets mean.
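For concreteness, here is a minimal sketch of marginal brackets, using the hypothetical rates from the comment above. Each rate applies only to the slice of income inside its own bracket, so crossing a bracket boundary never triggers a $2,000 cliff:

```python
# Marginal (progressive) bracket calculation: each rate applies only
# to the income above its bracket's lower bound and below the next one.
BRACKETS = [  # (lower bound, rate on income within this bracket)
    (0, 0.00), (5_000, 0.10), (10_000, 0.15), (20_000, 0.20),
    (40_000, 0.25), (80_000, 0.30), (160_000, 0.35), (320_000, 0.40),
]

def marginal_tax(income):
    tax = 0.0
    for i, (lower, rate) in enumerate(BRACKETS):
        upper = BRACKETS[i + 1][0] if i + 1 < len(BRACKETS) else float("inf")
        if income > lower:
            # Tax only the slice of income that falls inside this bracket.
            tax += (min(income, upper) - lower) * rate
    return tax

# No cliff at a boundary: one extra dollar earned can never cost more
# than that dollar times the marginal rate.
assert marginal_tax(40_000) - marginal_tax(39_999) < 1.0
```

Under this scheme $40,000 of income owes $500 + $1,500 + $4,000 = $6,000, and earning one more dollar adds at most 25 cents of tax, which is the point of the reply above.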
I expect (> 0.99) that any politically achievable set of important deductions will exceed a page, and predict (> 0.8) that any set of tax structures that you’d actually like to live within will greatly exceed several dozen pages. The Earned Income Tax Credit, for one simple and exceptionally popular example, is several pages on its own.
More fundamentally, how do you define income? Do you allow corporations to exist at all—not every country does—and if you do allow them, do you tax them—not every country does—and if you tax them, when and to what degree?
Part of this is legal formalism. In the same way that you have to write a whole lot of code to handle general circumstances, even for relatively simple programs, any just legal system will need to describe as much of the situation as possible. (If you don’t do it in the statute, it just falls into the court and regulatory system: arguably, the worst complexity in the US system already does this!) Most victims of the IRS don’t need to think about whether the gift boat counts as income, just as most visitors to this web site don’t need to think about whether it handles decimal commas or sanitizes input, but if it’s not in the statute or the code, respectively, things end poorly.
There’s a lot of cruft that could be removed even under that, true. Many of these statutes are the legal equivalent of code projects that have accumulated, rather than being designed as a whole. The problem is that the beneficiaries of any particular deduction care very strongly about their particular deduction, while tax-simplifiers care only weakly about any random deduction.
I’d be interested in an example of what you think the specification of a deduction looks like.
I guess it might take a few more pages to precisely describe what constitutes a “charitable organization” or something… it wouldn’t take our whole tax code, that’s for sure.
Part of the problem is that we have so many stupid deductions. You get to deduct mortgage interest, and children you have, and student loans… we’re shifting incentives all over the place without any awareness of what we’re doing.
You underestimate the intentionality of the deduction scheme. The things which are incentivized by the tax code are purposeful. They have knock-on effects that aren’t anticipated, but that’s true of every policy.
Also, you significantly underestimate how difficult it is to specify all the important aspects of the tax code rigorously enough to be enforceable.
That’s a good idea. Why did no one think of that?
As far as I can tell, it’s much easier for people to add rules than to repeal them—I call this rule ratcheting.
Perhaps it’s that once a rule exists, people have put work into getting used to it, and don’t want to redo their habits.
However, considering that it’s hard to get obsolete laws repealed, it’s probably a status issue. Repealing a law means that the people in charge have to admit that the group they’re affiliated with isn’t eternally correct, and that there’s some area of activity where the people who are lower status than the government are now going to be trusted to make their own choices.
I expect a lot of people suspect that the law is necessarily more complex than EY’s rule would allow, as the law has to cover so many different situations.
Or at the very least, don’t have a principle like in Australia that ignorance of the law is no excuse.
Is there anywhere that doesn’t have that principle? It seems like a system of laws where ignorance was a valid excuse would be impossible to manage, and nontrivial even for a superintelligent AI. You would have to be able to peer inside someone else’s specific brainstate for awareness of a specific concept, or of their entire past history. And how do you judge someone who had once known the law, but has since forgotten it?
Besides that, abandoning the concept of ignorantia non excusat creates an incentive to never learn the laws. Even with a perfect AI running the judicial system, that’s undesirable.
In principle, I suppose. In practice, several real-world legal structures depend on the court deciding, as a question of law, what was going on inside a person’s mind at a particular time.
And are those legal structures actually accurate? I have very strong doubts about the potential accuracy in the absence of overwhelming circumstantial evidence such as statements they made at the time away from witnesses they’d want to conceal the truth from.
I’m not claiming they’re accurate. I’m claiming that our inability to reliably read minds does not in practice prevent us from creating legal structures that depend on the court deciding what was going on in someone’s mind, and consequently it is insufficient to explain the “Ignorance of the Law Is No Excuse” principle.
I claim that their (known-but-unquantified) inaccuracy is sufficient to explain it. When you know a tool is flawed, you avoid it wherever possible.
If the court’s known inaccuracy about a defendant’s state of mind is sufficient to explain why we don’t treat people who break a law differently based on the court’s beliefs about their knowledge of the law, it is also sufficient to explain why we don’t treat people who break a law differently based on the court’s beliefs about other states of mind, such as the difference between voluntary and involuntary manslaughter.
Unfortunately, that second thing turns out to be false… we do treat those people differently.
When an explanation turns out to explain falsehoods just as easily as truths, I don’t consider it an adequate explanation for those truths.
I chose my words carefully: I said “avoid it wherever possible”. Some distinctions will naturally fall on the side where it’s deemed appropriate/necessary, generally, as in the case of manslaughter, when the difference is of enormous perceived moral significance.
So, if the “ignorance of the law is no excuse” principle were repealed by some culture C, would that surprise you, or would you merely consider it a demonstration that punishing people who are ignorant of the law is of enormous perceived moral significance to members of C?
It would greatly surprise me if any culture viewed the possibility of punishing an accidental crime, no matter how severe, as worse than allowing people guilty of serious crimes to go unpunished using a specious claim of ignorance.
OK, that answers my question, thanks.
For my own part, I would find that no more surprising than discovering a culture that viewed the punishing of an innocent person as worse than letting a guilty person go free.
I view that as basically the same, and would consider that, also, to be highly surprising. No culture I’m aware of ever took an absolutist stance on that issue, in either direction. Largely because it’s incredibly impractical.
I’m not precisely sure what you mean by “absolutist” here, but I would certainly agree that for every culture there is some (P1,P2) for which that culture accepts a P1 chance of punishing an innocent person over a P2 chance of letting a guilty person go free.
Basically, every culture ever is such a culture to an extent, so the only sense in which it could be a discovery would be if a culture had (P1,P2)=(epsilon,1-epsilon) or (P1,P2)=(0,1). Which I would consider highly surprising.
Yes, which is why I said I would agree this is true for every culture.
Yes, I would consider that surprising as well. If that’s what you mean by an absolutist stance, I agree with you that no culture took an absolutist stance on this issue, and that doing so is incredibly impractical.
But… consider two hypothetical criminal justice systems, J1 and J2, for which generally-accepted statistical studies demonstrate that J1 acquits 30% of guilty defendants and convicts 1% of innocent ones, and J2 acquits 1% of guilty defendants and convicts 30% of innocent ones.
Given a randomly selected culture, I cannot confidently predict whether that culture prefers J1 or J2. (Can you?)
Given a culture that prefers J1 to J2, all else being (implausibly) equal, I would comfortably describe that culture as viewing the punishing of an innocent person as worse than letting a guilty person go free. (Would you?)
I would not consider discovering such a culture particularly surprising. (Would you?)
It’s a matter of the best results that can be attained. The legal system can choose between being unambiguous (which requires very complicated language) and being understandable to an ordinary person (which requires simple language). It is impossible to have both.
The problem is that any system which chooses the latter option is itself unjust, as it will inevitably create ambiguous scenarios. You then have to choose between generosity in interpretation whenever an ambiguity exists, ignorance of the true meaning of the law as an excuse, and having to convict people ignorant of how to interpret the law, as there is no correct interpretation to apply to the facts. The alternative of ‘the spirit of the law’ is illusory, as shown by the massive amount of bias humans have in interpreting such a vague concept.
Yes, ignorantia non excusat is basically abandoning the entire purpose of Rule of Law and stare decisis. I don’t know why more people don’t see this; it’s basically what Kafka was talking about in The Judgment.
If that principle was abandoned, there would be every reason to never learn the laws.
Yes, that’s kind of my point: a “meddling god” of the classic “engaged in behavior that at least looked like it arose from human motivations” is something that a human can at least reasonably easily understand.
But rules arising from an alien “mind”, rules that aren’t simple either on a fundamental level or simple in a “simple relative to us” sense is something very different, not looking to us at all like a human judge making decisions.
Or am I completely and utterly missing the point here? (Don’t misunderstand. I’m not saying that it is absolutely undesirable for things, at least initially, to work out as you suggest. But it does seem to me that there’d be a bit of an “understandability” cost, at least initially.)
I think you are missing the point; the idea is that the rules are comprehensible to humans even if the process that produced them is not. As long as you can cut the causal process at the output and end up with something humanly comprehensible, you’re fine. And anything that understands humans is quite capable of working with “human comprehensibility” as a desideratum.
Seconding Peter—the post should say “one boxing”, right?
Yeah, I was thinking “take box two” instead of “take two boxes” for some odd reason. Fixed.
Eliezer: Ah, okay, fair enough then.
I rather like the old (Icelandic?) custom of reciting the whole law out loud before opening a legislative session.
That certainly places a cap on how long your laws can be. Though perhaps too tight a cap?
Do the humans know that the Friendly AI exists?
From my own motivation, if I knew that the rules had been made easier than independent life, I would lack all motivation to work. Would the FAI allow me to kill myself, or harm others? If not, then why not provide a Culture-like existence?
I would want to be able to drop out of the game, now and then, have a rest in an easier habitat. Humans can Despair. If the game is too painful, then they will.
A good parent will bring a child on, giving challenges which are just challenging enough to be interesting, without being so challenging as to guarantee failure. If the FAI will always be superior to any individual, more so than any parent could be, could one opt to be challenged like that, directly by the FAI, to reach one’s greatest potential?
What I want are fundamental choices, not choices within a scheme the FAI dreams up.
The future is still strongly counterfactually dependent on your actions: if you pursue wealth yourself, the AI will give you a pittance, and you go on to earn riches. If you choose to do nothing, the AI gives you a fortune, and you go on in idleness.
If your preference function trivializes the method by which you became wealthy, I have difficulty believing that it cares so acutely about the method by which the AI chose to give you some amount of money.
I find the parallel with what we want from government help kind-of interesting. Because I’m about 99% certain that I’d rather have fixed rules about how people get help (if you’re unemployed, you get $X per week for N weeks maximum; if you’re seriously poor, you qualify for $Y per week under qualifying conditions Z, etc.) than have some government employee deciding, on a per-case basis, how much I deserved, or (worse) trying to improve me by deciding whether I should be given $X per week, or whether that might just encourage me to laze around the house for too long.
The parallel isn’t perfect—bureaucracies, like markets and legal systems, end up being more like some kind of idiot-savant AI than like some near-omniscient one. But I think there is a parallel there—we’d probably mostly prefer consistent, understandable rules for our safety nets or whatever, rather than some well-meaning powerful person trying to shape us for our own good.
There is another alternative, and one that’s been proposed: an unconditional income for all. Everyone gets enough to survive. If you make more money on top of that, good for you.
F.A. Hayek rather beat you to the whole argument for an isonomic and predictable legal environment :)
It’s really quite simple: the people who designed and maintain the legal system faced a choice. Is it better for the system to be consistent but endlessly repeat its mistakes, or inconsistent but error-correcting?
They preferred it to be predictable.
And that is why it is absurd to call it a “justice system”. It’s not concerned with justice.
This post has got me thinking about my post-thaw/post-upload career path. Hmm. Great! I think I’ve now found 3. So now when I retire, I know what to pursue to improve my odds of adapting successfully later.
EY: The desire not to be optimized too hard by an outside agent is one of the structurally nontrivial aspects of human morality.
The vast majority of optimization-capable agents encountered by humans during their evolutionary history were selfish entities, squeezing their futures into their preferred regions. Given enough evolutionary time, any mutant humans who didn’t resist outside manipulation would end up ‘optimized’ to serve as slave labor in favor of the ‘optimizers’.
EY: would be the gift of a world that works on improved rules
Yes, just plug the most important holes (accidental death, unwanted suffering, illness, injustice, asteroids, etc.), and let people have fun.
Are you saying that one’s brain state can be identical in two different scenarios but that you are having a different amount of fun in each? If so, I’m not sure you are talking about what most people call fun (ie a property of your experiences). If not, then what quantity are you talking about in this post where you have less of it if certain counterfactuals are true?
Toby Ord: “Fun” in the sense of “Fun Theory” is about eudaimonia and value, so to me it seems quite fair to say that you can be in an identical brain-state but be having different amounts of Fun, depending on whether the girl you’re in love with is a real person or a nonsentient puppet. This is a moral theory about what should be fun, not an empirical theory of a certain category of human brain states. If you want to study the latter you go off and do the neurology of happiness, but if that’s your moral theory of value then it implies simple wireheading.
If you know the girl is merely virtual, then you are not in the same brain state.
Fun is intrinsic, contra Putnam’s views on intentional states.
And if you don’t know? I care about possibilities where bad things happen without my knowing about them, I would not choose to have the knowledge erased from my brain and call it a success.
Yes, and so do I.
That means only that your conception of success/failure does not coincide with your conception of fun/not-fun.
Which is great, though it makes FAI harder.
Personally, I value Fun, Individuality, and Complexity as primitively valuable.
Maybe a suggestion would be to decompose your conception of “success” into a part which is composed of fun, and a part composed of something else.
This would make it easier to see what else, besides Fun, is worth having in your own conception.
You’ve said “humanity could, just, you know, live and have fun”, but it seems here that you do value something other than your own fun: namely, that your fun be about reality.
That’s true, but it’s also irrelevant.
It is not irrelevant for the point Toby Ord was making. He was considering it odd that Eliezer takes properties which are not intrinsic to one’s brain states as increasing or decreasing the amount of fun.
Most people (myself included) consider that the amount of fun you are having is completely determined by the sum of brain states you have in a time interval.
So my comment is relevant in that if Eliezer had misconceived the possibility of having two different beliefs while having the same brain state (in which case I would recommend reading Dennett’s “Beyond Belief,” his best article), he can retract his misconception, state his new position, and give Ord a chance to agree or disagree with his coherent position.
Eliezer is using “Fun” to mean something other than what you are.
Should “Fun” then be consistently capitalized as a term of art? Currently I think we have “Friendly AI theory” (capital-F, lowercase-t) and “Friendliness,” but “Fun Theory” (capital-F capital-T) alongside plain “fun.”
OK. That makes more sense then. I’m not sure why you call it ‘Fun Theory’ though. It sounds like you intend it to be a theory of ‘the good life’, but a non-hedonistic one. Strangely it is one where people having ‘fun’ in the ordinary sense is not what matters, despite the name of the theory.
This is a moral theory about what should be fun
I don’t think that can be right. You are not saying that there is a moral imperative for certain things to be fun, or to not be fun, as that doesn’t really make sense (at least I can’t make sense of it). You are instead saying that certain conditions are bad, even when the person is having fun (in the ordinary sense). Maybe you are saying that what is good for someone mostly maps to their fun, but with several key exceptions (which the theory then lists).
In any event, I agree with Z.M. Davis that you should capitalize your ‘Fun’ when you are using it in a technical sense, and explaining the sense in more detail or using a different word altogether might also help.
But that’s exactly what I’m saying. When humanity becomes able to modify itself, what things should be fun, and will we ever run out of fun thus construed? This is the subject matter of Fun Theory, which ultimately determines the Fate of the Universe. For if all goes well, the question “What is fun?” shall determine the shape and pattern of a billion galaxies.
It seems to me that Eli is interested in the branch of anthropology known as ludology, or game studies. The first ludologist I ever knew of was the eminent philosopher Sir Michael Dummett of Oxford, an amazing, diverse guy. The history of playing cards is one of his specialties, and he has written two books on them.
Games can be silly (apparently the only truly universal game is peekaboo—why is that?) or profound (go). They are of course intriguing for what they say about culture, history, innate human ethics, their use of language, their unique sense of time, how they bring diverse people together or start riots, what they “mean,” what happens to people who play them, what the heck play is anyway, and why we enjoy them. Why are primates fascinated by them?
This is such a British study—“fair play” is such a crucial British cultural idea! But now you can meet ludologists who work for video game companies—these are usually anthropologists who study human-machine interactions by hanging out with users. My college pal Anne McClard used to do this for Apple and now does this freelance.
In the future, if Eli is both lucky & right, we may have the ethical and moral problem of having nothing to do but play games. Those who might be against Eli’s plan might argue this is a reduction of humanity to infantilism, but it could actually reinforce the most beautiful and important human behaviors.
So yes, Eli is interested in ludology, in ludic ethics, and ludic morality.
I object to most of the things Eliezer wants for the far future, but of all the sentences he has written lately, that is probably the one I object to most unequivocally. A billion galaxies devoted to fun does not leave Earth-originating intelligence a lot to devote to things that might be actually important.
That is my dyspeptic two cents.
Not wanting to be in a rotten mood keeps me from closely reading this series on fun and the earlier series on sentience or personhood, but I have detected no indication of how Eliezer would resolve a conflict between the terminal values he is describing. If for example, he learned that the will of the people, oops, I mean, the collective volition, oops, I mean, the coherent extrapolated volition does not want fun, would he reject the coherent extrapolated volition or would he resign himself to a future of severely submaximal quantities of fun?
Like WHAT, for the love of Belldandy?
Show me something more important than fun!
I think you’ve heard this one before: IMHO it has to do with the state in which reality “ends up” and has nothing to do with the subjective experiences of the intelligent agents in the reality. In my view, the greatest evil is the squandering of potential, and devoting the billion galaxies to fun is squandering the galaxies just as much as devoting them to experiments in pain and abasement is. In my view there is no important difference between the two. There would be—or rather there might be—an important difference if the fun produced by the billion galaxies is more useful than the pain and abasement—more useful, that is, for something other than having subjective experiences. But that possibility is very unlikely.
In the present day, a human having fun is probably more useful toward the kinds of ends I expect to be important than a human in pain. Actually, the causal relationship between subjective human experience and human effectiveness or usefulness is poorly understood (by me) and probably quite complicated.
After the engineered explosion of engineered intelligence, the humans are obsolete, and what replaces them is sufficiently different from the humans that my previous paragraph is irrelevant. In my view, there is no need to care whether or what subjective experiences the engineered intelligences will have.
What subjective experiences the humans will have is relevant only because the information helps us predict and control the effectiveness and the usefulness of the humans. We will have proofs of the correctness of the source code for the engineered intelligent agents, so there is no need to inquire about their subjective experiences.
Richard: You didn’t actually answer the question. You explained (erm, sort of) why you think Fun isn’t important, but you haven’t said what you think is. All you’ve done is use the word “important” as though it answered the question: “In the present day, a human having fun is probably more useful toward the kinds of ends I expect to be important than a human in pain.” Great: what kinds of ends do you expect to be important?
Robin, my most complete description of this system of valuing things consists of this followed by this. Someone else wrote 4 books about it, the best one of which is this.
You still don’t answer the question. Those links amount to an argument that, if all times are treated as equal, actions now will be the same regardless of the final goal. You don’t say what goals you want to move toward.
As for that book… Wow.
First sentences of Chapter 8 of that book: We are going whence we came. We are evolving toward the Moral Society, Teilhard’s Point Omega, Spinoza’s Intellectual Love of God, the Judaeo-Christian concept of union with God. Each of us is a holographic reflection of the creativity of God.
I don’t even know where to start, on either topic, so I won’t.
OK, since this is a rationalist scientist community, I should have warned you about the eccentric scientific opinions in Garcia’s book. The most valuable thing about Garcia is that he spent 30 years communicating with whoever seemed sincere about the ethical system that currently has my loyalty, so he has dozens of little tricks and insights into how actual humans tend to go wrong when thinking in this region of normative belief space.
Whether an agent’s goal is to maximize the number of novel experiences experienced by agents in the regions of space-time under its control, or to maximize the number of gold atoms in the regions under its control, the agent’s initial moves are going to be the same. Namely, your priorities are going to look something like the following. (Which item you concentrate on first is going to depend on your exact circumstances.)
(1) ensure for yourself an adequate supply of things like electricity that you need to keep on functioning;
(2) get control over your own “intelligence” which probably means that if you do not yet know how reliably to re-write your own source code, you acquire that ability;
(3a) make a survey of any other optimizing processes in your vicinity;
(3b) try to determine their goals and the extent to which those goals clash with your own;
(3c) assess their ability to compete with you;
(3d) when possible, negotiate with them to avoid negative-sum mutual outcomes;
(4a) make sure that the model of reality that you started out with is accurate;
(4b) refine your model of reality to encompass more and more “distant” aspects of reality; e.g., are the laws of physics the same in extreme gravity? Are the laws of physics and the fundamental constants the same 10 billion light-years away as they are here? And so on.
Because those things I just listed are necessary regardless of whether in the end you want there to be lots of gold atoms or lots of happy humans, those things have been called “universal instrumental values” or “common instrumental values”.
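The convergence claim can be sketched in a few lines of code. This is purely an illustration of the argument above, not anything from the original comments; the list entries are paraphrases of items (1)–(4b):

```python
# Illustrative sketch: whatever terminal goal an optimizing agent starts with,
# its opening moves draw on the same "common instrumental values".

COMMON_INSTRUMENTAL_VALUES = [
    "secure needed resources (e.g., a reliable supply of electricity)",   # (1)
    "gain control over your own intelligence (self-modification)",        # (2)
    "survey other optimizing processes in your vicinity",                 # (3a)
    "infer their goals and potential conflicts with your own",            # (3b)
    "assess their ability to compete with you",                           # (3c)
    "negotiate to avoid negative-sum mutual outcomes",                    # (3d)
    "verify the accuracy of your starting model of reality",              # (4a)
    "extend that model to ever more distant aspects of reality",          # (4b)
]

def initial_plan(terminal_goal: str) -> list:
    """The opening moves do not depend on the terminal goal at all."""
    return COMMON_INSTRUMENTAL_VALUES

# Gold atoms or novel experiences: the first steps are identical.
assert initial_plan("maximize gold atoms") == initial_plan("maximize novel experiences")
```

Goal system zero, as described below, simply promotes this goal-independent prefix of the plan to an end in itself.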
The goal that currently has my loyalty is very simple: everyone should pursue those common instrumental values as an end in themselves. Specifically, everyone should do their best to maximize the ability of the space, time, matter and energy under their control (1) to assure itself (“it” being the space, time, matter, etc) a reliable supply of electricity and the other things it needs; (2) to get control over its own “intelligence”; and so on.
I might have mixed my statement or definition of that goal (which I call goal system zero) with arguments as to why that goal deserves the reader’s loyalty, which might have confused you.
I know it is not completely impossible for someone to understand because Michael Vassar successfully stated goal system zero in his own words. (Vassar probably disagrees with the goal, but that is firm evidence that he understands it.)
You missed (5): preserve your goals/utility function to ensure that the resources acquired serve your goals. Avoiding transformation into Goal System Zero is a nearly universal instrumental value (none of the rest are universal either).
Do you claim that that is an argument against goal system zero? But, Carl, the same argument applies to CEV—and almost every other goal system.
It strikes me as more likely that an agent’s goal system will transform into goal system zero than it will transform into CEV. (But surely the probability of any change or transformation of terminal goal happening is extremely small in any well engineered general intelligence.)
Do you claim that that is an argument against goal system zero? If so, I guess you also believe that the fragility of the values to which Eliezer is loyal is a reason to be loyal to them. Do you? Why exactly?
I acknowledge that preserving fragile things usually has instrumental value, but if the fragile thing is a goal, I am not sure that that applies, and even if it does, I would need to be convinced that a thing’s having instrumental value is evidence I should assign it intrinsic value.
Note that the fact that goal system zero has high instrumental utility is not IMHO a good reason to assign it intrinsic utility. I have not mentioned in this comment section what most convinces me to remain loyal to goal system zero; that is not what Robin Powell asked of me. (It just so happens that the shortest and quickest explanation I know of of goal system zero involves common instrumental values.)
‘The second AI helped you more, but it constrained your destiny less.’: A very interesting sentence.
On other parts, I note that a commitment to a range of possible actions can be seen as larger in scale than a commitment to a single action, even before the particular action is chosen.
A particular situation that comes to mind, though:
Person X does not know of person Y, but person Y knows of person X. Y has an emotional (or other) stake in a tiebreaking vote that X will make; Y cannot be present on the day to observe the vote, but sets up a simple machine to detect what vote is made and fire a projectile through the head of X if X makes one vote rather than another (nothing happening otherwise).
Let it be given that in every universe that X votes that certain way, X is immediately killed as a result. It can also safely be assumed that in those universes Y is arrested for murder.
In a certain universe, X votes the other way, but the machine is later discovered. No direct interference with X has taken place, but Y who set up the machine (pointed at X’s head, X’s continued life unknowingly dependent on X’s vote) presumably is guilty of a felony of some sort (which though, I wonder?).
Regardless of motivation, to have committed to potentially carrying out a certain thing against X is treated as similarly serious to in fact having carried it out (or attempted to carry it out).
(This, granted, may focus on a concept within the above article without addressing the entire issue of planning another entity’s life.)
The AI is optimizing how much money you make, not how much work you do. To determine how much the AI has helped you, I think the best way to go about it is to ask counterfactually how much money you would have made if the AI weren’t there. Judging by this criterion, the first view is correct.
However, I like Eliezer’s proposal of better rules quite a bit.
Personally, I think that the “improved rules” idea is good but sub-optimal. Beyond the removing-death bit (which removes the ridiculous, arbitrary, too-short time limit), it seems like making further modifications to reality would make the game too easy, as it were. I’m not sure how I’d feel about the idea that I was only able to steer the Future where I wanted because I was being handed an easier ruleset; it feels a bit like being stuck in a playpen. Safer and easier, maybe, but less like reality.