Meta: I approve of the practice of arguing against your own post in a comment.
How much less do you expect this to happen under the current system?
both historically and now, criticism is often met with counterarguments based on “style” rather than engaging with the technical meat of the criticism
Is there any group of people who reliably don’t do this? Is there any indication that AI researchers do this more often than others?
Eliezer’s real answer to this question is discussed in Timeless Control. Basically, choice is still meaningful in many-worlds or any other physically deterministic universe. There are incredibly few Everett branches starting from here where tomorrow I go burn down an orphanage, and this is genuinely caused by the fact that I robustly do not want to do that sort of thing.
If you have altruistic motivation, then the Everett branches starting from here are in fact better (in expectation) than the branches starting from a similar universe with a version of you that has no altruistic motivation. By working to do good, you are in a meaningful sense causing the multiverse to contain a higher proportion of good worlds than it otherwise would.
It really does all add up to normality, even if it feels counterintuitive.
Well, this post aged interestingly for those of us who know the author (who ended up working for a high-profile EA organization for some time).
it is not the done the thing
it is not the done thing, perhaps?
Maybe you can get the best of both worlds by imagining you’re writing a children’s book, but that your editor is in fact an expert on the subject and you don’t want to embarrass yourself in front of them.
And Robin Hanson was surprised that no big corporation wanted to implement a real prediction market?
This strongly resembles the argument given by Subhan in EY’s post Is Morality Preference?, with a side order of Fake Selfishness. You might enjoy reading those posts along with others in their respective sequences. (“Is Morality Preference?” was part of the original metaethics sequence but didn’t make the cut for Rationality: AI to Zombies.)
More to the point, the biggest mistake I see here is the one addressed in The Domain of Your Utility Function: yes, my moral preferences are a part of my map rather than the territory, but there’s still a damn meaningful difference between egoism (preferences that point only to the part of my map labeled “my future experiences”) and my actual moral preferences, which point to many other parts of the map as well.
I am struggling to understand the goal of the post.
The title was helpful to me in that regard. Each of these examples shows an agent who could run an honest process to get evidence on a question, but who prefers one answer so much that they try to stack the deck in that direction, thereby losing the hoped-for benefits of that process.
Getting an honest Yes requires running the risk of getting a No instead.
Formatting request: can the footnote numbers be augmented with links that jump to the footnote text? (I presume this worked in Arbital but broke when it was moved here.)
I had a notion here that I could stochastically introduce a new goal that would minimize total suffering over an agent’s life-history. I tried this, and the most stable solution turned out to be this: introduce an overwhelmingly aversive goal that causes the agent to run far away from all of its other goals, screaming.
did you mean: anhedonia
(No, seriously, your paragraph is an apt description of a long bout I had of depression-induced anhedonia; I felt so averse to every action that I ceased to feel wants, and I consistently marked my mood as neutral rather than negative despite being objectively more severely depressed than I was at other times when I put negative numbers in my mood tracker.)
The ideal thing is to judge Bob as if he were making the same prediction every day until he makes a new one, and log-score all of them when the event is revealed. (That is, if Bob says 75% on January 1st and 60% on February 1st, and then on March 1st the event is revealed to have happened, Bob’s score equals 31*log(.75) + 28*log(.6).) Then Bob’s best strategy is to update his prediction to his actual current estimate as often as possible; past predictions are sunk costs.
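Here’s a minimal sketch of that scoring rule in Python (the helper name and the day-indexing convention are mine for illustration, not anything from the original comment):

```python
import math

def carried_forward_log_score(predictions, resolution_day, happened):
    """Log-score a forecaster who is treated as repeating each
    prediction daily until they issue a new one.

    predictions: list of (day, p) pairs sorted by day, p = P(event).
    resolution_day: day the event resolves (not itself scored).
    happened: True if the event occurred, False otherwise.
    """
    timeline = predictions + [(resolution_day, None)]
    total = 0.0
    for (day, p), (next_day, _) in zip(timeline, timeline[1:]):
        days_in_force = next_day - day  # days this prediction stood
        total += days_in_force * math.log(p if happened else 1 - p)
    return total

# Bob: 75% on Jan 1 (day 0), 60% on Feb 1 (day 31),
# event revealed true on Mar 1 (day 59):
score = carried_forward_log_score([(0, 0.75), (31, 0.60)], 59, True)
# = 31*log(0.75) + 28*log(0.60)
```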
The real-world version is remembering to dock people more for bad predictions the longer they persisted in them. But of course this is hard.
538 did do this with their self-evaluation, which is a good way to try to establish a norm in the domain of model-driven reporting.
Let’s note differences of degree here. Political systems differ massively in how easily decisionmakers can claim large spoils for themselves, and these differences seem to correlate with how pro-social the decisions tend to be. In particular, the dollar amounts of graft being alleged for politicians in liberal democracies are usually small compared to what despots regularly claim without consequence. (Which is not to say that it would be wise to ignore corruption in liberal democracies!)
I’m not convinced that Europe had more intellectual freedom on average than China, but because of the patchwork of principalities, it certainly had more variation in intellectual freedom than did a China that was at any given time either mostly unified or mostly at war; and all that you need for an intellectual revolution is the existence of a bastion of intellectual freedom somewhere.
It doesn’t move much probability mass to the very near term (i.e. 1 year or less), because neither this nor AlphaStar is really doing consequentialist reasoning; they’re just able to achieve surprising performance with simpler tricks (the very Markovian nature of human writing, a good position evaluation function) given a whole lot of compute.
However, it does shift my probabilities forward in time, in the sense that one new weird trick to do deductive or consequentialist reasoning, plus a lot of compute, might get you there really quickly.
I expect that some otherwise convinceable readers are not going to realize that in this fictional world, people haven’t discovered Newton’s physics or calculus, and those readers are therefore going to miss the analogy of “this is how MIRI would talk about the situation if they didn’t already know the fundamental concepts but had reasons for searching in the right direction”. (I’m not thinking of readers incapable of handling that counterfactual, but of readers who aren’t great at inferring implicit background facts from a written dialogue. Such readers might get very confused at the unexpected turns of the dialogue and quit rather than figure out what they’re baffled by.)
I’d suggest adding to the preamble something like “In a weird world where people had figured out workable aeronautics and basic rocket propulsion by trial and error, but hadn’t figured out Newton’s laws or calculus”.
I like your definition, though, and want to try to make a better one (and I acknowledge this is not the point of this post).
I think that’s a perfectly valid thing to do in the comments here! However, I think your attempt,
My stab at a refinement of “consent” is “respect for another’s choices”, where “disrespect” is “deliberately(?) doing something to undermine”
is far too vague to be a useful concept.
In most realistic cases, I can give a definite answer to whether A touched B in a way B clearly did not want to be touched. In the case of my honesty definition, it does involve intent and so I can only infer statistically when someone else is being dishonest vs mistaken, but for myself I usually have an answer about whether saying X to person C would be honest or not.
I don’t think I could do the same for your definition; “am I respecting their choices” is a tough query to bottom out in basic facts.
My comment was meant to explain what I understood Eliezer to be saying, because I think you had misinterpreted that. The OP is simply saying “don’t give weight to arguments that are locally invalid, regardless of what else you like about them”. Of course you need to use priors, heuristics, and intuitions in areas where you can’t find an argument that carries you from beginning to end. But being able to think “oh, if I move there, then they can take my queen, and I don’t see anything else good about that position, so let’s not do that then” is a fair bit easier than proving your move optimal.
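To make that asymmetry concrete, here’s a toy sketch of the one-ply check using the python-chess library (the library choice and helper name are my own illustration, not anything from the post): rejecting a move that hangs your queen takes a single pass over the opponent’s replies, whereas proving a move optimal means searching the whole game tree.

```python
import chess

def hangs_queen(board: chess.Board, move: chess.Move) -> bool:
    """Local validity check: after `move`, can the opponent capture
    our queen on their very next move? (One ply of lookahead.)"""
    our_color = board.turn
    board.push(move)
    try:
        for reply in board.legal_moves:
            target = board.piece_at(reply.to_square)
            if target and target.piece_type == chess.QUEEN and target.color == our_color:
                return True
        return False
    finally:
        board.pop()

# Filter out the locally refutable moves; what remains still needs
# judgment, but the obvious blunders are gone cheaply.
board = chess.Board()
sane_moves = [m for m in board.legal_moves if not hangs_queen(board, m)]
```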