Formatting request: can the footnote numbers be augmented with links that jump to the footnote text? (I presume this worked in Arbital but broke when it was moved here.)
I had a notion here that I could stochastically introduce a new goal that would minimize total suffering over an agent’s life-history. I tried this, and the most stable solution turned out to be thus: introduce an overwhelmingly aversive goal that causes the agent to run far away from all of its other goals screaming.
did you mean: anhedonia
(No, seriously, your paragraph is an apt description of a long bout I had of depression-induced anhedonia; I felt so averse to every action that I ceased to feel wants, and I consistently marked my mood as neutral rather than negative despite being objectively more severely depressed than I was at other times when I put negative numbers in my mood tracker.)
The ideal thing is to judge Bob as if he were making the same prediction every day until he makes a new one, and log-score all of them when the event is revealed. (That is, if Bob says 75% on January 1st and 60% on February 1st, and then on March 1st the event is revealed to have happened, Bob’s score equals 31*log(.75) + 28*log(.6).) Then Bob’s best strategy is to update his prediction to his actual current estimate as often as possible; past predictions are sunk costs.
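In code, the carried-forward scoring looks something like this (a minimal sketch; the function name and the 2023 dates are mine, purely for illustration):

```python
from datetime import date
from math import log

def carried_forward_log_score(predictions, resolution_date, outcome):
    """Treat each (date, probability) prediction as repeated daily until it
    is superseded or the event resolves, then sum the daily log scores.
    `outcome` is True if the event happened; higher (closer to 0) is better."""
    entries = sorted(predictions) + [(resolution_date, None)]
    total = 0.0
    for (day, p), (next_day, _) in zip(entries, entries[1:]):
        days_held = (next_day - day).days
        total += days_held * log(p if outcome else 1 - p)
    return total

# Bob's example: 75% on Jan 1, 60% on Feb 1, event resolves true on Mar 1.
print(carried_forward_log_score(
    [(date(2023, 1, 1), 0.75), (date(2023, 2, 1), 0.60)],
    resolution_date=date(2023, 3, 1),
    outcome=True,
))  # 31*log(.75) + 28*log(.6) ≈ -23.2
```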
The real-world version is remembering to dock people’s bad predictions more, the longer they persisted in them. But of course this is hard.
538 did do this with their self-evaluation, which is a good way to try and establish a norm in the domain of model-driven reporting.
Let’s note differences of degree here. Political systems differ massively in how easily decisionmakers can claim large spoils for themselves, and these differences seem to correlate with how pro-social the decisions tend to be. In particular, the dollar amounts of graft being alleged for politicians in liberal democracies are usually small compared to what despots regularly claim without consequence. (Which is not to say that it would be wise to ignore corruption in liberal democracies!)
I’m not convinced that Europe had more intellectual freedom on average than China, but because of the patchwork of principalities, it certainly had more variation in intellectual freedom than did a China that was at any given time either mostly unified or mostly at war; and all that you need for an intellectual revolution is the existence of a bastion of intellectual freedom somewhere.
It doesn’t move much probability mass to the very near term (i.e. 1 year or less), because neither this nor AlphaStar is really doing consequentialist reasoning; they’re just able to get surprisingly good performance out of simpler tricks (the very Markovian nature of human writing, a good position evaluation function) plus a whole lot of compute.
However, it does shift my probabilities forward in time, in the sense that one new weird trick to do deductive or consequentialist reasoning, plus a lot of compute, might get you there really quickly.
I expect that some otherwise convincible readers are not going to realize that in this fictional world, people haven’t discovered Newton’s physics or calculus, and those readers are therefore going to miss the analogy of “this is how MIRI would talk about the situation if they didn’t already know the fundamental concepts but had reasons for searching in the right direction”. (I’m not thinking of readers incapable of handling that counterfactual, but of readers who aren’t great at inferring implicit background facts from a written dialogue. Such readers might get very confused at the unexpected turns of the dialogue and quit rather than figure out what they’re baffled by.)
I’d suggest adding to the preamble something like “In a weird world where people had figured out workable aeronautics and basic rocket propulsion by trial and error, but hadn’t figured out Newton’s laws or calculus”.
I like your definition, though, and want to try to make a better one (and I acknowledge this is not the point of this post).
I think that’s a perfectly valid thing to do in the comments here! However, I think your attempt,
My stab at a refinement of “consent” is “respect for another’s choices”, where “disrespect” is “deliberately(?) doing something to undermine”
is far too vague to be a useful concept.
In most realistic cases, I can give a definite answer to whether A touched B in a way B clearly did not want to be touched. My honesty definition does involve intent, so for anyone else I can only statistically infer whether they’re being dishonest or merely mistaken; but for myself, I usually have an answer about whether saying X to person C would be honest or not.
I don’t think I could do the same for your definition; “am I respecting their choices” is a tough query to bottom out in basic facts.
My comment was meant to explain what I understood Eliezer to be saying, because I think you had misinterpreted that. The OP is simply saying “don’t give weight to arguments that are locally invalid, regardless of what else you like about them”. Of course you need to use priors, heuristics, and intuitions in areas where you can’t find an argument that carries you from beginning to end. But being able to think “oh, if I move there, then they can take my queen, and I don’t see anything else good about that position, so let’s not do that then” is a fair bit easier than proving your move optimal.
Relying purely on local validity won’t get you very far in playing chess
The equivalent of local validity is just mechanically checking “okay, if I make this move, then they can make that move” for a bunch of cases (there’s a rough code sketch below). Which, first, is a major developmental milestone for kids learning chess. So we only think it “won’t get you very far” because all the high-level human play explicitly or implicitly takes it for granted.
And second, it’s pretty analogous to doing math: proving theorems is based on the ability to check the local validity of each step, but mathematicians aren’t just brute-forcing their way to proofs. They have to develop higher-level heuristics, some of which are really hard to express in language, to suggest avenues, and then check local validity once they have a skeleton of some part of the argument. But if mathematicians stopped doing that annoying bit, well, then after a while we’d end up with another crisis of analysis when the brilliant intuitions were missing some tiny ingredient.
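For concreteness, here’s roughly what that mechanical checking looks like as code, using the python-chess library (the one-reply depth and the material values are my own simplifications; real play also has to consider recaptures and deeper lines):

```python
import chess  # pip install python-chess

VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
          chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material(board, color):
    """Total value of the pieces one side has on the board."""
    return sum(VALUES[p.piece_type]
               for p in board.piece_map().values() if p.color == color)

def hangs_material(board, move):
    """Pure local validity: play the move, enumerate every legal reply,
    and ask whether any single reply immediately costs us material."""
    me = board.turn
    before = material(board, me)
    board.push(move)
    worst = before
    for reply in list(board.legal_moves):  # materialize before mutating
        board.push(reply)
        worst = min(worst, material(board, me))
        board.pop()
    board.pop()
    return worst < before

board = chess.Board()
candidates = [m for m in list(board.legal_moves) if not hangs_material(board, m)]
```

This is exactly the “oh, then they can take my queen” test; all the higher-level heuristics just decide which moves are worth feeding into it.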
Local validity is an incredibly important part of any scientific discipline; the fact that it’s not a part of most political discourse is merely a reflection that our society is at about the developmental level of a seven-year-old when it comes to political reasoning.
Broken link on the text “real killing of birds to reduce pests in China has never been tried”.
Much of this material is covered very similarly in Melting Asphalt, especially the posts Ads Don’t Work That Way and Doesn’t Matter, Warm Fuzzies.
If you do future surveys of this sort, I’d like you to ask people for their probabilities rather than just their best guesses. If people are uncertain but decently calibrated, I’d argue there’s not much of a problem; if people are confidently wrong, I’d argue there’s a real problem.
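(For instance, the follow-up analysis could be as simple as the sketch below; the data shape and the 10% bins are assumptions of mine.)

```python
from collections import defaultdict

def calibration_table(responses):
    """`responses` is a list of (stated_probability, was_correct) pairs.
    Returns, per 10% confidence bin, the count and the empirical hit rate."""
    buckets = defaultdict(list)
    for p, correct in responses:
        buckets[round(p, 1)].append(correct)
    return {p: (len(hits), sum(hits) / len(hits))
            for p, hits in sorted(buckets.items())}

# A bucket like {0.6: (40, 0.58)} is uncertain but calibrated: not much of
# a problem. One like {0.9: (25, 0.52)} is confidently wrong: a real problem.
```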
This comment got linked a decade later, and so I thought it worth stating my own thoughts on the question:
We can consider a reference class of CEV-seeking procedures; one (massively underspecified, but that’s not the point) example is “emulate 1000 copies of Paul Christiano living together comfortably and immortally and discussing what the AI should do with the physical universe; once there’s a large supermajority in favor of an enactable plan (which can include further such delegated decisions), the AI does that”.
I agree that this is going to be chaotic, in the sense that even slightly different elements of this reference class might end up steering the AI to different basins of attraction.
I assert, however, that I’d consider it a pretty good outcome overall if the future of the world were determined by a genuinely random draw from this reference class, honestly instantiated. (Again with the massive underspecification, I know.)
CEV may be underdetermined and many-valued, but that doesn’t mean paperclipping is as good an answer as any.
Re: no basins, it would be a bad situation indeed if the vast majority of the reference class never ended up outputting an action plan, instead deferring and delegating forever. I don’t have cached thoughts about that.
I for one welcome our new typographical overlords.
That’s a legit thing to be frustrated by, but I think you know the reason why AI safety researchers don’t want “we don’t see a way to get to a good outcome except for an aligned project to grab a decisive strategic advantage” to filter into public discourse: it pattern-matches too well to “trust us, you need to let us run the universe”.
To be clear, I am making the claim that, of the people who have made useful advances on Oracle AI safety research (Armstrong counts here; I don’t think Yampolskiy does), all of them believe that the goal of having a safe Oracle AI is to achieve a decisive strategic advantage quickly and get to an aligned future. I recognize that this is a hard claim to evaluate (e.g. because this isn’t a statement one could put in a Serious Academic Journal Article in the 2010s; it would have to be discussed on their blog or in private correspondence), but if anyone has a clear counterexample, I’d be interested in seeing it.
Yes, this. NVC should be treated with the same sort of parameters as Crocker’s Rules: you can declare it for yourself at any time, and you can invite people to a conversation where it’s known that everyone will be using it, but you cannot hold it against anyone if you invite them to declare Crocker’s Rules and they refuse.
There’s a lot of Actually Bad things an AI can do just by making electrons move.