I expect that some otherwise convincible readers are not going to realize that in this fictional world, people haven’t discovered Newton’s physics or calculus, and those readers are therefore going to miss the analogy of “this is how MIRI would talk about the situation if they didn’t already know the fundamental concepts but had reasons for searching in the right direction”. (I’m not thinking of readers incapable of handling that counterfactual, but of readers who aren’t great at inferring implicit background facts from a written dialogue. Such readers might get very confused at the unexpected turns of the dialogue and quit rather than figure out what they’re baffled by.)
I’d suggest adding to the preamble something like “In a weird world where people had figured out workable aeronautics and basic rocket propulsion by trial and error, but hadn’t figured out Newton’s laws or calculus”.
I like your definition, though, and want to try to make a better one (and I acknowledge this is not the point of this post).
I think that’s a perfectly valid thing to do in the comments here! However, I think your attempt,
> My stab at a refinement of “consent” is “respect for another’s choices”, where “disrespect” is “deliberately(?) doing something to undermine”
is far too vague to be a useful concept.
In most realistic cases, I can give a definite answer to whether A touched B in a way B clearly did not want to be touched. My honesty definition does involve intent, so I can only infer statistically whether someone else is being dishonest vs. mistaken; but for myself, I usually have an answer about whether saying X to person C would be honest or not.
I don’t think I could do the same for your definition; “am I respecting their choices” is a tough query to bottom out in basic facts.
My comment was meant to explain what I understood Eliezer to be saying, because I think you had misinterpreted that. The OP is simply saying “don’t give weight to arguments that are locally invalid, regardless of what else you like about them”. Of course you need to use priors, heuristics, and intuitions in areas where you can’t find an argument that carries you from beginning to end. But being able to think “oh, if I move there, then they can take my queen, and I don’t see anything else good about that position, so let’s not do that then” is a fair bit easier than proving your move optimal.
> Relying purely on local validity won’t get you very far in playing chess
The equivalent of local validity is just mechanically checking “okay, if I make this move, then they can make that move” for a bunch of cases. Which, first, is a major developmental milestone for kids learning chess. So we only think it “won’t get you very far” because all the high-level human play explicitly or implicitly takes it for granted.
And second, it’s pretty analogous to doing math; proving theorems is based on the ability to check the local validity of each step, but mathematicians aren’t just brute-forcing their way to proofs. They have to develop higher-level heuristics, some of which are really hard to express in language, to suggest avenues, and then check local validity once they have a skeleton of some part of the argument. But if mathematicians stopped doing that annoying checking step, then after a while we’d end up with another crisis of analysis, when the brilliant intuitions turn out to be missing some tiny ingredient.
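To make the “mechanically checking replies” point concrete, here’s a minimal sketch of my own (a toy take-away game standing in for chess, not anything from the discussion above): players alternate taking 1–3 sticks, and whoever takes the last stick wins. The local-validity step is just enumerating every legal opponent reply to a candidate move and discarding moves that let the opponent win on the spot.

```python
def legal_moves(sticks):
    """All legal takes from a pile of `sticks` (take 1-3, at most the pile)."""
    return [take for take in (1, 2, 3) if take <= sticks]

def opponent_wins_immediately(sticks, take):
    """After we take `take` sticks, can some reply grab the last stick?"""
    remaining = sticks - take
    return any(reply == remaining for reply in legal_moves(remaining))

def locally_safe_moves(sticks):
    """Moves that survive the mechanical one-ply check."""
    return [take for take in legal_moves(sticks)
            if not opponent_wins_immediately(sticks, take)]

print(locally_safe_moves(5))  # [1]: only taking 1 avoids an instant loss
```

In this toy game the one-ply check happens to single out the optimal move from 5 sticks; in general (as in chess) it only prunes immediate blunders, leaving the higher-level search to heuristics, which is the point of the analogy.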
Local validity is an incredibly important part of any scientific discipline; the fact that it’s not a part of most political discourse merely reflects that, when it comes to political reasoning, our society is at about the developmental level of a seven-year-old.
Broken link on the text “real killing of birds to reduce pests in China has never been tried”.
Much of this material is covered very similarly in Melting Asphalt, especially the posts Ads Don’t Work That Way and Doesn’t Matter, Warm Fuzzies.
If you do future surveys of this sort, I’d like you to ask people for their probabilities rather than just their best guesses. If people are uncertain but decently calibrated, I’d argue there’s not much of a problem; if people are confidently wrong, I’d argue there’s a real problem.
This comment got linked a decade later, and so I thought it worth stating my own thoughts on the question:
We can consider a reference class of CEV-seeking procedures; one (massively-underspecified, but that’s not the point) example is “emulate 1000 copies of Paul Christiano living together comfortably and immortally and discussing what the AI should do with the physical universe; once there’s a large supermajority in favor of an enactable plan (which can include further such delegated decisions), the AI does that”.
I agree that this is going to be chaotic, in the sense that even slightly different elements of this reference class might end up steering the AI to different basins of attraction.
I assert, however, that I’d consider it a pretty good outcome overall if the future of the world were determined by a genuinely random draw from this reference class, honestly instantiated. (Again with the massive underspecification, I know.)
CEV may be underdetermined and many-valued, but that doesn’t mean paperclipping is as good an answer as any.
Re: no basins, it would be a bad situation indeed if the vast majority of the reference class never ended up outputting an action plan, instead deferring and delegating forever. I don’t have cached thoughts about that.
I for one welcome our new typographical overlords.
That’s a legit thing to be frustrated by, but I think you know the reason why AI safety researchers don’t want “we don’t see a way to get to a good outcome except for an aligned project to grab a decisive strategic advantage” to filter into public discourse: it pattern-matches too well to “trust us, you need to let us run the universe”.
To be clear, I am making the claim that, of the people who have made useful advances on Oracle AI safety research (Armstrong counts here; I don’t think Yampolskiy does), all of them believe that the goal of having a safe Oracle AI is to achieve a decisive strategic advantage quickly and get to an aligned future. I recognize that this is a hard claim to evaluate (e.g. because this isn’t a statement one could put in a Serious Academic Journal Article in the 2010s, it would have to be discussed on their blog or in private correspondence), but if anyone has a clear counterexample, I’d be interested in seeing it.
Yes, this. NVC should operate under the same sort of parameters as Crocker’s Rules: you can declare it for yourself at any time, and you can invite people to a conversation where it’s known that everyone will be using it, but you cannot hold it against anyone if you invite them to declare Crocker’s Rules and they refuse.
There’s a lot of Actually Bad things an AI can do just by making electrons move.
I’d be interested in a list of well-managed government science and engineering projects if one exists. The Manhattan Project and the Apollo Project both belong on that list (despite both having their flaws: leaks to the USSR from the former, and the Apollo 1 disaster from the latter); what are other examples?
I’m pretty sure that, without exception, anyone who’s made a useful contribution on Oracle AI recognizes that “let several organizations have an Oracle AI for a significant amount of time” is a world-ending failure, and that their work is instead progress on questions like “if you can have the only Oracle AI for six months, can you save the world rather than end it?”
Correct me if I’m wrong.
I agree there are broken alarms that are quiet (including those that are broken in the direction of failing to go off, which leads to a blind spot of obliviousness!), and that there are people stuck in situations where there is a correct loud alarm that happens most of the time.
I said that habits are easier to change than alarms, not that they’re easy in an absolute sense.
It’s because the non-broken alarms, which also start out loud, get quieter throughout your life as they calibrate themselves, and as one’s habits fix the situations that make them correctly go off. So given a random initial distribution of loudness, eventually the alarm that’s loudest on average will probably be a broken one.