The main way complexity of this sort would be addressable is if the intellectual artifact that you tried to prove things about were simpler than the process that you meant the artifact to unfold into. For example, the mathematical specification of AIXI is pretty simple, even though the hypotheses that AIXI would (in principle) invent upon exposure to any given environment would mostly be complex. Or for a more concrete example, the Gallina kernel of the Coq proof engine is small and was verified to be correct using other proof tools, while most of the complexity of Coq is in built-up layers of proof search strategies which don’t need to themselves be verified, as the proofs they generate are checked by Gallina.
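The kernel/search split described above can be made concrete with a toy sketch. The following is not Coq or Gallina, just a hypothetical miniature of the same architecture: a small trusted checker for a system whose only inference rule is modus ponens, plus an untrusted search layer whose bugs can at worst cause failed searches, never wrongly accepted theorems, because every proof it emits is re-checked by the kernel.

```python
def kernel_check(axioms, proof, goal):
    """Trusted core: verify that `proof` derives `goal` from `axioms`.

    A proof is a list of formulas, each either an axiom or obtained
    from two earlier lines by modus ponens: from A and ('->', A, B),
    conclude B. Formulas are strings or ('->', lhs, rhs) tuples.
    """
    derived = []
    for line in proof:
        ok = line in axioms or any(
            prev == ('->', ante, line) and ante in derived
            for prev in derived for ante in derived
        )
        if not ok:
            return False
        derived.append(line)
    return bool(derived) and derived[-1] == goal


def untrusted_search(axioms, goal):
    """Untrusted layer: arbitrarily complex heuristics could go here.
    This one is naive forward chaining. Its output is only trusted
    after kernel_check accepts it."""
    derived = list(axioms)
    for _ in range(100):  # bounded forward-chaining passes
        for p in list(derived):
            if (isinstance(p, tuple) and p[0] == '->'
                    and p[1] in derived and p[2] not in derived):
                derived.append(p[2])
    # Return the prefix ending at the goal, so every modus-ponens
    # step's premises appear before the step itself.
    return derived[:derived.index(goal) + 1] if goal in derived else None


axioms = ['A', ('->', 'A', 'B'), ('->', 'B', 'C')]
proof = untrusted_search(axioms, 'C')
assert kernel_check(axioms, proof, 'C')
```

Only `kernel_check` would need to be verified; the search layer can grow without enlarging the trusted base. This separation is sometimes called the de Bruijn criterion.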
Isn’t that as unbelievable as the idea that you can prove that a particular zygote will never grow up to be an evil dictator? Surely this violates some principles of complexity, chaos [...]
Yes, any physical system could be subverted by a sufficiently unfavorable environment. You wouldn’t want to prove perfection. The thing you would want to prove would be more along the lines of: “will this system become at least roughly as capable of recovering from disturbances, and of going on to achieve a good result, as it would have been if its designers had thought specifically about what to do in each possible case of disturbance?”. (Ideally, this category of “designers” would also bleed over, in a principled way, into the category of “moral constituency”, as in CEV.) Which, in turn, would require a proof of something along the lines of: “the process is highly likely to reach the point where it knows enough about its designers to mostly duplicate their hypothetical reasoning about what it should do, without anything going terribly wrong along the way”.
We don’t know what an appropriate formalization of something like that would look like. But there is reason for considerable hope that such a formalization could be found, and that it would be sufficiently simple that an implementation of it could be checked. This is because a few other aspects of decision-making which were previously mysterious, and which could only be discussed qualitatively, have had powerful and simple core mathematical descriptions discovered for cases where simplifying modeling assumptions perfectly apply. Shannon information was discovered for the informal notion of surprise (with the assumption of independent identically distributed symbols from a known distribution). Bayesian decision theory was discovered for the informal notion of rationality (with assumptions like perfect deliberation and side-effect-free cognition). And Solomonoff induction was discovered for the informal notion of Occam’s razor (with assumptions like a halting oracle and a taken-for-granted choice of universal machine). These simple conceptual cores can then be used to motivate and evaluate less-simple approximations for situations where the assumptions about the decision-maker don’t perfectly apply. For the AI safety problem, the informal notions (for which the mathematical core descriptions would need to be discovered) would be a bit more complex—like the “how to figure out what my designers would want to do in this case” idea above. Also, you’d have to formalize something like our informal notion of how to generate and evaluate approximations, because approximations are more complex than the ideals they approximate, and you wouldn’t want to need to directly verify the safety of any more approximations than you had to. (But note that, for reasons related to Rice’s theorem, you can’t—and therefore shouldn’t want to—lay down universally perfect rules for approximation in any finite system.)
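The first of those examples can be stated in a few lines. Under the i.i.d.-symbols assumption, Shannon’s formalization of “surprise” is the surprisal −log₂ p, and its expectation over a known distribution is the entropy. The function names below are just illustrative:

```python
import math

def surprisal(p):
    """Shannon's core of the informal notion of 'surprise': a symbol
    of probability p carries -log2(p) bits; rarer symbols surprise more."""
    return -math.log2(p)

def entropy(dist):
    """Expected surprisal over a known distribution,
    H = -sum(p * log2(p)): the core quantity behind optimal coding."""
    return sum(p * surprisal(p) for p in dist.values() if p > 0)

# A fair coin: each outcome carries exactly 1 bit of surprise.
fair = {'heads': 0.5, 'tails': 0.5}
# A loaded coin is, on average, less surprising.
loaded = {'heads': 0.9, 'tails': 0.1}
```

Here `entropy(fair)` is 1 bit while `entropy(loaded)` is lower, which matches the informal intuition that a predictable source surprises us less. The point of such a core is exactly what the paragraph above says: once it exists, practical compressors and predictors can be judged as approximations to it.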
Two other related points are discussed in this presentation: the idea that a digital computer is a nearly deterministic environment, which makes safety engineering easier for the stages before the AI is trying to influence the environment outside the computer, and the idea that you can design an AI in such a way that you can tell what goal it will at least try to achieve even if you don’t know what it will do to achieve that goal. Presumably, the better your formal understanding of what it would mean to “at least try to achieve a goal”, the better you would be at spotting and designing to handle situations that might make a given AI start trying to do something else.
(Also: Can you offer some feedback as to what features of the site would have helped you become aware sooner that there were arguments behind the positions you felt were being asserted blindly in a vacuum? The “things can be surprisingly formalizable, here are some examples” argument can be found in lukeprog’s “Open Problems Related to the Singularity” draft and the later “So You Want to Save the World”, though the argument is very short, and its significance is hard to recognize if you don’t already know most of the mathematical formalisms mentioned. A backup “you shouldn’t just assume that there’s no way to make this work” argument is in “Artificial Intelligence as a Positive and Negative Factor in Global Risk”, pp. 12–13.)
what will prevent them from becoming “bad guys” when they wield this much power
That’s a problem where successful/practically applicable formalizations are harder to hope for, so it’s been harder for people to find things to say about it that pass the threshold of being plausible conceptual progress instead of being noisy verbal flailing. See the related “How can we ensure that a Friendly AI team will be sane enough?”. But it’s not like people aren’t thinking about the problem.
I think people’s decision about whether to accept or resist the AGW proposition is being complicated by an implicit negotiation over political power that’s inevitably attached to that decision.
Because the scientific projections are still vague, people feel as if their decision about whether to believe in AGW is underdetermined by the evidence, in such a way that political actors in the future will feel entitled to retrospectively interpret their decision for purposes of political precedent. (“Were they forced by the evidence, or did they feel weak enough that they made a concession they didn’t have to make?”) And the precedent won’t be induced in terms of the mental states that a perfect decision theorist, thinking about the AGW mitigation decision problem, would have had. The precedent will be in terms of the mental states that a normal non-scientifically-trained (but politically active) human would have had. One of those mental states would be uncertainty about whether scientists (unconsciously intuited as potentially colluding with, and/or hoping to become, power-grubbing environmental regulators) are just making AGW up. In that context, agreeing that AGW is probably real feels like ceding one’s right of objection to whatever seizures of power someone’s found some vague scientific way of justifying.
It becomes a signaling game, in which each choice of belief will be read as communicating a particular political move, and the costs of making the wrong political move feel very high. So the belief decisions and the political actions become tangled up.
Roughly, people have no way of saying:
So instead, they say:
If it were possible to negotiate separately about AGW action and about precedents of policy concessions to e.g. scientists’ claims, then you might see less decision-theoretic insanity around the AGW action question itself.
(Note—most of this analysis is not on the basis of such data as opinion polls or controlled studies. It’s just from introspecting on my experience of attempting to empathize with the state of mind of AGW disputants, as recalled mostly from Internet forums.)