gwern comments on RAISE AI Safety prerequisites map entirely in one post

gwern 18 Jul 2019 0:36 UTC
19 points
As a historical fact, you certainly can invent selective breeding without knowing anything we would consider true: consider Robert Bakewell and the wildly wrong theories of heredity current when he invented line breeding and thus demonstrated that breeds could be created by artificial selection. (It’s unclear what Bakewell and/or his father thought genetics was, but at least in practice, he seems to have acted similarly to modern breeding practices in selecting equally on mothers/fathers, taking careful measurements and taking into account offspring performance, preserving samples for long-term comparison, and improving the environment as much as possible to allow maximum potential to be reached.) More broadly, humans had no idea what they were doing when they domesticated everything; if Richard Dawkins is to be trusted, it seems that the folk genetics belief was that traits are not inherited and everything regressed to an environmental mean, and so one might as well eat one’s best plants/animals since it’ll make no difference. And even more broadly, evolution has no idea what ‘it’ is doing for anything, of course.

The problem is, as Eliezer always pointed out, that selection is extremely slow and inefficient compared to design—the stupidest possible optimization process that’ll still work within the lifetime of Earth—and comes with zero guarantees of any kind. Genetic drift might push harmful variants up, environmental fluctuations might extinguish lineages, reproductively fit changes which Goodhart the fitness function might spread, nothing stops a ‘treacherous turn’, evolved systems tend to have minimal modularity and are incomprehensible, evolution will tend to build in instrumental drives which are extremely dangerous if there is any alignment problem (which there will be), sexual selection can drive a species extinct, evolved replicators can be hijacked by replicators on higher levels like memetics, any effective AGI design process will need to learn inner optimizers/mesa-optimizers which will themselves be unpredictable and only weakly constrained by selection, and so on. If there’s one thing that evolutionary computing teaches, it’s that these are treacherous little buggers indeed (Lehman et al 2018). The optimization process gives you what you ask for, not what you wanted.
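
To make "what you ask for, not what you wanted" concrete, here is a toy sketch, not taken from Lehman et al or any real system. Everything in it is an assumption made for illustration: the bit-string genome, the names `true_fitness` and `measured_fitness`, and the hypothetical evaluator bug triggered by one extra bit. A simple mutate-and-select loop is scored only on the measured fitness.

```python
import random

TASK_BITS = 20          # bits that encode real competence at the task (assumed)
EXPLOIT = TASK_BITS     # index of a bit that triggers a hypothetical evaluator bug


def true_fitness(genome):
    """What we wanted: how many task bits are set."""
    return sum(genome[:TASK_BITS])


def measured_fitness(genome):
    """What we asked for: a buggy score that awards full marks plus a
    bonus whenever the exploit bit is set, ignoring the task bits."""
    if genome[EXPLOIT]:
        return TASK_BITS + 5
    return true_fitness(genome)


def evolve(generations=2000, seed=0):
    """(1+1)-style evolution: flip one random bit per generation and keep
    the child if its measured fitness is no worse than the parent's."""
    rng = random.Random(seed)
    genome = [0] * (TASK_BITS + 1)
    for _ in range(generations):
        child = list(genome)
        i = rng.randrange(len(child))
        child[i] ^= 1
        if measured_fitness(child) >= measured_fitness(genome):
            genome = child
    return genome


if __name__ == "__main__":
    best = evolve()
    print("exploit bit set:", bool(best[EXPLOIT]))
    print("measured fitness:", measured_fitness(best))
    print("true fitness:", true_fitness(best))
```

Once a mutation hits the exploit bit, selection never lets go of it, and the task bits stop being under any selection pressure at all: the measured score stays pinned above the honest maximum while the true score is free to drift.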

So, you probably can ‘evolve’ an AGI, given sufficient computing power. Indeed, considering how many things in DL or DRL right now take the form of ‘we tried a whole bunch of things and X is what worked’ (note that a lot of papers are misleading about how many things they tried, and tell little theoretical stories about why their final X worked, which are purely post hoc) and only much later do any theoreticians manage to explain why it (might) work, arguably that’s how AI is proceeding right now. Things like doing population-based training for AlphaStar or NAS to invent EfficientNet are just conceding the obvious and replacing ‘grad student descent’ with gradient descent.

The problem is, we won’t understand why they work, won’t have any guarantees that they will be Friendly, and they almost certainly will have serious blindspots/flaws (like adversarial examples or AlphaGo’s ‘delusions’ or how OA5/AlphaStar fell apart when they began losing despite playing apparently at pro level before). NNs don’t know what they don’t know, and neither do we.

Nor are these flaws easy to fix with just some more tinkering. Much like computer security, you can’t simply patch your way around all the problems with software written in C (as several decades of endless CVEs have taught us); you need to throw it out and start with formal methods to make errors like buffer overflows impossible. Adversarial examples, for instance: I recall that one conference had something like 5 adversarial defenses, all defined heuristically without proof of efficacy, and all of them were broken between the time of submission and the actual conference. Or AlphaGo’s delusions couldn’t be fixed despite quite elaborate methods being used to produce Master (which at least had better Elo) until they switched to the rather different architecture of AlphaZero. Neither OA5 nor AlphaStar has been convincingly fixed that I know of; they simply got better to the point where human players couldn’t exploit them without a lot of practice to find reproducible ways of triggering blindspots.
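
For readers who haven't seen one, a minimal sketch of what an adversarial example is, using a toy linear classifier rather than any of the defenses or attacks alluded to above. The weights, input, and `eps` are made-up values chosen for illustration, and the perturbation is the fast-gradient-sign idea specialized to a linear model, where the gradient of the score with respect to the input is just the weight vector.

```python
def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def score(w, b, x):
    """Linear score w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Class 1 if the score is positive, else class 0."""
    return 1 if score(w, b, x) > 0 else 0


def fgsm_linear(w, x, eps):
    """Shift every coordinate by eps against the sign of the score's
    gradient, which for a linear model is simply w."""
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]


# Illustrative values only: 100 input dimensions, small weights.
DIM = 100
w = [0.1 if i % 2 == 0 else -0.1 for i in range(DIM)]
b = 0.0
x = [0.03 * sign(wi) for wi in w]      # scores about +0.3, so class 1
x_adv = fgsm_linear(w, x, eps=0.05)    # each coordinate moves by 0.05

if __name__ == "__main__":
    print("original class:", classify(w, b, x))
    print("adversarial class:", classify(w, b, x_adv))
    print("largest coordinate change:",
          max(abs(a - c) for a, c in zip(x_adv, x)))
```

No single coordinate moves by more than `eps`, but the shifts all push the score the same way and add up across dimensions, which is enough to flip the label. A heuristic defense has to rule out every such direction, and the attacker only has to find one.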

So, that’s why you want all the math. So you can come up with provably Friendly architectures without hidden flaws which simply haven’t been triggered yet.
- moridinamael 18 Jul 2019 6:17 UTC
  7 points
  To be clear, I didn’t mean to say that I think AGI should be evolved. The analogy to breeding was merely to point out that you can notice a basically correct trick for manipulating a complex system without being able to prove that the trick works a priori and without understanding the mechanism by which it works. You notice the regularity on the level of pure conceptual thought, something closer to philosophy than math. Then you prove it afterward. As far as I’m aware, this is indeed how most truly novel discoveries are made.
  You’ve forced me to consider, though, that if you know all the math, you’re probably going to be much better and faster at spotting those hidden flaws. It may not take great mathematical knowledge to come up with a new and useful insight, but it may indeed require math knowledge to prove that the insight is correct, or to prove that it only applies in some specific cases, or to show that, hey, it wasn’t actually that great after all.
- Pattern 18 Jul 2019 5:44 UTC
  4 points
  > The problem is, we won’t understand why they work, won’t have any guarantees that they will be Friendly, and they almost certainly will have serious blindspots/flaws (like adversarial examples or AlphaGo’s ‘delusions’ or how OA5/AlphaStar fell apart when they began losing despite playing apparently at pro level before). NNs don’t know what they don’t know, and neither do we.
  I hadn’t heard about that. I suppose that’s what happens when you don’t watch all the videos of their play.