I enjoyed pages 185-190, on mathematical guarantees, especially because I’ve been confused about what the “provably beneficial” in CHAI’s mission statement is meant to say. Some quotes:
> On the other hand, if you want to prove something about the real world—for example, that AI systems designed like so won’t kill you on purpose—your axioms have to be true in the real world. If they aren’t true, you’ve proved something about an imaginary world.
On the applicability of theorems to practice:
> The trick is to know how far one can stray from the real world and still obtain useful results. For example, if the rigid-beam assumption allows an engineer to calculate the forces in a structure that includes the beam, and those forces are small enough to bend a real steel beam by only a tiny amount, then the engineer can be reasonably confident that the analysis will transfer from the imaginary world to the real world.
as well as
> The process of removing unrealistic assumptions continues until the engineer is fairly confident that the remaining assumptions are true enough in the real world. After that, the engineered system can be tested in the real world; but the test results are just that. They do not prove that the same system will work in other circumstances or that other instances of the system will behave the same way as the original.
It then talks about assumption failure in cryptography due to side-channel attacks.
A somewhat more concrete version of what “provably beneficial” might mean:
> Let’s look at the kind of theorem we would like eventually to prove about machines that are beneficial to humans. One type might go something like this:
>
> > Suppose a machine has components A, B, C, connected to each other like so and to the environment like so, with internal learning algorithms lA, lB, lC that optimize internal feedback rewards rA, rB, rC defined like so, and [a few more conditions] . . . then, with very high probability, the machine’s behavior will be very close in value (for humans) to the best possible behavior realizable on any machine with the same computational and physical capabilities.
>
> The main point here is that such a theorem should hold regardless of how smart the components become—that is, the vessel never springs a leak and the machine always remains beneficial to humans.
>
> There are three other points worth making about this kind of theorem. First, we cannot try to prove that the machine produces optimal (or even near-optimal) behavior on our behalf, because that’s almost certainly computationally impossible. [...] Second, we say “very high probability . . . very close” because that’s typically the best that can be done with machines that learn. [...] Finally, we are a long way from being able to prove any such theorem for really intelligent machines operating in the real world!
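The “very high probability . . . very close” phrasing has the shape of a PAC-style guarantee. A sketch of what such a bound might look like, in notation I am introducing here (the book does not give a formula, and the symbols below are my own labels):

```latex
% Hypothetical notation: \pi_M is the behavior (policy) the machine
% actually produces, \Pi is the set of behaviors realizable with the
% same computational and physical capabilities, and V_H(\pi) is the
% value of behavior \pi for humans.
\Pr\Big[\, V_H(\pi_M) \;\ge\; \sup_{\pi \in \Pi} V_H(\pi) \;-\; \varepsilon \,\Big] \;\ge\; 1 - \delta
```

Here ε stands in for “very close” and δ for “very high probability,” by analogy with PAC learning; actually proving a bound of this form for intelligent machines in the real world is, as the book stresses, far out of reach.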
It then goes on to discuss how such a theorem is itself subject to “side-channel attacks”: theorems of this kind typically assume a Cartesian separation between the agent and its environment, which does not actually hold (see Embedded Agency).