As I understand EY’s point, it’s that (a) the safety provided by any combination of defenses A, B, C, etc. around an unboundedly self-optimizing system with poorly architected goals will be less than the safety provided by such a system with well-architected goals, that (b) the safety provided by any combination of defenses A, B, C, etc. around such a system with poorly architected goals is too low to justify constructing such a system, but that (c) the safety provided by such a system with well-architected goals is high enough to justify constructing such a system.
That the safety provided by a combination of defenses A, B, C is greater than that provided by A alone is certainly true, but seems entirely beside his point.
(For my own part, (a) and (b) seem pretty plausible to me, though I’m convinced neither of (c) nor that we can construct such a system in the first place.)