> if you define the central problem as something like building a system that you’d be happy for humanity to defer to forever.
[I at most skimmed the post, but] IMO this is a more ambitious goal than the central problem. The central problem (phrased with more assumptions than strictly necessary) is more like “building a system that’s gaining a bunch of understanding you don’t already have, in whatever domains are necessary for achieving some impressive real-world task, without killing you”. So I’d guess that’s supposed to happen in step 1. It’s debatable how much of that you have to do to end the acute risk period, for one thing because humanity collectively is already a really slow (too slow) version of it, but it’s a different goal than deferring permanently to an autonomous agent.
I’d say the scientific understanding happens in step 1, but I think that would mostly be consolidating science that’s already understood. (Plus some patching up of potentially exploitable holes, places where an AI could deduce that “if this is the best theory, the real dynamics must actually be like that instead”. But my intuition is that there aren’t many of these holes, and that unknown physics questions are mostly underdetermined by known data, at least for quite a long way toward the infinite-compute limit of Solomonoff induction, and possibly all the way.)
Engineering understanding would happen in step 2, and I think engineering is more “the generator of large effects on the world,” the place where much-faster-than-human ingenuity is needed, rather than hoping to find new science.
(Although the formalization of the model of scientific reality is important for the overall proposal—to facilitate validating that the engineering actually does what is desired—and building such a formalization would be hard for unaided humans.)
(I’d also flag this kind of proposal as being at risk of playing shell games with the generator of large effects on the world, though not particularly more than other proposals in a similar genre.)