Sure, suppose that the alignment problem is in the set of problems that a Bureaucracy Of AIs can solve. This sounds helpful because you’ve ~defined said bureaucracy to be safe, but I doubt it’s possible to build a safe bureaucracy out of unsafe parts—and if it is, we don’t know how to do so!
I dislike the fatalism here, and would rather celebrate direct attacks on the problem even when they don’t work. For example, I’d love to see a more detailed writeup on BoAI proposals across a range of scenarios and safety assumptions :-)
I doubt it’s possible to build a safe bureaucracy out of unsafe parts
The intended construction is to build a safer bureaucracy out of less safe parts/agents (or merely less robustly safe ones). The parts shouldn't break in most runs of the bureaucracy, and the bureaucracy as a whole should break even less often. If distilling such a bureaucracy yields a part/agent safer than the original part, that's an iterative improvement. This doesn't need to change the game in one step, only to improve the situation with each step, in a direction that is hard to formulate without resorting to the device of a bureaucracy. Otherwise the same thing could be done with a more lightweight prompt/tuning setup, where the "bureaucracy" is just the prompt given to a single part/agent.