I don’t think any factored cognition proponents would disagree with:

> Composing interpretable pieces does not necessarily yield an interpretable system.
They just believe that we could, contingently, choose to compose interpretable pieces into an interpretable system, just as we do all the time with:

- massive factories with billions of components, e.g. semiconductor fabs
- large software projects with tens of millions of lines of code, e.g. the Linux kernel
- military operations involving millions of soldiers and support personnel
> Figuring out how to turn interpretability/tool-ness/alignment/corrigibility of the parts into interpretability/tool-ness/alignment/corrigibility of the whole is the central problem, and it’s a hard (and interesting) open research problem.
Agreed this is the central problem, though I would describe it more as engineering than research: the fact that we have examples of massively complicated yet interpretable systems means we collectively “know” how to solve it, and it’s mostly a matter of assembling a sufficiently large and coordinated engineering project. (The real problem with factored cognition for AI safety is not that it won’t work, but that equally powerful uninterpretable systems might be much easier to build.)
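As a toy illustration of what engineering interpretability-of-the-whole could look like, here is a minimal sketch (all names hypothetical, not any existing system’s API) of a factored-cognition-style solver where the composition rule is explicit and every sub-question/answer pair is logged to an auditable trace:

```python
# Toy sketch of factored cognition with an auditable composition trace.
# All names are hypothetical illustrations, not any particular system's API.

from dataclasses import dataclass, field

@dataclass
class Trace:
    """Records every (sub-question, answer) pair so the whole
    computation stays inspectable, not just its pieces."""
    steps: list = field(default_factory=list)

    def log(self, question: str, answer: str) -> None:
        self.steps.append((question, answer))

def solve(question: str, decompose, answer_leaf, trace: Trace) -> str:
    """Recursively split a question into sub-questions; each leaf is
    answered by a small, individually interpretable unit."""
    subquestions = decompose(question)
    if not subquestions:              # base case: small enough to answer directly
        result = answer_leaf(question)
    else:
        parts = [solve(q, decompose, answer_leaf, trace) for q in subquestions]
        result = " ".join(parts)      # the composition rule is explicit and simple
    trace.log(question, result)
    return result

# Usage: trivially decompose "X and Y" style questions.
trace = Trace()
solve("2+2 and 3+3",
      decompose=lambda q: q.split(" and ") if " and " in q else [],
      answer_leaf=lambda q: str(eval(q)),  # stand-in for an interpretable leaf solver
      trace=trace)
print(trace.steps)  # every sub-question and its answer can be audited
```

The point of the sketch is that the leaves being simple does not make the whole inspectable for free; the explicit composition rule and the recorded trace are what do that work, and scaling them up is the engineering project described above.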
Thought-provoking post, thanks.
One important implication is that pure-play AI companies such as OpenAI, Anthropic, Conjecture, and Cohere are likely to fall behind companies with access to large amounts of non-public-internet text data, like Facebook, Google, Apple, and perhaps Slack. Email and messaging are especially massive sources of “dark” data, provided they can be used legally and safely (e.g. without exposing private user information). Taking just email, something like 500 billion emails are sent daily, which is more text than any LLM has ever been trained on (admittedly with a ton of duplication and low-quality content).
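A quick back-of-envelope check of that comparison (the tokens-per-email figure is a rough assumption; the GPT-3 corpus size is the published ~300B tokens):

```python
# Rough back-of-envelope: daily email volume vs. an LLM training corpus.
# The per-email token count is an order-of-magnitude assumption.

emails_per_day = 500e9         # ~500 billion emails/day (figure quoted above)
tokens_per_email = 50          # assume a short average body
daily_email_tokens = emails_per_day * tokens_per_email  # ~2.5e13 = 25T tokens/day

gpt3_training_tokens = 300e9   # GPT-3 was trained on ~300B tokens
print(daily_email_tokens / gpt3_training_tokens)  # ~83x GPT-3's corpus, per day
```

Even after aggressive deduplication and quality filtering cut that by an order of magnitude or two, a single day of email would still rival the largest public training corpora.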
Another implication is that federated learning, data democratization efforts, and privacy regulations like GDPR are much more likely to be critical levers on the future of AI than previously thought.
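Since federated learning is doing real work in that claim, here is a minimal sketch of its core mechanism, federated averaging (a generic FedAvg loop over toy NumPy vectors, not any particular framework’s API): each data silo trains locally on its private text and shares only weight updates, so the raw “dark” data never leaves the silo.

```python
# Minimal federated averaging (FedAvg) sketch: each data silo trains
# locally on its private data and only shares weight updates.
import numpy as np

def local_step(weights: np.ndarray, private_data: np.ndarray, lr: float = 0.1) -> np.ndarray:
    """Stand-in for local training: one gradient step on a toy
    least-squares objective over data that never leaves the silo."""
    grad = weights - private_data.mean(axis=0)   # toy gradient
    return weights - lr * grad

def fedavg_round(global_weights: np.ndarray, silos: list) -> np.ndarray:
    """One communication round: silos train locally, server averages.
    The server only ever sees weight vectors, never raw data."""
    updates = [local_step(global_weights.copy(), data) for data in silos]
    return np.mean(updates, axis=0)

# Usage: three silos holding private "data" (e.g., embedded email text).
rng = np.random.default_rng(0)
silos = [rng.normal(loc=i, size=(100, 4)) for i in range(3)]
w = np.zeros(4)
for _ in range(20):
    w = fedavg_round(w, silos)
print(w)  # converges toward the mean of the silo means (~1.0 per coordinate)
```

Whether this kind of scheme counts as legally and practically safe enough for email-scale text is exactly where GDPR-style regulation becomes the lever.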