Chaos in complex systems is guaranteed but also bounded. I cannot know what the weather will be like in New York City one month from now. I can, however, predict that it probably won’t be “tornado” and near-certainly won’t be “five hundred simultaneous tornadoes level the city”. We know it’s possible to construct buildings that can withstand ~all possible weather for a very long time. I imagine that the thing you’re calling a puppet-master could build systems that operate within predictable bounds robustly and reliably enough to more or less guarantee broad control.
Caveat: The transition from seed AI to global puppet-master is harder to predict than the end state. It might plausibly involve psychohistorian-like nudges informed by superhuman reasoning and modeling skills. But I’d still expect that the optimization pressure a superintelligence brings to bear could render the final outcome of the transition grossly overdetermined.
Anthropic is indeed trying. Unfortunately, they are not succeeding, and they don’t appear to be on track to notice this fact and actually stop.
If Anthropic does not keep up with the reckless scaling of competitors like OpenAI, they will likely cease to attract investment and wither on the vine. But aligning superintelligence is harder than building it, and a handful of alignment researchers working alongside capabilities folks isn’t going to cut it. Anthropic cannot afford to delay scaling; even if their alignment researchers advised against training the next model, Anthropic could not afford to heed them for long.
I’m primarily talking about the margin when I advise folks not to go work at Anthropic, but even if the company had literally zero dedicated alignment researchers, I question the claim that the capabilities folks would be unable to integrate publicly available alignment research. If they had a Manual of Flawless Alignment produced by diligent outsiders, they could probably use it. (Though even then, we would not be safe, since some labs would inevitably cut corners.)
I think the collective efforts of humanity can produce such a Manual given time. But in the absence of such a Manual, scaling is suicide. If Anthropic builds superintelligence at approximately the same velocity as everyone else while trying really really hard to align it, everyone dies anyway.