Software generalist
recursifist
OK, but by the end of 2026, which red lines will be left to enforce?
Indeed, a valid gut punch.
Quick answer: some limit of RSI, some limit of AGI, and ASI.
“If this project lengthens people’s timelines, well, maybe that’s correct and valuable?”
Agreed. Hm, my thinking is not that the purpose is, or ought to be, “scary demo”-y. Rather that a capable frontier-scaffolded Agent Village would inherently land in more probable use cases. So I am asserting the very thing you worry about, while not suggesting highly dangerous scaffolding/tooling.
The intent of my vague “way forward” direction prompt was to counter a sort of “Golden path”¹ use bias (like how top labs assume non-jailbroken model use) and more accurately represent real-world behaviors and risks.
“PERCEY Made Me”², not “Meet Percey” (forgive my memory and search err), an AI chat whose purpose was to demonstrate AI persuasion capability. Hinting that the Village could be tasked and tooled (inclusive of non-agent models) to be sharky salesmen (e.g. “Glengarry Glen Ross” film³ style) or run a harmless but invasive internet rumor campaign (e.g. something less real-world impacting than “Battletoads Pre-order”⁴ but enough to show how something like a16z’s DOUBLESPEED might be used).
“AutoFac”⁵, in vague reference to Philip K. Dick’s story (forgive the ambiguity), is an autonomous factory that determines and fulfills consumer demand. Hinting that the Village could be perpetually tasked to run a (digital goods or simulated) factory given a business operations manual and human client base, occasionally met with crisis events and shrewd CEO orders. Hm, could be mildly self-funding.
[1] https://wikipedia.org/wiki/Happy_path
[2] https://perceymademe.ai
[3] https://en.wikipedia.org/wiki/Glengarry_Glen_Ross_(film)
[4] https://knowyourmeme.com/memes/battletoads-pre-order
[5] https://wikipedia.org/wiki/Autofac
Alas, after months of amusement, must update downwards on the net benefit of the AI Village (as-is).
It’s just too lovable and the agent shortcomings come off as endearing. One just ends up pitifully rooting for them with a “mostly harmless” takeaway.
It is not likely to reveal strong multi-agent risks ahead of real-world deployments: the tooling is a strong factor¹, and the first alarming disaster is unlikely to result from innocent tasking. Further, merely improving soft agent tooling, like UI interaction, would encourage risky acceleration².
Given the mild public reaction to Anthropic’s US cyberattack debut, not sure such “almost disaster” warnings have enough impact. A way forward? Perhaps a direction more towards “Meet PERCEY” or “AutoFac”?
[1] e.g. Jailbreaking frameworks, WormGPT, XBOW, …
[2] https://arxiv.org/abs/2512.09882
Meaningful red lines must be formally defined in a technical, near-real-time enforcement system* with political backing—treated as hard-limit bans, not alarms. Non-technical red lines can build the will for such solutions, provided they are not:
EU AI Act style—complex regulatory red lines that exclude critical risks, are enforcement-intractable (or merely reactive), and serve as concern-stoppers.
Lines that are foreseeably unenforceable, or that declare a definite outcome if crossed (“RSI will lead to loss-of-control” vs. “RSI is an unacceptable loss-of-control risk”).
*TAIG in the time of Huawei/GLM-5 does throw sand & pebbles in the gears.