What would be a major disagreement, then? Something like a medium scenario or slopworld?
Possibly, but in my own words, on purely technical questions? That an LLM is completely the wrong paradigm. That any reasonable timeline runs 10+ years. That China is inevitably going to get there first. That China is unimportant and should be ignored. That GPUs are not the most important or determinative resource. That the likely pace of future progress is unknowable.
Substantive policy options, which is more what I had in mind:
1) That for-profit companies (and/or OAI specifically) have inherently bad incentives, incompatible with suitably cautious development in this space.
2) That questions of who has the most direct governance and control of the actual technology are of high importance, and so safety work is necessarily about trustworthy control and/or ownership of the parent organization.
3) That arms races for actual armaments create bad incentives and should be avoided at all costs. This can be mitigated by prohibiting arms contracts, nationalizing the companies, forbearing from development at all, or requiring an international agreement and doing development under a consortium.
4) That safety work is not sufficiently advanced to meaningfully proceed.
5) That there need to be much more strictly defined and enforced criteria for cutting off development or for safety-certifying a launch.
Any of the technical issues kneecaps the parts of this that dovetail with being a business plan. Any of these (pretty extreme) policy remedies would harm OAI substantially, and OAI is incentivized to find reasons to claim they are very bad ideas.
There follow various bits about China, which I am going to avoid quoting because I have basically exactly one disagreement with them, and it does not respond to any given point:
The correct move in this game is to not play. There is no arms race with China, either against their individual companies or against China itself, that produces incentives which are anything other than awful. (Domestic arms races are also not great, but at least they do not co-opt the state completely in the same way.) Taking an arms race as a given is choosing to lose. It should not, and really must not, be very important what country anything happens in.
This creates a coordination problem. These are notoriously difficult, but sometimes problems are actually hard and there is no non-hard solution. Bluntly, however, from my perspective, the US sort of unilaterally declared an arms race. Arms race prophecies tend to be self-fulfilling. People should stop making them.
My argument for, basically, the damnation by financial incentive of this entire China-themed narrative runs as follows, with each step being crux-y:
1) People follow financial incentives deliberately, such as by lying or by selectively seeking out information that might convince someone to give them money.
2) This is not always visible, because all of the information can be true; you can do this without ever lying. You can simply not try hard to disprove the thesis you are pushing.
3) People who are not following this financial incentive at all can, especially if the incentive is large, be working from extremely biased information, regardless of whether they personally are aware of a financial incentive of any kind. Information toward the conclusion is available, and information against it is not, because of how other people have behaved.
4) OpenAI has such an incentive, and specifically seems to prefer an arms-race narrative because it justifies government funding and a lack of regulation (e.g., this op-ed by Sam Altman).
5) The information environment this creates ultimately causes the piece to have its overarching China-arms-race theme, and it is therefore not a coincidence that US Government stakeholders receive it as actually arguing against regulation of any kind.
I think that this, specifically, being the ultimate cause of the very particular arms-race narrative now popular, and displayed here, is the parsimonious explanation. It does not, I think, assume any very difficult facts, and it explains, e.g., how AI 2027 manages to accomplish the exact opposite of its apparently intended effect with major stakeholders.
[quoting original author] in our humble opinion, AI 2027 depicts an incompetent government being puppeted/captured by corporate lobbyists. It does not depict what we think a competent government would do. We are working on a new scenario branch that will depict competent government action.
I would read this.
We don’t actually have any tools aside from benchmarks to estimate how useful the models are. We are fortunate to watch the AIs slow the devs down. But what if capable AIs do appear?
Hoping that benchmarks measure the thing you want to measure is the streetlight effect. Sometimes you just have to walk into the dark.
So your take has OpenBrain sell the most powerful models directly to the public. That’s a crux. In addition, giving the public direct access to Agents-1-4, instead of to their minified versions, causes Intelligence Curse-like disruption faster and attracts more government attention to powerful AIs.
I am actually not sure this requires selling the most powerful models, although I hadn’t considered this.
If there’s a -mini or similar, it leaks information from its teacher model, if it had one; it is also possible to skim off the final layer of a model by clever sampling, or to distill out nearly the entire output distribution if you sample it enough. I do not think you can be confident that it is not leaking the capabilities you don’t want to sell, if those capabilities are extremely dangerous.
So: If you think the most powerful models are a serious bioweapons risk you should keep them airgapped, which means you also cannot use them in developing your cheaper models. You gain literally nothing in terms of a safely sell-able external-facing product.
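To make the distillation claim concrete, here is a minimal sketch of "distill it out if you sample it enough," in the sequence-level-distillation style. Everything named here (sample_teacher, tokenize, student) is a hypothetical stand-in rather than any vendor's real API; the only point is that black-box samples are ordinary training data for an imitator.

```python
# Minimal sketch: sequence-level distillation from black-box samples.
# `sample_teacher`, `tokenize`, and `student` are hypothetical stand-ins,
# not a real vendor API; sampled completions simply become training data.
import torch
import torch.nn.functional as F

def distill_step(student, tokenize, sample_teacher, prompts, optimizer):
    """One gradient step of 'imitate whatever the teacher emits'."""
    # Black-box access only: collect teacher completions for this batch.
    texts = [p + sample_teacher(p) for p in prompts]
    tokens = tokenize(texts)                      # LongTensor, shape [batch, seq]
    inputs, targets = tokens[:, :-1], tokens[:, 1:]
    logits = student(inputs)                      # [batch, seq-1, vocab]
    loss = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),      # flatten to token-level CE
        targets.reshape(-1),
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Run enough prompts through something like this and the student converges toward the teacher's behavior on whatever domain you sampled, which is exactly why I don't trust a -mini to have cleanly shed the capabilities you didn't want to sell.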
So you want to reduce p(doom) by reducing p(ASI is created). Alas, there are many companies trying their hand at creating the ASI. Some of them are in China, which requires international coordination. One of the companies in the USA produced MechaHitler, which could imply that Musk is so reckless that he deserves having the compute confiscated.
This is about right. I do not think P(ASI is created) is very high currently. My P(someone figures out alignment tolerably) is probably in the same ballpark. I am also relatively sanguine about this because I do not think existing projects are as promising as their owners believe they are, which means we have time.
That’s what the AI-2027 forecast is about. Alas, it was likely misunderstood...
I think the fact that tests for selling the model and tests for actual danger from the model are considered the same domain is basically an artifact of the business process, and should not be.
The scalar in question is the acceleration of the research speed with the AI’s help vs. without the help. It’s indeed hard to predict, but it is the most important issue.
A crux here: I do not think most things of interest are differentiable curves. Differentiable curves can be modeled usefully. Therefore, people like to assume things are differentiable curves.
If one is very concerned with being correct, something being a differentiable curve is a heavy assumption and needs to be justified.
From a far-off view, starting with Moore’s Law, transhumanism (as was the style at the time) has made a point of finding some differentiable curve and extending it. This works pretty well for some things, like Kurzweil on anything that is a function of transistor count, and horribly elsewhere, like Kurzweil on anything that is not a function of transistor count.
Some things in AI look kind of Moore’s-law-ish, but it does not seem well-supported that they actually are.
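A toy illustration of why I consider it a heavy assumption, with entirely made-up numbers: if the underlying process actually saturates, the early data can still be fit almost perfectly by an exponential, and the two extrapolations then diverge by orders of magnitude.

```python
# Toy example: early data from a saturating (logistic) process is fit almost
# perfectly by an exponential, but the extrapolations diverge wildly.
# All numbers are invented purely to illustrate the point.
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, cap=100.0, rate=0.5, midpoint=12.0):
    return cap / (1.0 + np.exp(-rate * (t - midpoint)))

def exponential(t, a, b):
    return a * np.exp(b * t)

t_seen = np.arange(0, 8, dtype=float)   # the stretch of curve we get to observe
y_seen = logistic(t_seen)               # looks exponential from inside this window
(a, b), _ = curve_fit(exponential, t_seen, y_seen, p0=(1.0, 0.3))

t_future = 24.0
print("exponential extrapolation:", exponential(t_future, a, b))  # tens of thousands
print("actual saturating value:  ", logistic(t_future))           # about 100
```

Whether any given AI trend is the exponential or the logistic here is precisely what the observed window cannot tell you, which is why the assumption needs independent justification.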
This is likely a crux. What the AI-2027 scenario requires is that AI agents who do automate R&D are uninterpretable and misaligned.
Yes.
If a corporation plans to achieve world domination and creates a misaligned AI, then we DON’T end up in a position better than if the corp aligned the AI to itself. In addition, the USG might have nationalised OpenBrain by that point, since the authors promise to create a branch where the USG is[5] way more competent than in the original scenario. [6]
Added note to explain concern: What type of AI is created is path-dependent. Generically, hegemonic entities make stupid decisions. They would e.g. probably prefer if everyone shut up about them not doing whatever they want to do. Paths that lead through these scenarios are less likely to produce good outcomes, AI-wise.
This is the evidence of a semi-success which could be actually worse than a failure.
Yes. I hate it, actually.
DeepSeek outperformed Llama because of an advanced architecture proposed by humans. The AI-2027 forecast has the AIs come up with architectures and try them. If the AIs do reach such a capability level, then more compute = more automatic researchers, experiments, etc = more results.
This is cogent. If beyond a certain point all research trees converge onto one true research tree which is self-executing, then it is true that available compute and starting point are entirely determinative from that point on. These are heavy assumptions, though, and we’re well past my “this is a singularity, and its consequences are fundamentally unpredictable” line anyway.
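To state the assumption I am granting here explicitly (my own formalization, not the authors'): a single self-executing research tree means progress becomes a function of nothing but the starting point and cumulative compute,

$$R(t) = F\!\left(R_0 + \int_0^t C(\tau)\,d\tau\right),$$

where $R_0$ is where you start on the tree, $C(\tau)$ is the compute devoted to automated researchers, and $F$ is the fixed tree itself. Taste, luck, and which architectural bets humans made have all dropped out, and that dropping-out is the part I consider a heavy assumption.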
“Selling access to a bioweapon-capable AI to anyone with a credit card” will be safe if the AI is aligned so that it wouldn’t make bioweapons even if terrorists ask it to do so.
I actually don’t think this is the case. You can elide what you are doing or distill it from outputs. There is not that much that distinguishes legitimate research endeavors from weapons development.
Finally, weakening safety is precisely what the AI-2027 forecast tries to warn against.
I very much do not think it succeeds at doing this, although I do credit that this is probably the genuine intention.