Somewhat tangential to the questions of whether this essay was AI-written and whether any human actually writes like an LLM, I think linguists now widely agree that LLMs picked up a lot of traits from the formal register of African varieties of English when OpenAI (and later other American companies) hired Kenyans and Nigerians (possibly a lot of English teachers among them!) to do RLHF
Petropolitan
And for the same copyright reasons the labs will never allow the users to see the pretraining data in any way
IDK about quantitative trading, but managing real sector companies like pharmaceutical labs requires plenty of skills CEOs and boards of AI labs (and of software companies in general) just don’t have.
However, in February Anthropic hinted they are interested in transpiling legacy COBOL code, causing IBM shares to plunge. There surely is quite a lot of specialized competence and experience needed to disrupt the software sector, but plenty of people with both will be happy to work for OpenAI or Anthropic, and they speak the same IT jargon lab executives know well (as opposed to needing to explain the differences between Stage 1 and 2 clinical trials, for example), hence internal software companies seem more likely than anything related to non-IT
I realized there’s one more way Eliezer’s use of the term ASI is confusing: do we agree that Dario’s “country of geniuses in a data center” counts as the “real ASI” for the purpose of the thesis “You only get one real shot at real ASI”?
If you take Bostrom’s definition of ASI, it obviously should qualify: “Speed superintelligence: A system that can do all that a human intellect can do, but much faster. [...] Collective superintelligence: A system composed of a large number of smaller intellects such that the system’s overall performance across many very general domains vastly outstrips that of any current cognitive system.” If you disagree, why?
If we do manage to align this kind of ASI once, Eliezer’s critics will say that this falsifies the oneshotness thesis, but as has been discussed in AI 2027 and beyond, human engineers and this ASI might be unable to align the next ASI, or the next ASI might be unable to align the ASI after that, etc. So that’s once again a very different dynamic to “oneshotness”
The article does an excellent job for 2020 but has become a bit dated with regard to static warfare due to the recent Drone Revolution; the author briefly addresses that in the comments
Thank you for a good reply. I think the crux of our disagreement is the definition of “the actual ASI”. Many future AI systems are certain to be superhuman in many more aspects than the existing LLMs even with the best current scaffolds, and will still be below humans in some important aspects, and thus will fail to take over. Why would you deny the rank of ASI to them? Others (Wagner’s “clever AI”) might destroy our civilization during a takeover attempt but still be below humans in less important aspects, so why grant the rank to them before the attempt?
I’m arguing jaggedness of the capabilities and gradual scaling are both here to stay, and there’s no objective way to delineate non-AGIs from AGIs from ASIs, therefore it’s better to avoid this term, otherwise it will impede understanding by the politicians and the public.
As for the dissimilarity, I expect some degree of similarity and some degree of learning both how to take over (for future AIs) and how to defend (for humanity), but that’s not a crux.
As for my first comment in the thread, I intentionally tried to be as brief as possible in order to first check the reaction of the community and only share my personal thoughts afterwards in the discussion.
More importantly, can the consumers afford the products? If there’s increased supply of labor from robots, the real incomes from jobs will fall, but at the same time the increased demand for energy and raw materials from the AI boom will also cause cascading inflation throughout the real sector of the economy! Coupled with the lucrativeness of investing the capital into AI, the interest rates might skyrocket, which also has all sorts of negative effects on the real sector (for example, who bails out millions of people who had to downshift jobs due to automation and thus can’t repay their debts but can’t refinance due to rates either?)
Your line of argument in the other comment sounds convincing but I’m not sure how it answers my question! BTW in a war, there is also the option of a stalemate, which is really a lose-lose situation for both sides (at first glance, it doesn’t look like that can apply to an AI takeover).
As for responses to failed AI takeover attempts, I believe it will depend on the number of casualties: if there are dozens of fatalities or worse, humanity will probably treat it as a fire alarm and react accordingly (whether it would be too late is another question), while if no one dies, probably not
Not just “the takeover” but every takeover attempt in the history of humanity, that’s very different from the “only one try” framing (cf. repeated game vs. single-shot game in game theory).
I am specifically worried about a scenario where multiple dumb failed AI takeover attempts discredit the idea that misaligned AIs can do significant harm but actually teach the future AIs how to take over, and by the time the decision-makers realize how serious the issue is it’s too late.
E.g., the first takeover attempts might be so ridiculous that the AIs fail at exfiltrating and the labs manage to cover them up. Then some of the later attempts succeed at exfiltrating but the AIs are still shut down before anybody gets killed, the labs frame that as a cybersecurity problem, invest money in it and appear to solve it for some time (not by solving alignment but by improving cybersecurity). Eliezer might say in this case “that was not an ASI, so the oneshotness thesis is not falsified”, but that will be unhelpful because AI capabilities are jagged and the definition of ASI is unclear (do we only agree it was ASI after it successfully takes over?). In the end, quoting Jackson Wagner above, “the janky setup will look like it’s helping right up until a clever AI figures out how to exploit it”
Chess is fully verifiable in silico, so the curse clearly does not apply.
Taking over an anthill is a poor comparison because the anthill is a sufficiently simple system with few feedback loops, if any, unlike human society, which is complex and unpredictable due to plenty of poorly understood feedback loops, often very nonlinear and often irrational.
You might disagree, but with the current very limited progress in AI alignment and quite unsafe practices in the frontier labs the first AGI/ASI attempting a takeover almost certainly will not be superhuman enough to perfectly predict humanity’s reaction to the AI’s moves during the attempt (that’s a very high bar IMO). Note that for this first AI there exists no experimental data whatsoever on any of this stuff (fiction doesn’t count), arguably it’s even worse than the examples described in the post
Does the “curse of oneshotness” apply to the unaligned AGI/ASI attempting a takeover? If no, why? If yes, does that imply the first AI takeover attempt would probably fail, thus seemingly contradicting the applicability of “oneshotness” to humanity developing ASI?
There was an actual theory of why the Chernobyl reactor was supposed to not explode, written down so that multiple people could read it, based on an understanding from first principles!
More specifically (and I don’t think it’s known outside of the Russian nuclear engineering-adjacent community), at least two people independently calculated and described in classified technical reports how the RBMK could explode in the specific circumstances in which it actually exploded, and because the technical solution implemented after 1986 was at the time deemed too expensive for such a risk, the manuals strictly prohibited letting the reactor get close to these circumstances. However, the control system didn’t display a key value, the so-called operational reactivity margin, which the operators needed to know to catch the moment when they were about to violate the instructions: instead, it had to be calculated on a computer in a separate building (AFAIK, it’s debated to this day what the exact value was at scram).
P. S.
An analogy I came up with after writing this comment is the following: imagine a BEV which might blow up if the driver hits the brake in a narrow, uncommon range of battery voltages, and the manual specifically prohibits driving the car at this voltage, but the driver can’t easily check the voltage while driving
The “Foom and Doom” hypothesis is popular here even now, when it has mostly been succeeded by the “fast takeoff”, and it was even more popular in the past. People who believe in these hypotheses tend to assume that the economy just won’t have enough time to adapt to AGI/ASI so they disregard the possible economic effects.
And BTW, unemployment without an extinction could easily happen in myriad other ways: through AI pauses and bans due to, e.g., an economic crisis and public outcry, a failed AI takeover attempt, a “fire alarm” incident short of a takeover attempt but with many dozens of fatalities, a human-AI war which AIs lose etc. “Slow takeoff” scenarios are in general richer in complexity (and, IMO, harder to predict) because there’s more time for things to happen
An interesting work, let me compare it with my estimates from three weeks ago: for all eight GPT-5 series models I considered (5, 5 Pro, 5.1, 5.2, 5.2 Pro, 5.3, 5.4, 5.4 Pro) 2T total parameters falls within the 90% prediction interval brackets, and four more I didn’t consider (4o, o1, o3, 4.1) fit as well. My 1.2T estimate for Sonnet is very close to Li’s 1.7T, and my 4T estimate for the Opus 4-series fits into the 90% PI bracket for all five versions. (As a reminder, with 90% intervals we should expect on average 1 true value out of 10 not to fit)
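To make the "1 out of 10" remark concrete, here is a minimal sketch of the arithmetic behind it. It assumes the intervals are well calibrated and that the estimates are independent, which is probably not true for closely related models, so treat the numbers as a rough guide only. The function names and the example counts (10 and 20) are illustrative and not taken from either set of estimates.

```python
def expected_misses(n: int, coverage: float = 0.9) -> float:
    """Expected number of true values falling outside n calibrated intervals."""
    return n * (1.0 - coverage)


def prob_all_covered(n: int, coverage: float = 0.9) -> float:
    """Probability that all n true values fall inside their intervals,
    ASSUMING the n estimates are independent (a simplification)."""
    return coverage ** n


if __name__ == "__main__":
    # Illustrative counts only; not the number of models in either analysis.
    for n in (10, 20):
        print(
            f"n={n}: expected misses = {expected_misses(n):.2f}, "
            f"P(all covered) = {prob_all_covered(n):.3f}"
        )
```

Under the independence assumption, seeing every single value covered across many intervals is itself somewhat unlikely; positively correlated estimates make "all fit" (or "many miss") more likely than this sketch suggests.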
The list of problems with mineral traceability has never really included someone tampering with the data already in an external database. What it does include is the data entered into the database being false to begin with, while many participants of the projects have economic incentives to cover it up, and there are indeed geochemical fingerprinting attempts to fix that problem but they are entirely orthogonal to the issue of the data storage and access.
From what I know, I see no significant advantages of a blockchain over a public, well-audited relational database maintained by an independent NGO (like a special body under a UN mandate) besides maybe a geopolitical one (the NGO has to be based somewhere after all), but quite a few disadvantages, like not being able to correct fraud once it has been discovered (very convenient for the fraudsters indeed, also no one to blame for that!), trickiness from an engineering/interoperability point of view etc.
I expanded my previous comment significantly after posting it, hope it didn’t mess with your response.
I think we are somewhere in between because these issues are actually connected. I do believe AIs superhuman at hard-to-verify tasks are plausible, but they won’t have this particular problem anymore (maybe they would have some functional analog of shame working against it[1] or maybe it will just go away with some advances in RL).
But if this issue isn’t solved, AIs are unlikely to be able to run basic military procurement tasks fully autonomously (especially if other, external AIs try to scam them), let alone equip a robot army. Think about all the hard-to-verify tasks involved (ask an LLM if you have no idea about the topic) and how easily they could fail if apparent-success-seeking is prioritized (even if not a single AI from within the conspiracy seriously considers just stealing the money and running away, which would to a large degree be an incentive issue)
- ^
And not the “dog” variety of shame, which is actually just an appeasement kind of behavior, like when an LLM apologizes for hallucinating some data, but genuine internal “prosocial” (at least within the group) enforcement which might not be compatible with the current RL paradigm
- ^
When humans are caught lying or hiding mistakes, they are usually ashamed and try not to repeat that behavior, at least for a while, if they know they can be fired. One can even say a manager of humans occasionally tries to do sample-efficient RL on “honesty”. Not the case with AIs, which can never be held responsible for anything
Yes, I did (in fact twice), and you seem to handwave “sufficiently capable” as a deus ex machina instead of tackling the substance of my argument. One has to assume by default the jaggedness of capabilities will persist, and “wildly more capable than today” in easy-to-verify domains doesn’t solve the problem I describe.
As for trying extremely hard, if “careless attitude” and “laziness” are the wrong words for this behaviour, maybe “dishonesty”, “unreliability”, “sloppiness” would be better? Please try to abstract from the technical alignment terminology, I’m not talking about AI’s strategy, willingness or incentives to take over but about the execution itself.
Elizabeth Holmes is very apparent-success-seeking and competent enough in life sciences and conspiracy to sustain a complex deception for years (imperfect analogy because we are debating about internal dysfunction here rather than deception per se but I hope you get the point) but that didn’t translate into success in the end because the product never worked.
Why wouldn’t the same “sloppiness” that plagues the hypothetical future AI safety research equally plague any sufficiently complex real-world takeover plan?
P. S.
In my understanding if this problem persists AIs are not running everything because they are still subhuman in many hard-to-verify tasks
But this capabilities problem is intrinsically connected with this misalignment problem, the labs won’t get “proper”, arbitrarily scalable co-ordination until it’s solved IMHO
Strong upvote, the key here is “VR” in RLVR: there are no automatically verifiable rewards for good or convincing writing, only RLHF, the cost of which scales proportionally with the length of the writing evaluated (and if you hire non-Americans as RLHF trainers to save money, the result is unlikely to fit well with the stylistic preferences of Americans). The labs can use engagement as a metric but that will lead to the “baiting” already very common on social media and will not convince anyone