Aren’t Elo scores conserved? The sum of the Elo scores for a fixed population will be unchanged?
The video puts Stockfish’s Elo at 2708.4, worse than some human grandmasters, which also suggests to me that he didn’t run the Elo algorithm to convergence and Stockfish should be stealing more score from other, weaker players.
EDIT: ChatGPT 5 thinks the Elos you suggested for random are reasonable for other reasons. I’m still skeptical but want to point that out.
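To make the conservation question concrete, here is a minimal sketch of the textbook Elo update, assuming every player shares the same K-factor and nobody enters or leaves the pool. The names `expected_score`, `play_game`, and `simulate` are made up for illustration, and I don't know which rating procedure the video actually used. Under these assumptions each game moves the same number of points from one player to the other, so the population total never changes.

```python
import random


def expected_score(r_a: float, r_b: float) -> float:
    """Textbook Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def play_game(ratings: list[float], a: int, b: int, score_a: float, k: float = 32.0) -> None:
    """Update both ratings in place; score_a is 1 (win), 0.5 (draw) or 0 (loss)."""
    delta = k * (score_a - expected_score(ratings[a], ratings[b]))
    ratings[a] += delta  # A gains exactly ...
    ratings[b] -= delta  # ... what B loses (assumes a shared K-factor)


def simulate(num_players: int = 5, num_games: int = 1000, seed: int = 0) -> tuple[float, float]:
    """Play random games with random results; return the rating total before and after."""
    rng = random.Random(seed)
    ratings = [1500.0] * num_players
    total_before = sum(ratings)
    for _ in range(num_games):
        a, b = rng.sample(range(num_players), 2)
        play_game(ratings, a, b, rng.choice([0.0, 0.5, 1.0]))
    return total_before, sum(ratings)


total_before, total_after = simulate()
print(total_before, total_after)
```

Conservation can fail in practice if players have different K-factors, if there are rating floors, or if ratings are fit by a batch method rather than game-by-game updates.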
I do not believe random’s Elo is as high as 477. That Elo was calculated from a population of chess engines where about a third of them were worse than random.
I’m not at all convinced this isn’t a base rate thing. Every year about 1 in 200-400 people has a psychotic episode for the first time. In AI-lab weighted demographics (more males in their 20s) it’s even higher. And even more people get weird beliefs that don’t track with reality, like finding religion or QAnon or other conspiracies, but generally continue to function normally in society.
Anecdotally (with tiny sample size), all the people I know who became unexpectedly psychotic in the last 10 years did so before chatbots. If they had gone unexpectedly psychotic a few years later, you can bet they would have had very weird AI chat logs.
Light disagree. Prefix modifiers are cognitively burdensome compared to postfix modifiers. Imagine reading:
“What I’m about to say is a bit of a rant. I’m about 30% confident it’s true. Disclosure, I have a personal stake in the second organization involved. I’m looking for good counter arguments. Based on a conversation with Paul. I have a formal writeup at this blog post. Part of the argument is unfair, I apologize. I...”
Gaaa, just give me something concrete already! It’s going to be hard enough understanding your argument as it is; it’s even harder for me to understand your argument while having to keep unresolved modifiers loaded in my mental stack.
Ha, and I have been writing up a long-form for when AI-coded-GOFAI might become effective, one might even say unreasonably effective.
LLMs aren’t very good at learning in environments with very few data samples, such as “learning on the job” or interacting with the slow real world. But there often exist heuristics, ones that are difficult to run on a neural net, with excellent specificity that are capable of proving their predictive power with a small number of examples. You can try to learn the position of the planets by feeding 10,000 examples into a neural network, but you’re much better off with Newton’s laws coded into your ensemble. Data constrained environments (like, again, robots and learning on the job) are domains where the bitter lesson might not have bite.
Back in the GOFAI days, when AI meant A* search, I remember thinking:
Computers are wildly superhuman at explicit reasoning (System 2), like doing arithmetic or searching through chess moves
Computers are garbage at intuitive reasoning (System 1), like recognizing a picture of a cat
When computers get good at System 1, they will be wildly superhuman at everything
Now transformers appear to be good at System 1 reasoning, but computers aren’t better than humans at everything. Why?
I think it comes down to:
Computers’ System 1 is still wildly sub-human at sample efficiency; they’re just billions of times faster than humans
LLMs work because they can train on an inhuman amount of reading material. When trained on only human amounts of material, they suck.
LLM agents aren’t very good because they can’t learn on the job. Even dumb humans learn better instincts after a little on-the-job practice. We can just barely improve an LLM’s System 1 from its System 2, but only by brute forcing an inhuman number of roll-outs.
Robots suck, because the real world is slow and we don’t have good tricks to train their System 1 by brute force.
We’re in a weird paradigm where computers are billions of times faster than humans, but thousands of times worse at learning from a datum.
I think I disagree. It’s more informative to answer in terms of value as it would be measured today, not value after the economy adjusts.
Suppose someone from 1800 wants to figure out how big a deal mechanized farm equipment will be for humanity. They call up 2025 and ask “How big a portion of your economy is devoted to mechanized farm equipment, or farming enabled by mechanized equipment?” We give them a tiny number. They also ask about top-hats, and we also give them a tiny number. From these tiny numbers they conclude both mechanized farm equipment and top-hats won’t be important for humanity.
EDIT: The sort of situation I’m worried your definition misses is one where remote-worker AGI becomes too cheap to meter, but human hands are still valuable.
Would you agree your take is rather contrarian?
* This is not a parliamentary system. The President doesn’t get booted from office when they lose majority support—they have to be impeached[1].
* Successful impeachment (conviction and removal) takes 67 Senate votes.
* 25 states (half of Senate seats) voted for Trump 3 elections in a row (2016, 2020, 2024).
* So to impeach Trump, you’d need the votes of Senators from at least 9 states where Trump won 3 elections in a row.
* Betting markets expect (70% chance) Republicans to keep their 50-seat majority in the November election, not a crash in support.
[1] Or removed by the 25th Amendment, which is strictly harder if the president protests (requires a 2/3 vote to remove in both House and Senate).
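The "at least 9 states" figure in the list above can be checked with a few lines of arithmetic. This is only a sketch of the best case for removal: it assumes every senator from the other 25 states votes to convict, and the function name `min_trump_states_needed` is invented for illustration.

```python
import math


def min_trump_states_needed(
    senate_seats: int = 100,
    votes_to_convict: int = 67,
    trump_3x_states: int = 25,
    seats_per_state: int = 2,
) -> int:
    """Fewest three-time Trump states that must supply a convicting senator."""
    # Best case (an assumption): all senators from the other states vote to convict.
    other_seats = senate_seats - trump_3x_states * seats_per_state
    # Votes still missing after counting every senator from the other states.
    shortfall = max(0, votes_to_convict - other_seats)
    # Each state has two senators, so round up to whole states.
    return math.ceil(shortfall / seats_per_state)


print(min_trump_states_needed())
```

With the numbers from the list, the other 25 states supply at most 50 votes, leaving 17 to come from senators representing states Trump won three times.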
...your modal estimate for the timing of Vance ascending to the presidency is more than two years before Trump’s term ends?
And the market’s top pick for President has read AI 2027.
$750 per book seems surprisingly reasonable to me as a royalty rate for a compulsory AI ingest license. Compulsory licenses are common in, e.g., the music industry: you must license your musical work for covers (and get a 12¢ royalty per distribution).
I second the video recommendation.
A friend in China, in a rare conversation we had about international politics, was annoyed at US politicians for saying China was “supporting” Russia. “China has the production capacity to make easily 500,000 drones per day,”[1] he said. “If China were supporting Russia, the war would be over.” And I had to admit I had not credited the Chinese government for keeping its insanely competitive companies from smuggling more drones into Russia.
[1] This seemed like a drastic underestimate to me.
Huh, I didn’t expect to take Gary Marcus’s side against yours but I do for almost all of these. If we take your two strongest cases:
> No massive advance (no GPT-5, or disappointing GPT-5)
There was no GPT-5 in 2024? And there is still no GPT-5? People were talking in late 2023 like GPT-5 might come out in a few months, and they were wrong. The magic of “everything just gets better with scale” really seemed to slow after GPT-4?
On reasoning models: I thought reasoning models were already being built internally at Anthropic in 2023 and distilled into public models, which was why Claude was so good at programming. But I could be wrong or have my timelines messed up.
> Modest lasting corporate adoption
I’d say this is true? Read e.g. Dwarkesh talking about how he’s pretty AI forward but even he has a lot of trouble getting AIs to do something useful. Many corporations are trying to get AIs to be useful in California, fewer elsewhere, and I’m not convinced these will last.
I don’t think I really want to argue about these, more I find it weird people can in good faith have such different takes. I remember 2024 as a year I got continuously more bearish on LLM progress[1].
[1] Until DeepSeek in late December.
Cool, yes, I agree. But the reason other insiders don’t like that public criticism is because it reduces their status by association. Your colleagues paid to get a position in a status hierarchy which you are devaluing, and they make you internalize those costs.
I don’t think I agree at all. Relevant quotation, Larry Summers talking to Elizabeth Warren
> “Larry leaned back in his chair and offered me some advice. I had a choice. I could be an insider or I could be an outsider. Outsiders can say whatever they want. But people on the inside don’t listen to them. Insiders, however, get lots of access and a chance to push their ideas. People — powerful people — listen to what they have to say. But insiders also understand one unbreakable rule: They don’t criticize other insiders.”
People come with feature vectors of which clusters they are bucketed into (Harvard graduate, east bay rationalist, FTX employee, etc). Your reputation is tied to the reputation of that cluster, whether you want it to be or not.
Whistleblowers are rare and their effects are minor. In-group cooperation and collusion are a large part of human affairs.
EDIT: I agree with you, and just didn’t understand what you said. My rephrase would be that the article had it backwards:
> Prediction: If consortment was less endorsement—if it were commonplace to spend time with your enemies—then it would be more commonplace to publicly report small wrongs.
This is reversed. It’s the wrong-doers who are avoiding interactions with anyone who might publicly report small wrongs.
Online advertising can be used to promote books. Unlike most authors, you are not trying to make a profit, so you can pay for advertising beyond the point where the publisher’s marginal cost equals marginal revenue. Do you:
Have online advertising campaigns set up by your publisher and can absorb donations to spend on more advertising (LLM doubts Little, Brown and Company lets authors spend more money)
Have $$$ to spend on an advertising campaign but don’t have the managerial bandwidth to set one up. You’d need logistics support to set up an effective advertising campaign.
Need both money and logistics for an advertising campaign.
Alphabet and Meta employees get several hundred dollars per month to spend on advertising (as an incentive to dogfood their product). If LessWrong readers employed at those companies set up many $300 / month advertising campaigns, that sounds like a worthwhile investment.
Need neither help setting up an advertising campaign nor funds for more advertising (though donations to MIRI are of course always welcome)
I’m very glad you’ve used focus groups! Based solely on the title the results are excellent. I’m idly curious how you assembled the participants.
Do you have a way to get feedback from Chinese nationalists? (“America Hawks” in China?).
Given the potentially massive importance of a Chinese version, it may be worth burning $8,000 to start the translation before proofreading is done, particularly if your translators come back with questions that are better clarified in the English text. I’d pay money to help speed this up if that’s the bottleneck[1]. When I was in China I didn’t have a good way of explaining what I was doing and why.
[1] I’m working mostly off savings and wouldn’t especially want to, but I would, to make it happen.
It’s a reference to the title of a novel by Fred Hoyle.
Not if the Elo algorithm isn’t run to completion. It takes a long time to make large gaps in Elo, like between Stockfish and Random, if you don’t have a lot of intermediate players. It’s hard for Elo to differentiate between +1000 Elo and +2000 Elo: both mean “wins virtually all the time”.
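A rough sketch of why large gaps are slow to open up, assuming the textbook logistic Elo formula, a shared K-factor of 32, and only two players who start level with the stronger one winning every game. The helper names `win_expectancy` and `games_to_reach_gap` are invented for illustration, and this is not the procedure from the video.

```python
def win_expectancy(gap: float) -> float:
    """Textbook Elo expected score for the player who is `gap` points higher."""
    return 1.0 / (1.0 + 10 ** (-gap / 400.0))


def games_to_reach_gap(target_gap: float, k: float = 32.0) -> int:
    """Games needed to open `target_gap` when the stronger player always wins.

    Both players start level and play only each other (a simplifying
    assumption), so each win widens the gap by 2 * k * (1 - expected score).
    """
    gap = 0.0
    games = 0
    while gap < target_gap:
        gap += 2.0 * k * (1.0 - win_expectancy(gap))
        games += 1
    return games


for gap in (1000, 2000):
    print(gap, win_expectancy(gap), games_to_reach_gap(gap))
```

The expected score is already above 0.99 at +1000, so each further win moves the ratings by only a sliver, and the number of games needed grows roughly exponentially with the gap. Intermediate players help because each adjacent pair has a gap small enough for results to carry information.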