Continue to scale, at or slightly behind the frontier. Complain loudly about the race dynamics you’re in, and advocate for legislation to slow down. Make scary capabilities demos to support your case.
anaguma
In contrast, continual learning would in the near term likely only perform at the level of in-context learning, which is incapable of things like learning to play chess at the level of the most capable humans.
I think this particular task just takes a lot of compute to learn, relative to other RLVR tasks. This is especially true if you are training with RLVR and not MCTS self-play methods, which are much more sample efficient.
Is it publicly known what is in fact the difference between the two contracts?
(Reader advisory note: I quote some people at length because no one ever clicks links, but you are free to skip over long quote boxes. I’m trying to raise chance of reading the full quote to ~25% from ~1%, not get it to ~90%.)
I was listening to the audio version of this, and I found this very helpful, especially linking to Daniel’s essay. In fact I went back and listened to it again, so for me it was ~200%!
It seems like it might be a good time to have an international treaty banning lethal autonomous weapons.
Ezra Klein has posted an interview with Jack Clark, co-founder of Anthropic, discussing capabilities progress and safety.
Therefore, even if progress towards ASI is shut down, there doesn’t seem to be a very good off-ramp to turn this advantage into utopia.
Not all good futures need to result in utopia.
Before I could get this report out, OpenAI also gave us GPT-5.3-Codex-Spark, which is ultra-low latency Codex, more than 1,000 tokens per second. Wowsers. That’s fast.
As in, really super duper fast. Code appears essentially instantaneously. There are times when you feel the need for speed and not the need for robust intelligence. Many tasks are more about getting it done than about being the best like no one ever was.
It does seem like it is a distinct model, akin to GPT-5.3-Codex-Flash, with only a 128k context window and lower benchmark scores, so you’ll need to be confident that is what you want.
I think the majority of this speedup is due to OpenAI’s partnership with Cerebras, rather than a new model. Cerebras chips can reach this speed because they are much larger and don’t have to pay the costs of interconnect.
But (1) bandwidth might not be better in this case; it isn’t better in all cases
The entropy of LLM-generated text is a few bits per token, whereas the hidden state contains 10–100k bits. It’s hard to imagine any method that passes around hidden states[1] having lower bandwidth than CoT tokens!
[1] Or similarly sized tensors
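The gap in the comparison above can be sketched with back-of-envelope arithmetic. The numbers below are illustrative assumptions, not taken from any particular model: a 100k-entry vocabulary (so at most log2(100k) ≈ 16.6 bits per sampled token) versus a hypothetical 4096-dimensional hidden state in 16-bit precision.

```python
import math

# Illustrative comparison: information carried per generation step by a
# sampled CoT token vs. a raw hidden-state vector. All numbers here are
# hypothetical assumptions for the sketch, not from any specific model.

vocab_size = 100_000          # assumed tokenizer vocabulary size
d_model = 4_096               # assumed hidden dimension
bits_per_activation = 16      # assumed fp16/bf16 activations

# Upper bound on bits per token: log2(vocab_size). The actual entropy
# of LLM-generated text is far lower, only a few bits per token.
max_bits_per_token = math.log2(vocab_size)

# Raw size of one hidden-state vector in bits.
bits_per_hidden_state = d_model * bits_per_activation

print(f"max bits per token:    {max_bits_per_token:.1f}")
print(f"bits per hidden state: {bits_per_hidden_state}")
print(f"ratio:                 {bits_per_hidden_state / max_bits_per_token:.0f}x")
```

Even against the loose per-token upper bound, the hidden state carries thousands of times more raw bits per step, which is the sense in which passing hidden states is higher-bandwidth than passing CoT tokens.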
Claude 4.6 was released about an hour ago. Just 10 mins after it was released, OpenAI released GPT-5.3.
They’ll be able to automate ML research in the sense of coming up with experiments to try, and implementing those experiments, but never any new conceptual work.
I agree this seems unlikely. As a baseline, labs have thousands of capabilities researchers coming up with insights, and they could train the models to imitate them. There is also a path of RLVRing against the results of small scale experiments. It’s more expensive to collect data for research taste, but it doesn’t seem like a difference in kind to software engineering.
And people still wonder how the AIs could possibly take over!
How has your p(doom) changed over this period?
What do you think have been the most important applications of UDT or other decision theories to alignment?
2x uplift is already happening at the most advanced AI lab
This seems plausible to me, but it would be good to have a new METR uplift study to have more confidence in this.
Could you give an example of an article where this was effective?
The existential risks that everyone will die or that the future will belong to the AIs are obvious.
I’m not sure that this is obvious to most, particularly outside of LW.
The one bench we definitely don’t want to be bench-maxxed.
arguing that if you don’t know the hazard rate, but instead have uncertainty about it that you update over time, hyperbolic-looking discounting can fall out from there
This seems relevant to X-risk discussions.
The historical evidence on this point is mixed. For example:
The US sent thousands of troops to kill the Mexican revolutionary leader Pancho Villa from 1916-17.
In WWII, the Allies considered assassinating Hitler in Operation Foxley, but decided to keep him alive due to his incompetence[1]. They also killed Reinhard Heydrich, Reichsprotektor of Bohemia and Chief of the Reich Security Main Office, and Isoroku Yamamoto, Commander-in-Chief of the Japanese Combined Fleet. However, the Allies decided not to kill Emperor Hirohito, so that he could order the Japanese surrender.
More recently, the CIA made multiple assassination attempts on Fidel Castro, leader of Cuba.
The US and NATO twice attempted to assassinate Libyan leader Muammar Gaddafi, the latter attempt successfully.
So it’s uncommon for the reasons you mention, but not unprecedented.
For example, Major Field-Robertson argued, “As a strategist, Hitler has been of the greatest possible assistance to the British war effort. I have no hesitation in saying that his value to us has been the equivalent to an almost unlimited number of first-class SOE agents strategically placed inside Germany.”