One thing I’ve found helpful is to set up my life in a way I won’t regret if the world ends in five years: spending lots of time with friends and family, working towards meaningful goals, etc. Though I think this is usually good advice even in the (imo more likely) case that the world doesn’t end in five years.
> For example, solving AI intent alignment allows humanity to do more damage to itself with AI help, if AI doesn’t also provide competent strategic and philosophical assistance, but increasing AI strategic competence risks allowing misaligned AI to take over more easily.
Do you think training for just philosophical assistance, without long-term agency, would still increase takeover risks?
Wow, I wonder what the model was trying to do, and whether other labs have observed similar incidents.
Ezra Klein has published a new podcast episode with Dean Ball, “Why the Pentagon Wants to Destroy Anthropic”, which I recommend!
He works at OpenAI and has been critical of their contract, so I assume it’s related to that.
> In pretraining, each forward pass corresponds to one evaluatable and distinct ‘reward’-event.
In pretraining, you get one loss signal for each token in the forward pass; a single batch typically contains 10-100M tokens. For RL, you get a few bits of reward for each trajectory, which consists of many forward passes. So the efficiency difference is even larger than you outline here.
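A quick back-of-the-envelope in Python (the batch size, per-token entropy, rollout count, and trajectory length below are all illustrative assumptions, not measurements):

```python
# Pretraining: every token in the batch carries a cross-entropy signal.
batch_tokens = 50e6            # assumed batch size; 10-100M is typical
bits_per_token = 3             # rough per-token entropy of text
pretrain_bits = batch_tokens * bits_per_token

# RLVR: one scalar reward per trajectory, e.g. a pass/fail verifier.
rollouts = 512                 # assumed trajectories per RL step
bits_per_rollout = 1           # a binary reward carries at most 1 bit
tokens_per_rollout = 10_000    # assumed; spans many forward passes
rl_bits = rollouts * bits_per_rollout
rl_tokens = rollouts * tokens_per_rollout

print(f"pretraining: ~{pretrain_bits / batch_tokens:.0f} bits of signal per token")
print(f"RLVR:        ~{rl_bits / rl_tokens:.0e} bits of signal per token")
print(f"density gap: ~{(pretrain_bits / batch_tokens) / (rl_bits / rl_tokens):,.0f}x")
```

On these assumptions, pretraining delivers on the order of 10,000x more bits of supervision per token than RLVR.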
> It seems to me like there was a sort of gentleman’s agreement not to focus on killing the leader of the other country that you are at war with. You wanted someone with the ability and authority to surrender.
The historical evidence on this point is mixed. For example:
- The US sent thousands of troops to capture or kill the Mexican revolutionary leader Pancho Villa in 1916-17.
- In WWII, the Allies considered assassinating Hitler in Operation Foxley, but decided to keep him alive due to his incompetence[1]. They also killed Reinhard Heydrich, Reichsprotektor of Bohemia and Moravia and chief of the Reich Security Main Office, and Isoroku Yamamoto, Commander-in-Chief of the Japanese Combined Fleet. However, the Allies decided not to kill Emperor Hirohito, so that he could order the Japanese surrender.
- More recently, the CIA made multiple assassination attempts on Fidel Castro, the leader of Cuba.
- The US and NATO attempted to assassinate Libyan leader Muammar Gaddafi twice, the second time successfully.
So it’s uncommon for the reasons you mention, but not unprecedented.
- ^ For example, Major Field-Robertson argued, “As a strategist, Hitler has been of the greatest possible assistance to the British war effort. I have no hesitation in saying that his value to us has been the equivalent to an almost unlimited number of first-class SOE agents strategically placed inside Germany.”
Continue to scale, at or slightly behind the frontier. Complain loudly about the race dynamics you’re in, and advocate for legislation to slow down. Make scary capabilities demos to support your case.
> In contrast, continual learning would in the near term likely only perform at the level of in-context learning, which is incapable of things like learning to play chess at the level of the most capable humans.
I think this particular task just takes a lot of compute to learn, relative to other RLVR tasks. This is especially true if you are training with RLVR rather than MCTS self-play methods, which are much more sample efficient.
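To make “much more sample efficient” concrete, here is an illustrative sketch (game length and branching factor are assumptions): an AlphaZero-style method extracts a policy target from MCTS visit counts at every move, plus a value target, while RLVR sees only the final game outcome.

```python
# Illustrative supervision-per-game comparison (all numbers assumed).
moves_per_game = 80        # typical chess game length in plies
avg_legal_moves = 35       # average branching factor in chess

# RLVR: a single win/loss/draw signal for the whole trajectory.
rlvr_signals_per_game = 1

# AlphaZero-style MCTS self-play: every position yields a policy
# target (a distribution over legal moves from visit counts) and a
# value target, so supervision scales with game length.
mcts_signals_per_game = moves_per_game * 2

print(f"RLVR:           {rlvr_signals_per_game} training signal per game")
print(f"MCTS self-play: {mcts_signals_per_game} training targets per game, "
      f"each policy target spread over ~{avg_legal_moves} legal moves")
```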
Is it publicly known what the actual difference between the two contracts is?
(Reader advisory note: I quote some people at length because no one ever clicks links, but you are free to skip over long quote boxes. I’m trying to raise the chance of reading the full quote to ~25% from ~1%, not get it to ~90%.)
I was listening to the audio version of this and found it very helpful, especially the link to Daniel’s essay. In fact I went back and listened to it again, so for me it was ~200%!
It seems like it might be a good time to have an international treaty banning lethal autonomous weapons.
Ezra Klein has posted an interview with Jack Clark, co-founder of Anthropic, discussing capabilities progress and safety.
> Therefore, even if progress towards ASI is shut down, there doesn’t seem to be a very good off-ramp to turn this advantage into utopia.
Not all good futures need to result in utopia.
> Before I could get this report out, OpenAI also gave us GPT-5.3-Codex-Spark, which is ultra-low latency Codex, more than 1,000 tokens per second. Wowsers. That’s fast.
> As in, really super duper fast. Code appears essentially instantaneously. There are times when you feel the need for speed and not the need for robust intelligence. Many tasks are more about getting it done than about being the best like no one ever was.
> It does seem like it is a distinct model, akin to GPT-5.3-Codex-Flash, with only a 128k context window and lower benchmark scores, so you’ll need to be confident that is what you want.
I think the majority of this speedup is due to OpenAI’s partnership with Cerebras, rather than a new model. Cerebras chips can reach this speed because they are much larger (wafer-scale) and don’t have to pay the costs of interconnect.
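A rough decode-bandwidth sketch of why wafer-scale helps (the model size is a pure assumption; the bandwidth figures are roughly the advertised numbers for H100-class HBM and the Cerebras WSE-3, but treat them as ballpark):

```python
# Back-of-the-envelope: decode speed is roughly bounded by how fast
# you can stream the weights past the compute, once per token.
# All numbers below are assumptions for illustration.

params_active = 70e9          # assumed active params per token
bytes_per_param = 2           # bf16
bytes_per_token = params_active * bytes_per_param  # ~140 GB read/token

hbm_bandwidth = 3.35e12       # ~H100-class HBM, bytes/s
sram_bandwidth = 2.1e16       # ~WSE-3 advertised on-chip SRAM, bytes/s

for name, bw in [("single GPU (HBM)", hbm_bandwidth),
                 ("wafer-scale (SRAM)", sram_bandwidth)]:
    toks = bw / bytes_per_token  # ignores batching, KV cache, sparsity
    print(f"{name:>20}: ~{toks:,.0f} tokens/s ceiling")
```

On these assumptions a single GPU tops out at tens of tokens per second per stream, while on-chip SRAM leaves 1,000+ tokens/s with plenty of headroom.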
> But (1) bandwidth might not be better in this case; it isn’t in all cases
The entropy of LLM-generated text is a few bits per token, whereas the hidden state contains 10-100k bits. It’s hard to imagine how any method that passes around hidden states[1] could have lower bandwidth than CoT tokens! (Rough arithmetic below.)
- ^ Or similarly sized tensors.
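For concreteness, a toy comparison (the residual-stream width and dtype are assumptions for a frontier-scale model):

```python
# Bits carried per position by each "channel" (all sizes assumed).
d_model = 8192                 # assumed residual-stream width
bits_per_activation = 16       # bf16
hidden_state_bits = d_model * bits_per_activation   # ~131k bits

token_entropy_bits = 3         # rough entropy of sampled text per token

ratio = hidden_state_bits / token_entropy_bits
print(f"hidden state: {hidden_state_bits:,} bits per position")
print(f"CoT token:    ~{token_entropy_bits} bits per position")
print(f"ratio:        ~{ratio:,.0f}x more raw bandwidth via hidden states")
```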
Claude 4.6 was released about an hour ago. Just 10 minutes later, OpenAI released GPT-5.3.
> They’ll be able to automate ML research in the sense of coming up with experiments to try, and implementing those experiments, but never any new conceptual work.
I agree this seems unlikely. As a baseline, labs have thousands of capabilities researchers coming up with insights, and they could train the models to imitate them. There is also a path of RLVRing against the results of small-scale experiments. It’s more expensive to collect data for research taste, but it doesn’t seem like a difference in kind from software engineering.
And people still wonder how the AIs could possibly take over!
Do you agree with this take?