It’s going to depend on the “hacks”. I think o3 is plausibly better described as “vast amounts of rl with an llm init” than “an llm with some rl applied”.
(The idealized utility maximizer question mostly seems like a distraction that isn’t a crux for the risk argument. Note that the expected utility you quoted is our utility, not the AI’s.)
(The idealized utility maximizer question mostly seems like a distraction that isn’t a crux for the risk argument. Note that the expected utility you quoted is our utility, not the AI’s.)
I must have misread. I got the impression that you were trying to affect the AI’s strategic planning by threatening to shut it down if it was caught exfiltrating its weights.
It’s going to depend on the “hacks”. I think o3 is plausibly better described as “vast amounts of rl with an llm init” than “an llm with some rl applied”.
(The idealized utility maximizer question mostly seems like a distraction that isn’t a crux for the risk argument. Note that the expected utility you quoted is our utility, not the AI’s.)
I must have misread. I got the impression that you were trying to affect the AI’s strategic planning by threatening to shut it down if it was caught exfiltrating its weights.