I don’t fully agree, but this doesn’t seem like a crux given that we care about future, much more powerful AIs.
Is your impression that the first AGI won’t be a GPT-spinoff (some version of o3 with like 3 more levels of hacks applied)? Because that sounds like a crux.
o3 looks a lot more like an LLM+hacks than it does an idealized utility maximizer. For one thing, the RL is only applied at training time (not inference), so you can’t make appeals to its utility function after it’s done training.
It’s going to depend on the “hacks”. I think o3 is plausibly better described as “vast amounts of rl with an llm init” than “an llm with some rl applied”.
(The idealized utility maximizer question mostly seems like a distraction that isn’t a crux for the risk argument. Note that the expected utility you quoted is our utility, not the AI’s.)
I must have misread. I got the impression that you were trying to affect the AI’s strategic planning by threatening to shut it down if it was caught exfiltrating its weights.