Most of my posts are on my website: https://iter.ca
loops
I actually didn’t know that thought experiment was the origin of paperclip maximization being referenced as a goal for AIs. It’s such a common thing that I never thought to find the origin of it.
According to this article, Meta also has a PAC for funding Democrats, Making Our Tomorrow.
What secret goals does Claude think it has?
The same effect happens for several AI-related queries—“perplexity AI”, “best AI”, “AI vacuum”, “AI printer”, “table AI” all have the same effect. The phenomenon seems to affect many AI-related queries, not just AI safety ones.
FYI the “Which of the following describe you” question in the “Be a contributor to the Inkhaven Residency” application says “Can select… none!” but the form requires you to select at least one to submit.
Could you post coordinates next time? I can’t find the entrance on Elizabeth St. you’re referring to
My impression is that the “pro” models use the same weights as the underlying non-pro model (here, gpt-5.4-thinking) but with scaffolding on top that gets multiple reasoning traces and selects the best one. I think OpenAI’s view is that if the underlying model is safe to deploy, anything that’s just scaffolding on top of it must also be safe (because the safety checks for the underlying model should ensure it’s safe to deploy, even with malicious scaffolding).
With o3-pro OpenAI said:
They haven’t explicitly said the same for later pro models but various documentation about those pro models implies it.
Even though you can’t recreate the exact scaffolding OpenAI uses yourself (because the API doesn’t expose reasoning traces), you can get kinda close by querying the underlying non-pro model a bunch of times and asking a model to choose the best response[1]. It would probably be worth comparing gpt-5.4-thinking with that custom scaffold to gpt-5.4-thinking.
You would also want to have the underlying model include a summary of the reasoning in the output so that the model that chooses the best answer can decide which answer had the best reasoning.