This is fantastic. It has me wondering what other cheap, highly effective things we can set modern AI to for AI safety.
Thoughts I had about this specifically:
Re: Sonnet for search and 4o for analysis, could Opus or GPT 5.2 have been cheaper or better? My impression is that Opus 4.5, despite higher token costs, uses them more efficiently.
Would it be cheaper to use Gemini/Perplexity (which, as I understand it, tend to be more efficient and powerful when searching than Claude)?
Would using Grok to pull Twitter data have been helpful?
Would a verification pass using different instances reduce hallucinations, find missed data points, and improve the code?
I’m not very experienced in doing this—these are just the first thoughts that came to mind. I’m half tempted to try to replicate your results!
I’m excited you found this interesting! Thoughts:
Opus, GPT 5.2, Gemini, Perplexity, Grok (for Twitter data), or something else could be more accurate and cheaper. I spent very little time trying to figure out the ideal setup for the research phase. If anyone has thoughts on this, I would be interested.
Re “what other cheap, highly effective things we can set modern AI to for AI safety”:
The best thing I can think of is to research every politician that is running for any US office and gauge their position on AI from deep research. Then flag the best campaign to work on in every state.
Likewise, scrape LinkedIn, Twitter, and other social media for people working at frontier labs. What percent of people at each lab have explicitly condemned alignment efforts? What percent at each lab endorse them?
If anyone else has ideas, let me know!
I agree that having a verification pass would be good.
Re: “I’m half tempted to try to replicate your results!”
You should do this! One issue with the approach in this post is that the scoring functions are pretty noisy. Even rerunning only the evaluation phase (not the research phase) with a more detailed and specific evaluation strategy may give much more useful results than this post.
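One cheap way to see how noisy a scoring function is would be to score the same research notes several times and look at the spread. The sketch below is only an illustration of that idea (repeated scoring and averaging), not what the post does: `aggregate_scores`, `make_fake_scorer`, and the Gaussian noise model are all hypothetical stand-ins, and a real run would swap the fake scorer for an actual model call.

```python
import random
import statistics
from typing import Callable


def aggregate_scores(
    score_once: Callable[[str], float], evidence: str, runs: int = 5
) -> dict:
    """Score the same evidence several times and summarize the spread."""
    if runs < 2:
        raise ValueError("need at least 2 runs to estimate spread")
    scores = [score_once(evidence) for _ in range(runs)]
    stdev = statistics.stdev(scores)  # sample standard deviation across runs
    return {
        "mean": statistics.mean(scores),
        "stdev": stdev,
        "stderr": stdev / runs ** 0.5,  # standard error of the mean
        "scores": scores,
    }


def make_fake_scorer(
    true_score: float, noise: float, seed: int = 0
) -> Callable[[str], float]:
    """Hypothetical stand-in for an LLM scorer: a fixed score plus Gaussian noise."""
    rng = random.Random(seed)

    def score_once(evidence: str) -> float:
        # The evidence text is ignored here; a real scorer would send it to a model.
        return true_score + rng.gauss(0.0, noise)

    return score_once


if __name__ == "__main__":
    scorer = make_fake_scorer(true_score=6.0, noise=1.5)
    summary = aggregate_scores(scorer, "collected research notes", runs=10)
    print(f"mean={summary['mean']:.2f} stderr={summary['stderr']:.2f}")
```

If the standard deviation across runs is large relative to the differences between the people or items being scored, that suggests the ranking is mostly noise, and a more specific rubric or more runs per item would be worth the extra cost.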
In general, writing a meta post on the cheapest and most accurate way to do these kinds of deep research dives seems very good! I don’t know how wide the audience is for this, but for what it is worth, I would read this.
(Note that this post wasn’t front-paged, so if you want to reach a wide audience on LessWrong in follow-up work, I would reach out to the mods to get a sense of what is acceptable and lean away from doing more political posts.)