I work primarily on AI Alignment. Scroll down to my pinned Shortform for an idea of my current work and who I’d like to collaborate with.
Website: https://jacquesthibodeau.com
Twitter: https://twitter.com/JacquesThibs
GitHub: https://github.com/JayThibs
LinkedIn: https://www.linkedin.com/in/jacques-thibodeau/
Nice. We pointed to safety benchmarks as a “number goes up” thing for automating alignment tasks here: https://www.lesswrong.com/posts/FqpAPC48CzAtvfx5C/concrete-projects-for-improving-current-technical-safety#Focused_Benchmarks_of_Safety_Research_Automation
I think people may be avoiding this for ‘capabilities’ reasons, but they shouldn’t: if we take the lead here, we can get a lot of safety value out of these models.