Hi, I’m running AI Plans, an alignment research lab. We’ve run research events attended by people from OpenAI, DeepMind, MIRI, AMD, Meta, Google, JPMorganChase, and more, and we’ve had several alignment breakthroughs, including a team finding that LLMs are maximizers, one of the first interpretability-based evals for LLMs, a way to cheat every AI safety eval that relies on APIs, and several others.
We currently have two in-house research teams: one is finding out which post-training methods actually work to get the values we want into models, and the other is reward hacking hundreds of AI safety evals.
However, we’re in the very weird position of doing a lot of very useful work, and attracting a lot of very strong people who apply to and take part in our events, while communicating all of it terribly. If you go to our website at the moment, none of this is obvious at all.
This is partly because we’re also trying to build a peer-review platform for alignment research on top of running the research events, and partly because I personally am bad at design.
I’m looking for someone who isn’t: someone who really understands how to communicate this kind of work, can make it look good, and can dedicate several hours a week, consistently, for at least 3 months. If this is you, please either message me or reach out at kabir@ai-plans.com
Please feel free to pass this along to others as well.
Thank you,
Kabir