Running https://aiplans.org
Fulltime working on the alignment problem.
For future collaborations, we will strive to improve transparency wherever possible, ensuring contributors have clearer information about funding sources, data access, and usage purposes at the outset.
Would you make a statement that would make you legally liable/accountable on this?
People’s hearts being in the right place doesn’t stop them from succumbing to incentives; it just changes how long it will take them to do so and what excuses they will make. The solution is better incentives. Seems that Epoch AI isn’t set up with a robust incentive structure atm. Hope this changes.
I utterly fucking despise that BlueDot have become a mini Open Philanthropy, combining it with a ‘YC but for things that can be called AI safety’ look, and also doing the bullshit application forms.
They are not special and there’s plenty of other sources of money. It’s time they learnt this.
Prestige Maxing is Killing the AI Safety Field
Some nice talks and lots of high-quality people signed up, but it started 2 weeks late because I massively underestimated how long it would take to give personalized feedback to 300 applicants, and I kept trying to use really unwieldy software when it turned out it’s faster to do it manually. I also didn’t get the research guides (https://moonshot-alignment-program.notion.site/Updated-Research-Guides-255a2fee3c6780f68a59d07440e06d53?pvs=74) ready in time and didn’t coordinate a lot of things properly.
Also, a lot of fuckups with Luma, Notion and Google Forms.
Overall, marketing the event worked OK-ish (298 signups), but running it went extremely badly due to disorganization on my part. I’m not put off by this though, because the first alignment evals hackathon was like this too; we learnt from that and the second one went really well.
Learning a lot from this one too, and among other things, building our own events platform, because I recently saw the founder of Luma saying on Twitter that they’re ‘just vibecoding!’ and don’t have a backend engineer, and I really frequently hit a lot of pain points when using Luma: https://test.ai-plans.com/events
Also, gonna be taking more time to prepare for the next event and only guaranteeing feedback for a max of 100 people: free for the first 50 to apply, and optional for up to 50 others, who can pay $10 to get personalized feedback.
And gonna make very clear template schedules for the mentors, so that we (I) don’t waste their time, leave things vague, have them not actually get people joining their research, etc.
It’s suspicious that the apparent solution to this problem is to do more AI research as opposed to doing anything that would actually hurt AI companies financially.
What do you think of implementing AI Liability as proposed by, e.g. Beckers & Teubner?
Hi, making a guide/course for evals, very much in the early draft stage atm
Please consider giving feedback
https://docs.google.com/document/d/1_95M3DeBrGcBo8yoWF1XHxpUWSlH3hJ1fQs5p62zdHE/edit?usp=sharing
Have you looked at marketing/messaging software? Knowing which template messages work best in which cases sounds quite similar to this, and there might be overlap. I would be surprised if e.g. MrBeast’s team didn’t have something tracking which video titles and thumbnails do best with which audiences, which script structures do best, an easy way to make variants, etc.
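For illustration, here’s a minimal sketch of the kind of per-audience variant tracking I mean. Everything here (the `TemplateTracker` class, the conversion-rate scoring) is hypothetical and not taken from any specific tool:

```python
# Hypothetical sketch: track which message templates convert best per audience.
# All names here are illustrative, not from any real marketing product.
from collections import defaultdict

class TemplateTracker:
    def __init__(self):
        # (template_id, audience) -> [sends, conversions]
        self.stats = defaultdict(lambda: [0, 0])

    def record_send(self, template_id: str, audience: str) -> None:
        self.stats[(template_id, audience)][0] += 1

    def record_conversion(self, template_id: str, audience: str) -> None:
        self.stats[(template_id, audience)][1] += 1

    def best_template(self, audience: str) -> str | None:
        # Highest conversion rate among templates actually sent to this audience.
        rates = {
            tid: conv / sends
            for (tid, aud), (sends, conv) in self.stats.items()
            if aud == audience and sends > 0
        }
        return max(rates, key=rates.get) if rates else None

tracker = TemplateTracker()
tracker.record_send("intro_v1", "researchers")
tracker.record_conversion("intro_v1", "researchers")
tracker.record_send("intro_v2", "researchers")
print(tracker.best_template("researchers"))  # -> "intro_v1"
```

Real tools would presumably add things like significance testing or bandit-style allocation on top of raw rates, but even something this simple makes “which variant works where” queryable.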
So for this and other reasons, it’s hard to say when an eval has been truly successfully ‘red teamed’.
One of the major problems with this atm is that most ‘alignment’, ‘safety’, etc. evals don’t specify or define exactly what they’re trying to measure.
Hi, hosting an Alignment Evals hackathon for red teaming evals and making more robust ones, on November 1st: https://luma.com/h3hk7pvc
A team from the previous one presented at ICML.
A team in January made one of the first interp-based evals for LLMs.
All work from this will go towards the AI Plans Alignment Plan. If you want to do extremely impactful alignment research, I think this is one of the best events in the world.
Huh. I think I’ve had at least 3. And done so myself for at least 2 people who refuted me
Not really—the guy DMed me and we had a call. He wanted to learn more. Also, talked about meeting up in Washington, when I host an event there.
I wonder if there would be any value in going into such less-enlightened spaces and fighting the good fight, debating people like a 2000s-era atheist. It seems to have mostly worked out for the atheists.
Not in a debatey way, but in an informative way, yes. Pretty easy too
We just don’t talk at all atm. Not likely to change in the future, tbh. He doesn’t respond to my calls or texts.
This is bleak news, thank you for sharing.
This is one of the most specific and funny things I’ve read in a while: https://tomasbjartur.substack.com/p/the-company-man
He’s def wrong.
I’m still going to care, because he’s my father and I love him and care about him. And also because I don’t like ideas that are about discouraging caring.
I’m going to feel the pain that comes with caring, but that’s fine. And yes his opinion is wrong and is not going to stop me anymore.
There are plenty of others who care about and respect my work and unlike my dad’s disrespect, that can actually go into making me money and increasing what I can do.
Thank you for the support btw, I really appreciate it <3
If serious about US-China cooperation and not cargo culting, please read: https://www.cac.gov.cn/2025-09/15/c_1759653448369123.htm