Running https://aiplans.org
Working full-time on the alignment problem.
I’d also like a citation for this, please.
My best guess is that the usual ratio of “time it takes to write a critical comment” to “time it takes to respond to it to a level that will broadly be accepted well” is about 5x. This isn’t in itself a problem in an environment with lots of mutual trust and trade, but in an adversarial context it means it’s easily possible to run a DDoS attack on basically any author whose contributions you do not like by just asking lots of questions, insinuating holes or potential missing considerations, and demanding a response, approximately independently of the quality of their writing.
For related musings, see the Scott Alexander classic “Beware Isolated Demands For Rigor”.
Not strictly related to this post, but I’m glad you know this, and it makes me more confident in the future health of LessWrong as a discussion place.
I think this is easier to anonymize, with the exception of very specific things that people become famous for.
A solution I’ve come around to for this is retroactive funding. As in: if someone did something essentially without funding, and it produced outcomes you would have funded or donated to had you known they were guaranteed, then donate to that person to encourage them to do it more.
my mum told my little sister to take a break from her practice test for the eleven plus and come eat dinner in the kitchen with the rest of the family: my gran, me and her (dad is upstairs, he’s rarely here for family dinner). my little sister, in a trembling voice, said ‘but then dad will say ’
mum sharply says to leave it and come eat dinner. she leaves the living room where my little sister is and goes to the kitchen. my little sister tries to shut off the lights in the living room; when it stutters, she beats her little hands against it in frustration.
mum hears her in the kitchen, goes to the living room and sharply scolds her ‘what will happen if it breaks?? how much will it take to fix it!?’
in the kitchen, in hindi, which my little sister understands just a little bit, my nan lovingly calls my little sister her child, her love, her darling, and invites her to sit across from her.
1 minute later, dad comes down. sees my little sister in the kitchen. starts shouting. “HOW DARE YOU LEAVE YOUR WORK”
“THIS IS YOUR RESPONSIBILITY”
he yells at my mum in hindi that “YOU LOT (referring to my mum and also to me) HAVE MADE THIS A JOKE”
“HOW MANY QUESTIONS HAVE YOU DONE”
“THIS IS NOT EVEN HALF”
“I LEFT HALF AN HOUR AGO”
“STOP!”
“STOP SHAKING LIKE A FISH!”
my sister is sobbing and crying
“STOP CRYING”
“YOU WILL NOT GET FOOD TODAY”
“NO! YOU! WILL! NOT! GET! FOOD! TODAY!”
he goes back upstairs.
my little sister is sobbing and crying in the living room, minutes later.
mum is in the kitchen. she knows that if she goes to console my little sister in the living room then my dad will come downstairs and shout again but worse.
she yells at my nan to be quiet.
i know that if i go to console my little sister, not only will my dad come downstairs again and shout worse and do worse, he’s likely to use this as the excuse to snap and kick me out of the house. and i’m broke, weak and powerless to do much other than write this and beg, and hope that someone will help or that somehow i can change things. that i can stop being broke and get a home, a house, where my little sister can be free.
just joined the call with one of the moonshot teams and i was actually basically an interruption, lol. felt so good to be completely unneeded there
if true, this should be much much more widely shared imo
announced it too late, underestimated how hard it would be to make the research guides, guaranteed personalized feedback to the first 300 applicants, and set the start date too early considering how much was actually ready.
i messed things up a lot in organizing the moonshot program. working hard to make sure the future events will be much better. lots of things i can do.
For a newbie, why is this better than deep RL or deep learning in general, or even needed at all? Is it just because of the limitations of scalable oversight and the general problems with deep learning?
Is this the main thing proposed by agent foundations researchers as a potential alternative to deep learning?
The kick-off call for the Moonshot Alignment Program starts in 8 minutes! https://discord.gg/QdF4Yd6Q?event=1405189459917537421
I think models like this should be evaluated and treated like drugs/casinos: 4o quite clearly causes addiction, and that’s not something that should be completely profitable with zero consequences, imo.
no. gpt5 is the cheap, extremely good writing model, imo. a much better writer rn than any other model out there.
eval to pay attention to:
gpt5 is for the gooners 🔥🔥
https://x.com/ficlive/status/1953870830644990461
Someone asked me earlier today about LLMs explaining themselves and how to evaluate that; I made this while explaining what I’d look for in that kind of research.
Sharing because they found it useful and others might too.
Nice! Made a post similar to this for the motivation behind the AI Alignment Evals hackathons: https://aiplans.substack.com/p/making-the-progress-bars-for-ai-safety
Telling people in the moment that the things their emotions are telling them seem false, and that perhaps their emotions convey some other information… is usually not the right move, unless you’re very unusually good at making people feel seen while not telling them they’re being an idiot.
you can just get good at this with practice
You’re unlikely to win the election, but it’ll likely shift the Overton window and give people hope that change is possible.
I think if it starts with a lot of market research, talking to people, listening to them, understanding what their problems are and which policies they’d most vote for, there’s actually quite a high chance of winning.
These are imperfect; I’d like feedback on them, please:
https://moonshot-alignment-program.notion.site/Proposed-Research-Guides-255a2fee3c6780f68a59d07440e06d53?pvs=74