Running https://aiplans.org
Working full-time on the alignment problem.
I’d also like a citation for this, please.
My best guess is that the usual ratio of “time it takes to write a critical comment” to “time it takes to respond to it to a level that will broadly be accepted well” is about 5x. This isn’t in itself a problem in an environment with lots of mutual trust and trade, but in an adversarial context it means it’s easily possible to run a DDoS attack on basically any author whose contributions you do not like by just asking lots of questions, insinuating holes or potential missing considerations, and demanding a response, approximately independently of the quality of their writing.
For related musings, see the Scott Alexander classic “Beware Isolated Demands For Rigor”.
Not strictly related to this post, but I’m glad you know this, and it makes me more confident in the future health of LessWrong as a discussion place.
I think this is easier to anonymize, with the exception of very specific things that people become famous for.
A solution I’ve come around to for this is retroactive funding. As in: if someone did something essentially without funding, and it produced outcomes you would have funded or donated to had you known they were guaranteed, then donate to that person to encourage them to do it more.
my mum told my little sister to take a break from her practice test for the eleven plus and come eat dinner in the kitchen with the rest of the family: my gran, me and her (dad is upstairs, he’s rarely here for family dinner). my little sister, in a trembling voice, said ‘but then dad will say ’
mum sharply says to leave it and come eat dinner. she leaves the living room where my little sister is and goes to the kitchen. my little sister tries to shut off the lights in the living room; when it stutters, she beats her little hands against it in frustration.
mum hears her in the kitchen, goes to the living room and sharply scolds her ‘what will happen if it breaks?? how much will it take to fix it!?’
in the kitchen, in hindi, which my little sister understands just a little bit, my nan lovingly calls my little sister her child, her love, her darling, and invites her to sit across from her.
1 minute later, dad comes down. sees my little sister in the kitchen. starts shouting. “HOW DARE YOU LEAVE YOUR WORK”
“THIS IS YOUR RESPONSIBILITY”
he yells at my mum in hindi that “YOU LOT (referring to my mum and also to me) HAVE MADE THIS A JOKE”
“HOW MANY QUESTIONS HAVE YOU DONE”
“THIS IS NOT EVEN HALF”
“I LEFT HALF AN HOUR AGO”
“STOP!”
“STOP SHAKING LIKE A FISH!”
my sister is sobbing and crying
“STOP CRYING”
“YOU WILL NOT GET FOOD TODAY”
“NO! YOU! WILL! NOT! GET! FOOD! TODAY!”
he goes back upstairs.
my little sister is sobbing and crying in the living room, minutes later.
mum is in the kitchen. she knows that if she goes to console my little sister in the living room then my dad will come downstairs and shout again but worse.
she yells at my nan to be quiet.
i know that if i go to console my little sister, not only will my dad come downstairs again and shout worse and do worse, he’s likely to use this as the excuse to snap and kick me out of the house. and i’m broke, weak and powerless to do much other than write this and beg, and hope that someone will help or that somehow i can change things. that i can stop being broke and get a home, a house, where my little sister can be free.
just joined the call with one of the moonshot teams and i was actually basically an interruption, lol. felt so good to be completely unneeded there
if true, this should be much much more widely shared imo
announced it too late, underestimated how hard it would be to make the research guides, guaranteed personalized feedback to the first 300 applicants, and set the start date too early considering how much was actually ready.
i messed things up a lot in organizing the moonshot program. working hard to make sure the future events will be much better. lots of things i can do.
For a newbie, why is this better than deep RL or deep learning in general, or even needed at all? Is it just because of the limitations of scalable oversight and the general problems with deep learning?
Is this the main thing proposed by agent foundations researchers as a potential alternative to deep learning?
The kick-off call for the Moonshot Alignment Program starts in 8 minutes! https://discord.gg/QdF4Yd6Q?event=1405189459917537421
I think models like this should be evaluated and treated like drugs/casinos: 4o quite clearly causes addiction, and that’s not something that should be completely profitable with zero consequences, imo.
no. gpt5 is the cheap, extremely good writing model, imo. a much better writer rn than any other model out there.
eval to pay attention to:
gpt5 is for the gooners 🔥🔥
https://x.com/ficlive/status/1953870830644990461
Someone asked me earlier today about LLMs explaining themselves and how to evaluate that; I made this while explaining what I’d look for in that kind of research.
Sharing because they found it useful and others might too.
Nice! Made a post similar to this for the motivation behind the AI Alignment Evals hackathons: https://aiplans.substack.com/p/making-the-progress-bars-for-ai-safety
Telling people in the moment that the things their emotions are telling them seem false, and that perhaps their emotions convey some other information… is usually not the right move, unless you’re very unusually good at making people feel seen while not telling them they’re being an idiot.
you can just get good at this with practice
You’re unlikely to win the election, but it’ll likely shift the Overton window and give people hope that change is possible.
I think if it starts with a lot of market research, talking to people, listening to them, understanding what their problems are and which policies they’d most vote for, there’s actually quite a high chance of winning.
These are imperfect; I’d like feedback on them, please:
https://moonshot-alignment-program.notion.site/Proposed-Research-Guides-255a2fee3c6780f68a59d07440e06d53?pvs=74