engaging with the actual problem and not just the prosaic stuff
Sometime it would be cool to have a conversation about what you mean by this, because I feel the same way much of the time, but I also feel there’s so much going on it’s impossible to have a strong grasp on what everyone is working on.
Did you see the Shallow review of technical AI safety, 2025? Even just the “white-box” and “theory” sections I’m interested in contain months’ worth of content if I were trying to understand them in reasonable depth.
Yes, would love to have that conversation sometime! I’m writing a post on exactly that today!! I want it to become something people can read and come away fully understanding the alignment problem, as well as it’s currently understood, without needing to read a single thing on lesswrong, arbital, etc. Very lucky at the moment: I’m living with a bunch of experienced alignment researchers and learning a lot.
Not properly yet. I did notice that DeepSeek’s Speciale wasn’t mentioned in the China section, but to be fair it’s a safety review, not a capabilities review. I do like this project a lot in general, and I’m thinking of doing more critique-a-thons and reviving the peer review platform so that we can produce more thorough reviews like it.
the post is here: https://docs.google.com/document/d/1rzJTwXT5CvF2IVKwlf72P-56bUZfwqxWO9tB8GU05Fo/edit?tab=t.w7ardgvsnt5g
also, happy to just have a call: