@ryan_greenblatt and I are going to try out recording a podcast together tomorrow, as an experiment in trying to express our ideas more cheaply. I’d love to hear if there are questions or topics you’d particularly like us to discuss.
Hype! A 15-minute brainstorm:
What would you work on if not control? Bonus points for sketching out the next 5+ new research agendas you would pursue, in priority order, assuming each previous one stopped being neglected
What is the field of AI safety messing up? Bonus: for $field in {AI safety fields}: what are researchers in $field wrong about, or making poor decisions about, in a way that significantly limits their impact?
What are you most unhappy about in how the control field has grown, and in the other work happening elsewhere?
What are some common beliefs held by AI safety researchers about their domains of expertise that you disagree with (pick your favourite domain)?
What beliefs inside Constellation have not percolated into the wider safety community but really should?
What have you changed your mind about in the last 12 months?
You say that you don’t think control will work indefinitely and that sufficiently capable models will break it. Can you make that more concrete? What kind of early warning signs could we observe? Will we know when we reach models capable enough that we can no longer trust control?
If you were in charge of Anthropic what would you do?
If you were David Sacks, what would you do?
If you had a hundred cracked MATS scholars and $10,000 of compute each, what would you have them do?
If I gave you billions of dollars and 100 top researchers at a frontier lab, what would you do?
I’m concerned that the safety community spends way too much energy on more meta things like control, evals, interpretability, etc., and has somewhat lost sight of solving the damn alignment problem. Takes? If you agree, what do you think someone who wants to solve the alignment problem should actually be doing about it right now?
What are examples of safety questions that you think are important and can likely be studied on models available in the next 2 years, but not on today’s publicly available frontier models? (In 0.5 years? 1? 5? Only in the 6 months before AGI?)
If you turn out to be wrong about a safety-related belief that you currently put more than 50% credence on, which belief do you predict it is, and why?
What model organisms would you be most excited to see people produce? (Ditto for any other open-source work.)
What are some mistakes you predict many listeners are making? Bonus points for mistakes you think I personally am making
What is the most positive true thing you have to say about the field of ambitious mechanistic interpretability?
What does Redwood look for when hiring people, especially junior researchers?
What kind of mid-career professionals would you be most excited to see switch to control? What about other areas of AI safety?
What should AGI lab safety researchers be doing differently to have a greater impact? Feel free to give a different answer per lab
People often present their views as a static object, which paints a misleading picture of how they arrived at them and how confident they are in different parts. I would be more interested to hear how your views have changed for both of you over the course of your work at Redwood.
Thoughts on how the sort of hyperstition stuff mentioned in nostalgebraist’s “the void” intersects with AI control work.
I had this question about the economic viability of neuralese models:
https://www.lesswrong.com/posts/PJaq4CDQ5d5QtjNRy/?commentId=YmyQqQqdei9C7pXR3
I remember Ryan talking about it on the 80,000 Hours podcast. I’d be interested in hearing the perspective fleshed out more. Also, legibility of CoT: how important is it in the overall picture? If people start using fully recurrent architectures tomorrow in all frontier models, does p(doom) go from 10% to 90%, or is it a smaller update?
Control is about monitoring, right?
You guys seem as tuned into the big picture as anyone. The big question we as a field need to answer is: what’s the strategy? What’s the route to success?
What probability would you put on recurrent neuralese architectures overtaking transformers within the next three years? What are the most important arguments swaying this probability one way or the other? (If you want a specific operationalization for answering this, I like the one proposed by Fabien Roger here, though I’d probably be more stringent on the text bottlenecks criterion, maybe requiring a text bottleneck after at most 10k rather than 100k opaque serial operations.)
I second @Seth Herd’s suggestion; I’m interested in your vision of what success would look like. Not just “here’s a list of some initiatives and research programs that should be helpful” or “here’s a possible optimistic scenario in which things go well, but which we don’t actually believe in”, but the sketch of an actual end-to-end plan around which you’d want people to coordinate. (Under the understanding that plans are worthless but planning is everything, of course.)
What’s your version of AI 2027 (i.e., the most likely concrete scenario you imagine for the future), and how does control end up working out (or not working out) in different outcomes?
I would be curious to hear you discuss what good, stable futures might look like and how they might be governed (mostly because I haven’t heard your takes on this before and it seems quite important)
Thoughts on “alignment” proposals (i.e. reducing P(scheming))
The usefulness of interpretability research
What do you think of the risk that control backfires by preventing warning shots?
What types of policy/governance research are most valuable for control? Are there specific topics you wish more people were working on?
Thoughts on encouraging more LWers like yourself to make more videos?
I am sympathetic to Krashen’s input hypothesis as a way to onboard people to a new culture, and video may be faster at that than text.
What are your thoughts on Salib and Goldstein’s “AI Rights for Human Safety” proposal?
What’s your P(doom)?