I generally think blogging is a good way to communicate intellectual progress, so I see this as a good development!
Some thoughts on your first blogpost:
At OpenAI, we research how we can develop and deploy increasingly capable AI, and in particular AI capable of recursive self-improvement (RSI)
My reaction: Wait, what, why? I guess it’s nice to be this direct, but it feels sad that this is written as the bottom line.
To be clear, I agree that given this being OpenAI’s stance it’s good to say it plainly! But I was hoping that at least the safety team would have the position that “we will try to determine whether there is any way to build RSI safely, and will strongly advocate for not doing so if we think it cannot be done safely”.
Like, a thing that feels particularly sad here is that I was assuming that figuring out whether this can be done safely, or studying that question, is one of the key responsibilities of the safety team. This is an update that it isn’t, which is sad (and IMO creates some responsibility for members of the safety team to express concern about that publicly, but IDK, OpenAI seems to be in a messy state with regards to that kind of stuff).
Thank you for pointing this out! While OpenAI has been public about our plans to build an AI scientist, it is of course crucial that we do this safely, and if it is not possible to do it safely, we should not do it at all.
We have written about this before:
OpenAI is deeply committed to safety, which we think of as the practice of enabling AI’s positive impacts by mitigating the negative ones. Although the potential upsides are enormous, we treat the risks of superintelligent systems as potentially catastrophic and believe that empirically studying safety and alignment can help global decisions, like whether the whole field should slow development to more carefully study these systems as we get closer to systems capable of recursive self-improvement. Obviously, no one should deploy superintelligent systems without being able to robustly align and control them, and this requires more technical work.
but we should have mentioned this in the hello world post too. We have now updated it with a link to this paragraph.