Chris_Leong comments on AI for AI safety

Chris_Leong 3 Apr 2025 4:24 UTC
Great post. I think some of your frames add a lot of clarity and I really appreciated the diagrams.
One subset of AI for AI safety that I believe to be underrated is wise AI advisors^[1]. Some of the areas you’ve listed (coordination, helping with communication, improving epistemics) intersect with this, but I don’t believe that they exhaust the wisdom frame, especially since the first two were only mentioned in the context of capability restraint. You also mention civilizational wisdom as a component of backdrop capacity and I agree that this is a very diffuse factor. At the same time, a less diffuse intervention would be to increase the wisdom of specific actors.
You write: “If efforts to expand the safety range can’t benefit from this kind of labor in a comparable way… then absent large amounts of sustained capability restraint, it seems likely that we’ll quickly end up with AI systems too capable for us to control”.
I agree. In fact, a key reason why I think this is important is that we can’t afford to leave anything on the table.
One of the things I like about the approach of training AI advisors is that humans can compensate for weaknesses in the AI system. In other words, I’m introducing a third category of labour: human-AI cybernetic systems, or centaur labour. I think this is likely to widen the sweet spot; however, we have to make sure that we do this in a way that differentially benefits safety.
You do discuss the possibility of using AI to unlock enhanced human labour. It would also be possible to classify such centaur systems under this designation.
1. ^
  More broadly, I think there’s merit to the cyborgism approach even if some of the arguments are less compelling in light of recent capabilities advances.