This also applies to any sort of external attacker without necessarily requiring any collusion/coordination with the AI itself. (Though the AI itself doing it might make it easier for the AI to notice and exploit the issue.)
Another issue that shows up if you only have sufficient safeguards on internal deployment: the AI sets up an internal rogue deployment by calling out to the external API, or possibly even the external API of a different model provider. The internal AI “misusing” publicly available AI APIs might be really helpful for evading monitoring by default.
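One partial mitigation is egress monitoring: flagging outbound requests from internal agent environments that target known external model-provider APIs. Here is a minimal sketch; the host list and function name are hypothetical, and a real deployment would need to handle proxies, IP-based access, and encrypted tunnels that this naive hostname check misses.

```python
from urllib.parse import urlparse

# Hypothetical blocklist of external model-provider API hosts.
# A real deployment would maintain and update this list itself.
EXTERNAL_MODEL_API_HOSTS = {
    "api.openai.com",
    "api.anthropic.com",
    "generativelanguage.googleapis.com",
}

def flags_external_model_call(url: str) -> bool:
    """Return True if an outbound request targets a known external model API."""
    host = urlparse(url).hostname or ""
    return host in EXTERNAL_MODEL_API_HOSTS

# Example: an internal agent trying to reach a public AI API gets flagged,
# while ordinary internal traffic does not.
print(flags_external_model_call("https://api.openai.com/v1/chat/completions"))
print(flags_external_model_call("https://internal.example.com/run"))
```

Of course, a determined model (or attacker) could route around a simple hostname filter, which is part of why relying on internal-deployment safeguards alone is fragile.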