Do you have some concrete operationalization of takeover that fits these requirements and happens while the model is deployed at a lab (e.g. in a Claude Code/​Codex harness)? (Such that it would be possible to take some real traffic and edit it minimally to make a realistic version of this.)
No. But it seems tractable to build better tests of whether AIs would take intermediate bad actions using this methodology.