I used to think that AI models weren’t smart enough to sandbag. But less intelligent animals can sandbag: e.g., an animal that apparently can’t do something turns out to be able to when doing it lets them escape, access treats, or otherwise earn outsized rewards. Presumably this happens without an inner monologue or a strategic decision to sandbag. If so, AI models are already plausibly smart enough to sandbag in general, without it being detectable in chain-of-thought, and then perform better when a high-value opportunity comes along.
Anecdotes: certain dogs learn to act uncomprehending, since it gives them license to stubbornly pursue goals they have been warned off of. You can catch them at it, but you have to be wise to it.