If the AI can create a fake solution that feels more real than the actual one, then I don't think the task is checkable by Joe's definition.
That would make the domain of checkable tasks rather small.
That said, it may not matter depending on the capability you want to measure.
If you want the AI to hack a computer and turn the entire screen green, and it skips a pixel to avoid completing the task, it has still demonstrated the dangerous capability, so it has no reason to sandbag.
On the other hand, if you are trying to see whether it has a capability that you want it to use, it can still sandbag.