ryan_greenblatt comments on The case for ensuring that powerful AIs are controlled

ryan_greenblatt 8 Feb 2024 19:35 UTC
2 points

Within the appendix on control methods you mention preventing exploration hacking. I think a broader point which belongs in the list is: “Improve capability elicitation techniques”.

Agreed. We think improving capability elicitation techniques will also improve our ability to prevent exploration hacking and sandbagging, so work on improving capability elicitation in general will be useful.

Preventing (basically intentional) sandbagging is just a special case of capability elicitation, but it is pretty much the only case that we care about for control.