Kabir Kumar comments on Jankily controlling superintelligence

Kabir Kumar 27 Jun 2025 21:24 UTC
3 points
0
Can the extent of this ‘control’ be precisely and unambiguously measured?
- ryan_greenblatt 27 Jun 2025 23:46 UTC
  7 points
  2
  No
  - Kabir Kumar 28 Jun 2025 0:58 UTC
    4 points
    0
    How do we know if it’s working then?
    - ryan_greenblatt 28 Jun 2025 1:06 UTC
      13 points
      3
      We won’t, but we can get a general sense of whether it might be doing something at all using a bunch of proxies, like how robust and secure the system is against human attackers who have much more time than the model has, and the results of trying to train the model to attack the defenses in a controlled setting.