Adam Jermyn comments on Smoke without fire is scary

Adam Jermyn 5 Oct 2022 17:06 UTC
1 point
0
I’m imagining that you prompt the model with a scenario that incentivizes deception.
- Jon Garcia 5 Oct 2022 19:31 UTC
  1 point
  0
  Parent
  In that scenario, GPT would just simulate a human acting deceptively. It’s not actually trying to deceive you.
  
  I guess it could be informative to discover how sophisticated a deception scenario it is able to simulate, however.
  
  GPT-like models won’t be deceiving anyone in pursuit of their own goals. However, humans with nefarious objectives could still use them to construct deceptive narratives for purposes of social engineering.
  - Adam Jermyn 5 Oct 2022 20:42 UTC
    3 points
    1
    Parent
    Sorry I mean a prompt that incentivizes employing capabilities relevant for deceptive alignment. For instance, you might have a prompt that suggests to the model that it act non-myopically, or that tries to get at how much it understands about its own architecture, both of which seem relevant to implementing deceptive alignment.