Also, we don’t know what would happen if we exactly optimized an image to maximize the activation of a particular human’s face detection circuitry. I expect that the result would be pretty eldritch as well.
We may already be doing that in the case of cartoon faces, with their exaggerated features. Cartoon faces don’t look eldritch to us, but why would they?
They are still smooth and have low-frequency patterns, which seems to be the main difference from adversarial examples currently produced from DL models.
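To make the smoothness point concrete, here is a toy sketch (everything in it is assumed for illustration, not any real model): a random linear map stands in for a "detector," and we compare plain gradient ascent on its activation with ascent whose gradient is low-pass filtered, a crude low-frequency prior. The unconstrained optimum inherits the detector weights' high-frequency noise, while the filtered version stays smooth:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=(16, 16))  # stand-in "detector" weights (assumed, not a real model)

def activation(x):
    # Toy detector: a linear map over the 16x16 "image".
    return float(np.sum(w * x))

def high_freq_energy(x):
    # Fraction of spectral energy outside a small low-frequency disc.
    F = np.fft.fftshift(np.fft.fft2(x))
    yy, xx = np.mgrid[-8:8, -8:8]
    mask = (yy**2 + xx**2) > 16
    return float(np.sum(np.abs(F[mask])**2) / np.sum(np.abs(F)**2))

def low_pass(g):
    # Zero out high spatial frequencies of the gradient.
    F = np.fft.fftshift(np.fft.fft2(g))
    yy, xx = np.mgrid[-8:8, -8:8]
    F[(yy**2 + xx**2) > 16] = 0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))

def ascend(smooth, steps=200, lr=0.1):
    x = np.zeros((16, 16))
    for _ in range(steps):
        grad = w                      # d(activation)/dx for a linear detector
        if smooth:
            grad = low_pass(grad)     # crude low-frequency prior
        x = x + lr * grad
        x = x / max(1.0, np.linalg.norm(x) / 10)  # keep the "image" norm bounded
    return x

x_plain = ascend(smooth=False)   # noisy, high-frequency optimum
x_smooth = ascend(smooth=True)   # smooth optimum, slightly lower activation
```

The plain optimum is just a rescaled copy of the noise-like weights, analogous to the high-frequency adversarial examples the comment refers to; the filtered one trades a little activation for smoothness, loosely analogous to cartoon-style exaggeration.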
Yeah. Wake me up when we find a single agent which makes decisions by extremizing its own concept activations. E.g., I’m pretty sure that people don’t, on reflection, most strongly want to make friends with entities which maximally activate their potential-friend detection circuitry.