Rohin Shah mentions a paper arguing that image classifiers vulnerable to adversarial examples are “picking up on real imperceptible features that do generalize to the test set, that humans can’t detect”. This might be the MIT paper Adversarial Examples Are Not Bugs, They Are Features.
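For a concrete sense of that claim, here is a minimal sketch of the kind of experiment the paper runs: adversarially perturb each training image toward a target class, relabel it as that target (so the labels look wrong to a human), train a fresh classifier only on that relabeled data, and check its accuracy on the original, correctly labeled test set. This sketch assumes PyTorch/torchvision, MNIST, a small CNN, and a one-step targeted FGSM attack for speed; the paper itself uses CIFAR-10/ImageNet and stronger PGD attacks, so treat this as illustrative rather than a reproduction.

```python
# Minimal sketch (not the paper's exact setup) of the "non-robust features"
# experiment: build a training set of adversarially perturbed images relabeled
# to the *target* class, train a fresh model on it, and evaluate on the
# original, correctly labeled test set.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

def small_cnn():
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(32 * 7 * 7, 10),
    ).to(device)

def train(model, loader, epochs=1):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()

def accuracy(model, loader):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total

def fgsm_targeted(model, x, target, eps=0.25):
    # One-step targeted FGSM: nudge x *toward* the target class.
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), target).backward()
    return (x - eps * x.grad.sign()).clamp(0, 1).detach()

tfm = transforms.ToTensor()
train_set = datasets.MNIST("data", train=True, download=True, transform=tfm)
test_set = datasets.MNIST("data", train=False, download=True, transform=tfm)
train_loader = DataLoader(train_set, batch_size=256, shuffle=True)
test_loader = DataLoader(test_set, batch_size=256)

# 1. Train an ordinary (non-robust) classifier.
base = small_cnn()
train(base, train_loader)

# 2. Build a new training set: perturb each image toward label (y + 1) mod 10
#    and *relabel* it as that target. To a human the images still look like y.
xs, ys = [], []
for x, y in train_loader:
    x, y = x.to(device), y.to(device)
    target = (y + 1) % 10
    xs.append(fgsm_targeted(base, x, target).cpu())
    ys.append(target.cpu())
relabeled = TensorDataset(torch.cat(xs), torch.cat(ys))

# 3. Train a fresh model only on the relabeled data, then evaluate it on the
#    original, correctly labeled test set.
fresh = small_cnn()
train(fresh, DataLoader(relabeled, batch_size=256, shuffle=True))
print("clean test accuracy of model trained on relabeled data:",
      accuracy(fresh, test_loader))
```

If the fresh model scores well above the 10% chance level on the clean test set, that is the sort of evidence the paper takes to show the perturbations carry “real” features that generalize, even though they are invisible to humans.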
More links:
I googled ‘daniel ellsberg nuclear first strikes’ and found U.S. Planned Nuclear First Strike to Destroy Soviets and China – Daniel Ellsberg on RAI (6/13) and U.S. Refuses to Adopt a Nuclear Weapon No First Use Pledge – Daniel Ellsberg on RAI (7/13).
MIRI’s AI Risk for Computer Scientists workshop. Workshops are on hold due to COVID-19, but you’re welcome to apply, get in touch with us, etc.
That is in fact what I meant :)