Rohin Shah mentions a paper arguing that image classifiers vulnerable to adversarial examples are “picking up on real imperceptible features that do generalize to the test set, that humans can’t detect”. This might be the MIT paper Adversarial Examples Are Not Bugs, They Are Features.
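For a concrete sense of that claim, here is a minimal sketch of the kind of experiment the paper runs: adversarially perturb each training image toward a target class, relabel it as that target (so the labels look wrong to a human), train a fresh classifier only on that relabeled data, and check its accuracy on the original, correctly labeled test set. This sketch assumes PyTorch/torchvision, MNIST, a small CNN, and a one-step targeted FGSM attack for speed; the paper itself uses CIFAR-10/ImageNet and stronger PGD attacks, so treat this as illustrative rather than a reproduction.

```python
# Minimal sketch (not the paper's exact setup) of the "non-robust features"
# experiment: build a training set of adversarially perturbed images relabeled
# to the *target* class, train a fresh model on it, and evaluate on the
# original, correctly labeled test set.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
from torchvision import datasets, transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

def small_cnn():
    return nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(32 * 7 * 7, 10),
    ).to(device)

def train(model, loader, epochs=1):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            opt.zero_grad()
            F.cross_entropy(model(x), y).backward()
            opt.step()

def accuracy(model, loader):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            correct += (model(x).argmax(1) == y).sum().item()
            total += y.numel()
    return correct / total

def fgsm_targeted(model, x, target, eps=0.25):
    # One-step targeted FGSM: nudge x *toward* the target class.
    x = x.clone().requires_grad_(True)
    F.cross_entropy(model(x), target).backward()
    return (x - eps * x.grad.sign()).clamp(0, 1).detach()

tfm = transforms.ToTensor()
train_set = datasets.MNIST("data", train=True, download=True, transform=tfm)
test_set = datasets.MNIST("data", train=False, download=True, transform=tfm)
train_loader = DataLoader(train_set, batch_size=256, shuffle=True)
test_loader = DataLoader(test_set, batch_size=256)

# 1. Train an ordinary (non-robust) classifier.
base = small_cnn()
train(base, train_loader)

# 2. Build a new training set: perturb each image toward label (y + 1) mod 10
#    and *relabel* it as that target. To a human the images still look like y.
xs, ys = [], []
for x, y in train_loader:
    x, y = x.to(device), y.to(device)
    target = (y + 1) % 10
    xs.append(fgsm_targeted(base, x, target).cpu())
    ys.append(target.cpu())
relabeled = TensorDataset(torch.cat(xs), torch.cat(ys))

# 3. Train a fresh model only on the relabeled data, then evaluate it on the
#    original, correctly labeled test set.
fresh = small_cnn()
train(fresh, DataLoader(relabeled, batch_size=256, shuffle=True))
print("clean test accuracy of model trained on relabeled data:",
      accuracy(fresh, test_loader))
```

If the fresh model scores well above the 10% chance level on the clean test set, that is the sort of evidence the paper takes to show the perturbations carry “real” features that generalize, even though they are invisible to humans.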
More links:
I googled ‘daniel ellsberg nuclear first strikes’ and found U.S. Planned Nuclear First Strike to Destroy Soviets and China – Daniel Ellsberg on RAI (6/13) and U.S. Refuses to Adopt a Nuclear Weapon No First Use Pledge – Daniel Ellsberg on RAI (7/13).
MIRI’s AI Risk for Computer Scientists workshop. Workshops are on hold due to COVID-19, but you’re welcome to apply, get in touch with us, etc.
That is in fact what I meant :)