Adversarial Examples

Adversarial examples are inputs or situations with unusual features that cause an AI to make choices that seem obviously wrong to a human. For example, an image of a panda can be subtly perturbed, in a way a human would barely notice, so that an image classifier labels it as a gibbon.
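As a rough illustration of how such a manipulation can work, here is a minimal sketch that applies a sign-of-gradient perturbation (in the spirit of the fast gradient sign method) to a toy linear classifier. Everything in it is an assumption made for illustration: the weights `w`, the input `x`, the step size `eps`, and the "panda"/"gibbon" labels are hypothetical and do not come from any real image model.

```python
# Toy linear "classifier": positive score means "panda", otherwise "gibbon".
# All values here are hypothetical and chosen only to illustrate the idea.

def sign(v):
    """Return -1, 0, or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)

def score(w, x):
    """Linear score: dot product of weights and input features."""
    return sum(wi * xi for wi, xi in zip(w, x))

def classify(w, x):
    return "panda" if score(w, x) > 0 else "gibbon"

def perturb(w, x, eps):
    """Shift every feature by at most eps in the direction that lowers the score.

    For a linear model, the gradient of the score with respect to x is w,
    so the sign-of-gradient step is simply eps * sign(w).
    """
    return [xi - eps * sign(wi) for wi, xi in zip(w, x)]

n = 1000                                               # number of "pixels"
w = [1.0 if i % 2 == 0 else -1.0 for i in range(n)]    # assumed weights
x = [0.5 + 0.01 * wi for wi in w]                      # input scored as "panda"

eps = 0.02                                             # tiny per-feature change
x_adv = perturb(w, x, eps)
max_change = max(abs(a - b) for a, b in zip(x, x_adv))

print(classify(w, x), classify(w, x_adv), max_change)
```

The point of the sketch is that no single feature moves by more than `eps`, yet the many small changes all push the score in the same direction, so their combined effect is enough to flip the label. Real image classifiers are nonlinear, so the gradient has to be computed by backpropagation rather than read off the weights.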

The Goodhart Game

John_Maxwell
13 points
5 min read

If I were a well-intentioned AI… I: Image classifier

Stuart_Armstrong
35 points
5 min read

AXRP Episode 1 - Adversarial Policies with Adam Gleave

DanielFilan
12 points
33 min read

[AN #62] Are adversarial examples caused by real but imperceptible features?

Rohin Shah
27 points
9 min read

The Achilles Heel Hypothesis for AI

scasper
20 points
1 min read

Evidence Sets: Towards Inductive-Biases based Analysis of Prosaic AGI

bayesian_kitten
21 points
21 min read

High-stakes alignment via adversarial training [Redwood Research report]

5 May 2022 0:59 UTC
135 points
9 min read

Adversarial attacks and optimal control

Jan
16 points
8 min read
