Adversarial Examples

Tag · Last edit: 29 Aug 2020 15:53 UTC by Multicore

Adversarial examples are inputs with unusual features, often deliberately crafted, that cause an AI to make choices that seem obviously wrong to a human. For example, an image of a panda can be perturbed so subtly that a human sees no difference, yet an image classifier labels it a gibbon.
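As a rough illustration of how a small perturbation can flip a classifier's output, here is a minimal sketch of a sign-of-gradient step (the idea behind the fast gradient sign method) applied to a toy linear classifier. Everything in it is an assumption made for illustration: the four-feature "image", the weights, the `epsilon` budget, and the helper names `score`, `classify`, and `fgsm_perturb` are hypothetical and do not come from any real model or library.

```python
# Toy illustration only: a hand-written linear "classifier" over 4 features.
# The weights, input, and epsilon are made-up values, not from a real model.

def score(weights, x):
    """Linear score; a positive score means 'panda', otherwise 'gibbon'."""
    return sum(w * xi for w, xi in zip(weights, x))

def classify(weights, x):
    return "panda" if score(weights, x) > 0 else "gibbon"

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def fgsm_perturb(weights, x, epsilon):
    """Shift each feature by at most epsilon in the direction that lowers the score.

    For a linear score, the gradient with respect to x is just the weights,
    so the sign-of-gradient step is -epsilon * sign(w) per feature.
    """
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]

weights = [0.5, -0.4, 0.3, -0.6]   # hypothetical model parameters
x = [0.6, 0.5, 0.5, 0.4]           # hypothetical "panda" input
x_adv = fgsm_perturb(weights, x, epsilon=0.05)

print(classify(weights, x), classify(weights, x_adv))
```

With these made-up numbers the original input scores about 0.01 and is labeled "panda", while the perturbed input, which differs by only 0.05 in each feature, scores about -0.08 and is labeled "gibbon". Real attacks on image classifiers apply the same idea to the gradient of a deep network's loss rather than to fixed linear weights.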

The Goodhart Game

John_Maxwell · 18 Nov 2019 23:22 UTC
13 points
5 comments · 5 min read

If I were a well-intentioned AI… I: Image classifier

Stuart_Armstrong · 26 Feb 2020 12:39 UTC
35 points
4 comments · 5 min read

AXRP Episode 1 - Adversarial Policies with Adam Gleave

DanielFilan · 29 Dec 2020 20:41 UTC
12 points
5 comments · 33 min read

[AN #62] Are adversarial examples caused by real but imperceptible features?

Rohin Shah · 22 Aug 2019 17:10 UTC
27 points
10 comments · 9 min read

The Achilles Heel Hypothesis for AI

scasper · 13 Oct 2020 14:35 UTC
20 points
6 comments · 1 min read

Evidence Sets: Towards Inductive-Biases based Analysis of Prosaic AGI

bayesian_kitten · 16 Dec 2021 22:41 UTC
21 points
10 comments · 21 min read

High-stakes alignment via adversarial training [Redwood Research report]

5 May 2022 0:59 UTC
135 points
27 comments · 9 min read

Adversarial attacks and optimal control

Jan · 22 May 2022 18:22 UTC
16 points
7 comments · 8 min read