Adversarial Training

Tag. Last edit: 3 Jun 2022 1:30 UTC by Ruby

AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan, 21 Aug 2022 23:50 UTC
16 points
0 comments, 35 min read, LW link

Deep Forgetting & Unlearning for Safely-Scoped LLMs

scasper, 5 Dec 2023 16:48 UTC
109 points
29 comments, 13 min read, LW link

Takeaways from our robust injury classifier project [Redwood Research]

dmz, 17 Sep 2022 3:55 UTC
143 points
12 comments, 6 min read, LW link, 1 review

Some thoughts on why adversarial training might be useful

Beth Barnes, 8 Dec 2021 1:28 UTC
9 points
6 comments, 3 min read, LW link

Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing

Buck, 2 Jun 2022 23:48 UTC
37 points
0 comments, 3 min read, LW link

Adversarial Robustness Could Help Prevent Catastrophic Misuse

aogara, 11 Dec 2023 19:12 UTC
30 points
18 comments, 9 min read, LW link

AI Safety 101 - Chapter 5.2 - Unrestricted Adversarial Training

Charbel-Raphaël, 31 Oct 2023 14:34 UTC
17 points
0 comments, 19 min read, LW link

EIS IX: Interpretability and Adversaries

scasper, 20 Feb 2023 18:25 UTC
30 points
7 comments, 8 min read, LW link

Continuous Adversarial Quality Assurance: Extending RLHF and Constitutional AI

Benaya Koren, 8 Jul 2023 17:32 UTC
6 points
0 comments, 9 min read, LW link

Latent Adversarial Training

Adam Jermyn, 29 Jun 2022 20:04 UTC
42 points
12 comments, 5 min read, LW link

Oversight Leagues: The Training Game as a Feature

Paul Bricman, 9 Sep 2022 10:08 UTC
20 points
6 comments, 10 min read, LW link

EIS XI: Moving Forward

scasper, 22 Feb 2023 19:05 UTC
19 points
2 comments, 9 min read, LW link

EIS XII: Summary

scasper, 23 Feb 2023 17:45 UTC
17 points
0 comments, 6 min read, LW link