Adversarial Training

Tag · Last edit: 3 Jun 2022 1:30 UTC by Ruby

Some thoughts on why adversarial training might be useful

Beth Barnes · 8 Dec 2021 1:28 UTC

9 points

6 comments · 3 min read · LW link

Adversarial training, importance sampling, and anti-adversarial training for AI whistleblowing

Buck · 2 Jun 2022 23:48 UTC

37 points

0 comments · 3 min read · LW link

Latent Adversarial Training

Adam Jermyn · 29 Jun 2022 20:04 UTC

42 points

12 comments · 5 min read · LW link

AXRP Episode 17 - Training for Very High Reliability with Daniel Ziegler

DanielFilan · 21 Aug 2022 23:50 UTC

16 points

0 comments · 35 min read · LW link

Oversight Leagues: The Training Game as a Feature

Paul Bricman · 9 Sep 2022 10:08 UTC

20 points

6 comments · 10 min read · LW link

Takeaways from our robust injury classifier project [Redwood Research]

dmz · 17 Sep 2022 3:55 UTC

143 points

12 comments · 6 min read · LW link · 1 review

EIS IX: Interpretability and Adversaries

scasper · 20 Feb 2023 18:25 UTC

30 points

7 comments · 8 min read · LW link

EIS XI: Moving Forward

scasper · 22 Feb 2023 19:05 UTC

19 points

2 comments · 9 min read · LW link

EIS XII: Summary

scasper · 23 Feb 2023 17:45 UTC

17 points

0 comments · 6 min read · LW link

Continuous Adversarial Quality Assurance: Extending RLHF and Constitutional AI

Benaya Koren · 8 Jul 2023 17:32 UTC

6 points

0 comments · 9 min read · LW link

AI Safety 101 - Chapter 5.2 - Unrestricted Adversarial Training

Charbel-Raphaël · 31 Oct 2023 14:34 UTC

17 points

0 comments · 19 min read · LW link

Deep Forgetting & Unlearning for Safely-Scoped LLMs

scasper · 5 Dec 2023 16:48 UTC

109 points

29 comments · 13 min read · LW link

Adversarial Robustness Could Help Prevent Catastrophic Misuse

aogara · 11 Dec 2023 19:12 UTC

30 points

18 comments · 9 min read · LW link

Ironing Out the Squiggles

Zack_M_Davis · 29 Apr 2024 16:13 UTC

140 points

33 comments · 11 min read · LW link
