CS 2881r

Last edit: 11 Sep 2025 17:17 UTC by habryka

CS 2881r is a class by @boazbarak on AI Safety and Alignment at Harvard.

This tag applies to all posts about that class, as well as posts created in the context of it, e.g. as part of student assignments.

[CS 2881r] Some Generalizations of Emergent Misalignment

Valerio Pepe · 14 Sep 2025 16:18 UTC
11 points
0 comments · 9 min read · LW link

AI Safety course intro blog

boazbarak · 21 Jul 2025 2:35 UTC
16 points
0 comments · 1 min read · LW link
(windowsontheory.org)

[CS 2881r] [Week 3] Adversarial Robustness, Jailbreaks, Prompt Injection, Security

egeckr · 27 Sep 2025 1:31 UTC
2 points
0 comments · 26 min read · LW link

[CS2881r] Optimizing Prompts with Reinforcement Learning

1 Oct 2025 14:02 UTC
1 point
0 comments · 5 min read · LW link

Call for suggestions — AI safety course

boazbarak · 3 Jul 2025 14:30 UTC
53 points
23 comments · 1 min read · LW link

[CS 2881r AI Safety] [Week 2] Modern LLM Training

jusyc · 26 Sep 2025 1:25 UTC
1 point
0 comments · 4 min read · LW link

[CS 2881r AI Safety] [Week 1] Introduction

14 Sep 2025 19:52 UTC
15 points
0 comments · 13 min read · LW link

Learnings from AI safety course so far

boazbarak · 27 Sep 2025 18:17 UTC
101 points
4 comments · 3 min read · LW link