Sodium
Karma: 86
Trying to get into alignment
Universal and Transferable Adversarial Attacks on Aligned Language Models [paper link]
Sodium · 29 Jul 2023 3:21 UTC · 16 points · 0 comments · 1 min read · (arxiv.org)
NYT: The Surprising Thing A.I. Engineers Will Tell You if You Let Them
Sodium · 17 Apr 2023 18:59 UTC · 11 points · 2 comments · 1 min read · (www.nytimes.com)
(Non-deceptive) Suboptimality Alignment
Sodium · 18 Oct 2023 2:07 UTC · 3 points · 1 comment · 9 min read