Sodium

Karma: 52

Trying to get into alignment

(Non-deceptive) Suboptimality Alignment

Sodium, 18 Oct 2023 2:07 UTC
3 points
1 comment, 8 min read, LW link

Universal and Transferable Adversarial Attacks on Aligned Language Models [paper link]

Sodium, 29 Jul 2023 3:21 UTC
16 points
0 comments, 1 min read, LW link
(arxiv.org)

NYT: The Surprising Thing A.I. Engineers Will Tell You if You Let Them

Sodium, 17 Apr 2023 18:59 UTC
11 points
2 comments, 1 min read, LW link
(www.nytimes.com)