RSS

rusheb

Karma: 157

Apollo Re­search 1-year update

29 May 2024 17:44 UTC
87 points
0 comments7 min readLW link

A starter guide for evals

8 Jan 2024 18:24 UTC
44 points
2 comments12 min readLW link
(www.apolloresearch.ai)

Scal­able And Trans­fer­able Black-Box Jailbreaks For Lan­guage Models Via Per­sona Modulation

7 Nov 2023 17:59 UTC
36 points
2 comments2 min readLW link
(arxiv.org)

Un­der­stand­ing mesa-op­ti­miza­tion us­ing toy models

7 May 2023 17:00 UTC
42 points
2 comments10 min readLW link