RSS

Jérémy Scheurer

Karma: 677

Me, My­self, and AI: the Si­tu­a­tional Aware­ness Dataset (SAD) for LLMs

8 Jul 2024 22:24 UTC
92 points
15 comments5 min readLW link

Apollo Re­search 1-year update

29 May 2024 17:44 UTC
92 points
0 comments7 min readLW link

We need a Science of Evals

22 Jan 2024 20:30 UTC
66 points
13 comments9 min readLW link

A starter guide for evals

8 Jan 2024 18:24 UTC
44 points
2 comments12 min readLW link
(www.apolloresearch.ai)

Un­der­stand­ing strate­gic de­cep­tion and de­cep­tive alignment

25 Sep 2023 16:27 UTC
59 points
16 comments7 min readLW link
(www.apolloresearch.ai)

An­nounc­ing Apollo Research

30 May 2023 16:17 UTC
215 points
11 comments8 min readLW link

Imi­ta­tion Learn­ing from Lan­guage Feedback

30 Mar 2023 14:11 UTC
71 points
3 comments10 min readLW link

Prac­ti­cal Pit­falls of Causal Scrubbing

27 Mar 2023 7:47 UTC
87 points
17 comments13 min readLW link