Jan Betley

Karma: 1,245

Was Barack Obama still serving as president in December?

Jan Betley · 16 Sep 2025 11:18 UTC
115 points
14 comments · 6 min read · LW link

Concept Poisoning: Probing LLMs without probes

5 Aug 2025 17:00 UTC
59 points
5 comments · 13 min read · LW link

Backdoor awareness and misaligned personas in reasoning models

20 Jun 2025 23:38 UTC
34 points
8 comments · 6 min read · LW link

OpenAI Responses API changes models’ behavior

11 Apr 2025 13:27 UTC
53 points
6 comments · 2 min read · LW link

[Question] Are there any (semi-)detailed future scenarios where we win?

Jan Betley · 7 Apr 2025 19:13 UTC
15 points
3 comments · 1 min read · LW link

Jan Betley’s Shortform

Jan Betley · 31 Mar 2025 14:02 UTC
5 points
38 comments · 1 min read · LW link

Finding Emergent Misalignment

Jan Betley · 26 Mar 2025 17:33 UTC
26 points
0 comments · 3 min read · LW link

Open problems in emergent misalignment

1 Mar 2025 9:47 UTC
83 points
17 comments · 7 min read · LW link

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs

25 Feb 2025 17:39 UTC
332 points
92 comments · 4 min read · LW link

Me, Myself, and AI: the Situational Awareness Dataset (SAD) for LLMs

8 Jul 2024 22:24 UTC
109 points
37 comments · 5 min read · LW link

Self-shutdown AI

Jan Betley · 21 Aug 2023 16:48 UTC
13 points
2 comments · 2 min read · LW link

Localizing goal misgeneralization in a maze-solving policy network

Jan Betley · 6 Jul 2023 16:21 UTC
37 points
2 comments · 7 min read · LW link

[Question] Reverse engineering of the simulation

Jan Betley · 7 Feb 2022 21:36 UTC
1 point
2 comments · 1 min read · LW link

[Question] What do we *really* expect from a well-aligned AI?

Jan Betley · 4 Jan 2021 20:57 UTC
13 points
10 comments · 1 min read · LW link