gasteigerjo

Karma: 323

Working on Alignment Science at Anthropic

AI Safety at the Frontier: Paper Highlights of January 2026

gasteigerjo · 3 Feb 2026 18:56 UTC
20 points
0 comments · 9 min read · LW link
(aisafetyfrontier.substack.com)

AI Safety at the Frontier: Paper Highlights of December 2025

gasteigerjo · 14 Jan 2026 14:29 UTC
16 points
0 comments · 7 min read · LW link
(aisafetyfrontier.substack.com)

Towards training-time mitigations for alignment faking in RL

16 Dec 2025 21:01 UTC
33 points
1 comment · 5 min read · LW link
(alignment.anthropic.com)

AI Safety at the Frontier: Paper Highlights of November 2025

gasteigerjo · 2 Dec 2025 21:11 UTC
6 points
0 comments · 8 min read · LW link
(aisafetyfrontier.substack.com)

AI Safety at the Frontier: Paper Highlights of October 2025

gasteigerjo · 5 Nov 2025 13:39 UTC
7 points
0 comments · 8 min read · LW link
(aisafetyfrontier.substack.com)

Training fails to elicit subtle reasoning in current language models

9 Oct 2025 19:04 UTC
49 points
3 comments · 4 min read · LW link
(alignment.anthropic.com)

AI Safety at the Frontier: Paper Highlights, September ’25

gasteigerjo · 1 Oct 2025 16:24 UTC
11 points
0 comments · 6 min read · LW link
(aisafetyfrontier.substack.com)

AI Safety at the Frontier: Paper Highlights, August ’25

gasteigerjo · 2 Sep 2025 20:29 UTC
12 points
0 comments · 7 min read · LW link
(open.substack.com)

AI Safety at the Frontier: Paper Highlights, July ’25

gasteigerjo · 10 Aug 2025 12:49 UTC
7 points
0 comments · 9 min read · LW link
(aisafetyfrontier.substack.com)

AI Safety at the Frontier: Paper Highlights, June ’25

gasteigerjo · 7 Jul 2025 18:17 UTC
4 points
0 comments · 7 min read · LW link
(open.substack.com)

AI Safety at the Frontier: Paper Highlights, May ’25

gasteigerjo · 17 Jun 2025 17:16 UTC
6 points
0 comments · 8 min read · LW link
(aisafetyfrontier.substack.com)

AI Safety at the Frontier: Paper Highlights, April ’25

gasteigerjo · 6 May 2025 14:22 UTC
4 points
0 comments · 7 min read · LW link
(aisafetyfrontier.substack.com)

AI Safety at the Frontier: Paper Highlights, March ’25

gasteigerjo · 7 Apr 2025 20:17 UTC
9 points
0 comments · 9 min read · LW link
(aisafetyfrontier.substack.com)

Automated Researchers Can Subtly Sandbag

26 Mar 2025 19:13 UTC
44 points
0 comments · 4 min read · LW link
(alignment.anthropic.com)

AI Safety at the Frontier: Paper Highlights, February ’25

gasteigerjo · 3 Mar 2025 22:09 UTC
7 points
0 comments · 7 min read · LW link
(aisafetyfrontier.substack.com)

AI Safety at the Frontier: Paper Highlights, January ’25

gasteigerjo · 11 Feb 2025 16:14 UTC
7 points
0 comments · 8 min read · LW link
(aisafetyfrontier.substack.com)

AI Safety at the Frontier: Paper Highlights, December ’24

gasteigerjo · 11 Jan 2025 22:54 UTC
7 points
2 comments · 7 min read · LW link
(aisafetyfrontier.substack.com)

AI Safety at the Frontier: Paper Highlights, November ’24

gasteigerjo · 7 Dec 2024 19:15 UTC
7 points
0 comments · 8 min read · LW link
(aisafetyfrontier.substack.com)