
JanWehner

Karma: 105

I’m a PhD student working on AI Safety. I’m thinking about how we can use interpretability techniques to make LLMs safer.

Safety Cases Explained: How to Argue an AI is Safe

JanWehner · 2 Dec 2025 11:03 UTC
14 points
2 comments · 9 min read · LW link

A Call for Better Risk Modelling

18 Nov 2025 9:08 UTC
19 points
0 comments · 4 min read · LW link

Learning from the Luddites: Implications for a modern AI labour movement

JanWehner · 16 Oct 2025 17:11 UTC
12 points
0 comments · 8 min read · LW link

Open Challenges in Representation Engineering

3 Apr 2025 19:21 UTC
14 points
0 comments · 5 min read · LW link

Saarbrücken, Germany - ACX Meetups Everywhere Fall 2024

JanWehner · 29 Aug 2024 18:37 UTC
2 points
0 comments · 1 min read · LW link

An Introduction to Representation Engineering - an activation-based paradigm for controlling LLMs

JanWehner · 14 Jul 2024 10:37 UTC
38 points
6 comments · 17 min read · LW link

Immunization against harmful fine-tuning attacks

6 Jun 2024 15:17 UTC
4 points
0 comments · 12 min read · LW link

Training-time domain authorization could be helpful for safety

25 May 2024 15:10 UTC
15 points
4 comments · 7 min read · LW link

Data for IRL: What is needed to learn human values?

JanWehner · 3 Oct 2022 9:23 UTC
18 points
6 comments · 12 min read · LW link

Introduction to Effective Altruism: How to do good with your career

JanWehner · 7 Sep 2022 18:12 UTC
1 point
0 comments · 1 min read · LW link