Teun van der Weij

Karma: 341

Research scientist at Apollo Research.

Stress Testing Deliberative Alignment for Anti-Scheming Training

17 Sep 2025 16:59 UTC
124 points
13 comments · 1 min read · LW link
(antischeming.ai)

How to mitigate sandbagging

Teun van der Weij · 23 Mar 2025 17:19 UTC
30 points
0 comments · 8 min read · LW link

Teun van der Weij’s Shortform

Teun van der Weij · 14 Mar 2025 3:54 UTC
3 points
1 comment · 1 min read · LW link

The Elicitation Game: Evaluating capability elicitation techniques

27 Feb 2025 20:33 UTC
10 points
1 comment · 2 min read · LW link

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

13 Jun 2024 10:04 UTC
84 points
10 comments · 2 min read · LW link
(arxiv.org)