Teun van der Weij
Karma: 98
An Introduction to AI Sandbagging
Teun van der Weij, Felix Hofstätter and Francis Rhys Ward
26 Apr 2024 13:40 UTC, 41 points, 1 comment, 8 min read
Simple distribution approximation: When sampled 100 times, can language models yield 80% A and 20% B?
Teun van der Weij, Felix Hofstätter and Francis Rhys Ward
29 Jan 2024 0:24 UTC, 39 points, 5 comments, 4 min read
List of projects that seem impactful for AI Governance
JaimeRV and Teun van der Weij
14 Jan 2024 16:53 UTC, 13 points, 0 comments, 13 min read
Evaluating Language Model Behaviours for Shutdown Avoidance in Textual Scenarios
Simon Lermen, Teun van der Weij and Leon Lang
16 May 2023 10:53 UTC, 22 points, 0 comments, 13 min read