Ollie J
Karma: 186
[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij, Felix Hofstätter, Ollie J, Sam F. Brown and Francis Rhys Ward
13 Jun 2024 10:04 UTC
84 points, 10 comments, 2 min read
LW link (arxiv.org)
Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models
Felix Hofstätter, Francis Rhys Ward, HarrietW, LAThomson, Ollie J, Patrik Bartak and Sam F. Brown
8 Nov 2023 11:37 UTC
49 points, 0 comments, 18 min read
LW link
ChatGPT banned in Italy over privacy concerns
Ollie J
31 Mar 2023 17:33 UTC
18 points, 4 comments, 1 min read
LW link (www.bbc.co.uk)
Whisper’s Wild Implications
Ollie J
3 Jan 2023 12:17 UTC
19 points, 6 comments, 5 min read
LW link