Ollie J

Karma: 191

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der Weij, Felix Hofstätter, Ollie J, Sam F. Brown and Francis Rhys Ward

13 Jun 2024 10:04 UTC

84 points

10 comments · 2 min read · LW link

(arxiv.org)

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Felix Hofstätter, Francis Rhys Ward, HarrietW, LAThomson, Ollie J, Patrik Bartak and Sam F. Brown

8 Nov 2023 11:37 UTC

49 points

0 comments · 18 min read · LW link

ChatGPT banned in Italy over privacy concerns

Ollie J

31 Mar 2023 17:33 UTC

18 points

4 comments · 1 min read · LW link

(www.bbc.co.uk)

Whisper’s Wild Implications

Ollie J

3 Jan 2023 12:17 UTC

24 points

6 comments · 5 min read · LW link