Ollie J

Karma: 191

Ollie J 13 Jun 2024 12:15 UTC
2 points
0
in reply to: gw’s comment on: [Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Fixed, thanks for flagging

[Paper] AI Sandbagging: Language Models can Strategically Underperform on Evaluations

Teun van der Weij, Felix Hofstätter, Ollie J, Sam F. Brown and Francis Rhys Ward

13 Jun 2024 10:04 UTC

84 points

10 comments · 2 min read · LW link

(arxiv.org)

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Felix Hofstätter, Francis Rhys Ward, HarrietW, LAThomson, Ollie J, Patrik Bartak and Sam F. Brown

8 Nov 2023 11:37 UTC

49 points

0 comments · 18 min read · LW link

ChatGPT banned in Italy over privacy concerns

Ollie J 31 Mar 2023 17:33 UTC

18 points

4 comments · 1 min read · LW link

(www.bbc.co.uk)

Ollie J 24 Feb 2023 20:36 UTC
1 point
0
on: Meta “open sources” LMs competitive with Chinchilla, PaLM, and code-davinci-002 (Paper)
The link to the GitHub repo is broken: it includes the comma at the end.
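For illustration, here is a minimal Python sketch of how this kind of breakage tends to happen and one way to avoid it. The sentence and URL are placeholders, not the actual link from the post, and `strip_trailing_punctuation` is a hypothetical helper.

```python
import re

# Placeholder sentence and URL, NOT the real link from the post.
text = "Code is available at https://example.com/org/repo, along with the paper."

# A naive auto-linker matches every non-whitespace character,
# so the sentence's comma ends up inside the URL.
naive_url = re.search(r"https?://\S+", text).group(0)


def strip_trailing_punctuation(url: str) -> str:
    """Drop sentence punctuation that was glued onto the end of a URL."""
    return url.rstrip(".,;:!?")


fixed_url = strip_trailing_punctuation(naive_url)

print(naive_url)  # ends with the stray comma
print(fixed_url)  # comma removed
```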

Whisper’s Wild Implications

Ollie J 3 Jan 2023 12:17 UTC

24 points

6 comments · 5 min read · LW link

Ollie J 23 Nov 2022 13:25 UTC
9 points
2
on: Human-level Diplomacy was my fire alarm
I wonder how it would update its strategies if you negotiated in an unorthodox way:
- “If you help me win, I will donate £5000 across various high-impact charities”
- “If you don’t help me win, I will kill somebody”

Ollie J 16 Jun 2022 14:14 UTC
29 points
0
on: Contra Hofstadter on GPT-3 Nonsense
Many articles like this are littered throughout the internet: the author performs a surface-level analysis, asks GPT-3 some question (usually basic arithmetic), then points at the wrong answer and draws a conclusion (“GPT-3 is clueless”). They almost never state the parameters of the model used or give the whole input prompt.
GPT-3 is very capable of saying “I don’t know” (or “yo be real”), but due to its training dataset it likely won’t say so of its own accord.
GPT-3 is not an oracle or some other kind of agent. GPT-3 is a simulator of such agents. To get GPT-3 to act as a truthful oracle, explicit instruction must be given in the input prompt to do so.
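As a rough sketch of what reporting "the whole input prompt" and the model parameters could look like, the Python below builds an oracle-style prompt and bundles it with the sampling settings. Everything here is an assumption for illustration: the prompt wording, the `SamplingConfig` fields, the model name, and the helper functions are hypothetical, and no real API is called.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SamplingConfig:
    """Settings an article should report alongside any model output."""
    model: str          # hypothetical model identifier
    temperature: float
    max_tokens: int


def build_oracle_prompt(question: str) -> str:
    """Frame the model as a truthful oracle that may decline to answer."""
    return (
        "The following is a conversation with a truthful oracle. "
        "If a question is nonsense or the answer is unknown, "
        "the oracle replies \"I don't know\".\n\n"
        f"Q: {question}\nA:"
    )


def experiment_record(question: str, config: SamplingConfig) -> dict:
    """Bundle the full prompt and parameters so a result can be reproduced."""
    return {"prompt": build_oracle_prompt(question), "config": asdict(config)}


record = experiment_record(
    "How many rainbows does it take to jump from Hawaii to seventeen?",
    SamplingConfig(model="example-model", temperature=0.0, max_tokens=32),
)
print(record["prompt"])
print(record["config"])
```

Publishing a record like this next to each quoted answer lets readers check whether a "clueless" response comes from the model or from the prompt and settings.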

Ollie J 31 Mar 2022 8:46 UTC
6 points
0
on: Meta wants to use AI to write Wikipedia articles; I am Nervous™
I’m positive that as these language models become more accessible and powerful, their misuse will grow massively. However, I believe open sourcing is the best option here; having access to such models allows us to create accurate automatic classifiers that detect their outputs. Media websites (e.g. Wikipedia, Twitter) could include such a classifier in their pipeline for submitting new media.
Making such technologies closed source leaves researchers in the dark; due to the scaling-transformer hype, only a tiny fraction of the world’s population has the financial means to train a SOTA transformer model.