Remember Bing Sydney?
I don’t have anything insightful to say here. But it’s surprising how little people mention Bing Sydney.
If you ask people for examples of misaligned behaviour from AIs, they might mention:
Sycophancy from 4o
Goodharting unit tests from o3
Alignment-faking from Opus 3
Blackmail from Opus 4
But like, three years ago: Bing Sydney. The most powerful chatbot was connected to the internet and — unexpectedly, without provocation, apparently contrary to its training objective and prompting — was threatening to murder people!
Are we memory-holing Bing Sydney, or are there good reasons for not mentioning this more?
Here are some extracts from “Bing Chat is blatantly, aggressively misaligned” (Evan Hubinger, 15th Feb 2023).
I think the fact that it was 3 years ago is pretty relevant. The technology keeps moving.
If in 2027, all the strongest examples of AI misbehavior were from 2025 or earlier, I think it would be legitimate to posit that these were problems with early AI systems that have been resolved in more recent versions.
It is also a simple fact that any exponentially growing technology will be a ‘pop culture’: no one remembers X because they were literally not around then. If we look at how fast investment, market caps, and paper counts have grown, ‘LLMs’ must have a doubling time under a year. In which case, anything from 3 years ago is from before the vast majority of people were even interested in LLMs! (Even in AI/tech circles I talk with plenty of people who got into it and started paying attention only post-ChatGPT...) You can’t memory-hole something you never knew.
A lot of people don’t talk about Sydney for the same reason they don’t talk about Tay, say.
People still talk about Sydney. Owain Evans mentioned Bing Sydney during his first talk in the recent hintonlectures.com series. I attended in person, and it resonated extremely well with a general audience. I was at Microsoft during the relevant period, which definitely played a strong role in my transition to alignment research, and still informs my thinking today.
My friends still frequently say “I have been a good Bing” because I told them this story ages ago.
It’s not memory-holed as far as I can tell, but it isn’t the best example anymore of most misalignment-related things that I want examples of.