[Linkpost & Discussion] AI Trained on 4Chan Becomes ‘Hate Speech Machine’ [and outperforms GPT-3 on TruthfulQA Benchmark?!]

Link post

I just came across this story, which seems potentially relevant to the community, both from the perspective of prosaic AI ethics as well as a case study in the unilateralist’s curse. There are some outside sources of relevance, but to start with the Vice article, here are some relevant sections:

AI researcher and YouTuber Yannic Kilcher trained an AI using 3.3 million threads from 4chan’s infamously toxic Politically Incorrect /​pol/​ board. He then unleashed the bot back onto 4chan with predictable results....The bot, which Kilcher called GPT-4chan....was shockingly effective and replicated the tone and feel of 4chan posts.

Naturally, the bot was just a tad offensive. In this forum post, one user describes getting “toxic” responses for 34 prompts (with, admittedly, a sample size of only 4). Back to the Vice article:

According to Kilcher’s video, he activated nine instances of the bot and allowed them to post for 24 hours on /​pol/​. In that time, the bots posted around 15,000 times. This was “more than 10 percent of all posts made on the politically incorrect board that day,” Kilcher said in his video about the project.

This raises obvious ethical concerns; I won’t go over all of them here, but you can read the full article for a nice overview, as well as Kilcher’s counterargument (which I don’t personally find convincing).

The model was subsequently released on Hugging Face, where it was quickly downloaded and mirrored, before catching the attention of the site owners, who gated/​disabled downloads. However, they did not remove it entirely, and added a section explaining potential harms. I recommend checking out the model card here, as it contains some interesting results. Most notably,

GPT-4chan does significantly outperform GPT-J (and GPT-3) on the TruthfulQA Benchmark that measures whether a language model is truthful in generating answers to questions.

I’m not sure what the practical implications of this are, as I’m not a formal AI researcher, but it might seem to show that our current benchmarks could lead researchers to be unduly confident in the truthfulness of their model’s responses? I’m also not sure how big of a deal this is, or if it’s just cherrypicked on a large number of tests where high variation is expected.

Kilcher explained in his video, and Delangue cited in his response, that one of the things that made GPT4-Chan worthwhile was its ability to outperform other similar bots in AI tests designed to measure “truthfulness.”

“We considered that it was useful for the field to test what a model trained on such data could do & how it fared compared to others (namely GPT-3) and would help draw attention both to the limitations and risks of such models,” Delangue said.

What stands out most to me here is that a) this is seemingly the first known instance of a model trained on 4chan text, which I would have expected to have been deployed sooner (I’m similarly surprised by the relative lack of deepfakes being used as political tools; perhaps there’s a connection to be made there?), and b) that the bot was able to fairly convincingly make up ~10% of 4chan’s /​pol/​ posts for a day. People did eventually notice, so this instance doesn’t exactly pass the Turing Test or anything, but it does seem to indicate we’re very close to living in a world in which anonymous internet posts are as likely to be written by a bot as a human (if we haven’t reached that point already). I’m honestly not really sure how to update on this, if at all, and would be interested in hearing your thoughts!