Tournesol, YouTube and AI Risk


Tournesol is a research project and an app that aims to build a large and varied database of expert preference judgements on YouTube videos, in order to align YouTube’s recommendation algorithm with criteria such as scientific accuracy and entertainment value.

The researchers involved launched the website for participating last month, and hope to gather a lot of contributions by the end of the year, so that they have a usable and useful database of comparisons between YouTube videos. For more details on how Tournesol works, I recommend the video on the front page of the project, the white paper and this talk by one of the main researchers.

What I want to explore in this post is the relevance of Tournesol and the research around it to AI Alignment. Lê Nguyên Hoang, the main researcher on Tournesol, believes that it is very relevant. And whether or not he is right, I think the questions he raises should be discussed here in more detail.

This post focuses on AI Alignment, but Tournesol also offers many benefits for the more general problems of recommender systems and social media. To see how Tournesol should help solve these problems, see the white paper.

Thanks to Lê Nguyên Hoang and Jérémy Perret for feedback on this post.

AI Risk or Not AI Risk

There are two main ways to argue about Tournesol’s usefulness and importance for AI Alignment, depending on a central question: is YouTube’s algorithm a likely candidate for a short timeline AGI or not? So let’s start with it.

YouTube and Predict-O-Matic

Lê believes that YouTube’s algorithm has a high probability of reaching AGI level in the near future—something like the next ten years. While I’ve been updating to shorter timelines after seeing the GPT models and talking with Daniel Kokotajlo, I was initially rather dismissive of the idea that YouTube’s algorithm could become an AGI, and a dangerous one at that.

Now I’m less sure of how ridiculous it is. I’m still not putting as much probability as Lê does, but our discussion was one of the reasons I wanted to write such a post and have a public exchange about it.

So, in what way could YouTube’s algorithm reach an AGI level?

  • (Economic pressure) Recommending videos that are seen more and more is very profitable for YouTube (and its parent company Google). So there is an economic incentive to push the underlying model to be as good as possible at this task.

  • (Training Dataset) YouTube’s algorithm has access to all the content on YouTube, which is an enormous quantity of data: every minute, 500 hours of video are uploaded to YouTube. And we all know that pretty much every human behavior can be found on YouTube.

  • (Available funding and researchers) YouTube, through its parent company Google, has access to huge resources. So if reaching AGI depends only on building and running bigger models, the team working on YouTube’s recommender algorithm can definitely do it. See for example Google’s recent trillion-parameter language model.

Hence, if it’s feasible for YouTube’s algorithm to reach AGI level, there’s a risk it will.

Then what? After all, YouTube is hardly a question-answerer for the most powerful and important people in the world. That was also my first reaction. But after thinking a bit more, I think YouTube’s recommendation algorithm might have issues similar to those of a Predict-O-Matic. Such a model is an oracle/question-answerer, which will probably develop incentives for self-fulfilling prophecies and for simplifying the system it’s trying to predict. Similarly, the objective of YouTube’s algorithm is to maximize the time spent on videos, which could create the same kind of incentives.

One example of such behavior happening right now is the push towards more and more polarized political content, which in turn pushes people to look for such content, and is thus a self-fulfilling prophecy. It’s also relatively easy to adapt examples from Abram’s post to the current YouTube infrastructure: for instance, making financial recommendations more accurate by showing many people a video about how one stock is going to tank, so that they sell it and the stock does tank.

I think the most important difference from the kind of Predict-O-Matic I usually have in mind is that a YouTube recommendation is a relatively weak output, which will probably not be taken at face value by many people with strong decision power. But this is compensated by the sheer reach of YouTube: there are 1 billion hours of watch time per day for 2 billion humans, 70% of which result from recommendations (those are YouTube’s numbers, so take them with a grain of salt). Nudging many people towards something can be as effective as, or even more effective than, strongly influencing a small number of decision-makers.

Therefore, the possibility of YouTube’s algorithm reaching AGI level and causing Predict-O-Matic type issues appears strong enough to at least entertain and discuss.
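To get a feel for the reach figures quoted above, here is a rough back-of-the-envelope calculation. It takes YouTube’s numbers at face value and assumes, purely for illustration, that watch time is spread evenly over all 2 billion users, which is certainly not true in practice.

```python
# Figures as quoted in the post (YouTube's own numbers, so approximate).
WATCH_HOURS_PER_DAY = 1_000_000_000
USERS = 2_000_000_000
RECOMMENDED_SHARE = 0.70

# Assumption: watch time is averaged uniformly over all users.
minutes_per_user = WATCH_HOURS_PER_DAY / USERS * 60
recommended_minutes_per_user = minutes_per_user * RECOMMENDED_SHARE
recommended_hours_total = WATCH_HOURS_PER_DAY * RECOMMENDED_SHARE

print(f"Average watch time: {minutes_per_user:.0f} min/day per user")
print(f"Driven by recommendations: {recommended_minutes_per_user:.0f} min/day per user")
print(f"Recommended watch time overall: {recommended_hours_total:,.0f} hours/day")
```

Under these assumptions, the average user spends about half an hour per day on YouTube, of which roughly 21 minutes come from recommendations. That is a small nudge per person, repeated every day across a very large population.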

(Lê himself has a wiki page devoted to that idea, which differs from my presentation here.)

What probability do you put on YouTube’s algorithm reaching AGI level?
Conditioning on YouTube’s algorithm reaching AGI level, what probability do you put on it showing Predict-O-Matic type problems?

Assuming the above risk about YouTube’s algorithm is real, Tournesol is the most direct project attempting to align this AI. It thus has value both for avoiding a catastrophe with this specific AI and for dealing with practical cases of alignment.

Useful Even Without a YouTube AGI

Maybe you’re not convinced by the previous section. One reason I find Tournesol exciting is that even then, it has value for AI Alignment research.

The most obvious value is the construction of a curated dataset for value learning, at a scale that is unheard of. There are a lot of things one could do with access to this dataset, such as defining a benchmark for value learning techniques or applying microscope AI to find a model of expert recommendation of valuable content.
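As a concrete illustration of what learning from such comparisons can look like, here is a minimal sketch that turns pairwise judgements into per-video scores with a Bradley-Terry model, fitted by the standard minorization-maximization update. This is a generic technique chosen for illustration, not a description of how Tournesol aggregates judgements; the function name, the `(winner, loser)` data format and the video identifiers are all hypothetical.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=200):
    """Fit Bradley-Terry scores from (winner, loser) pairs.

    Assumes winner != loser in every pair, and that every item has at
    least one win and one loss (otherwise its score drifts to 0 or 1).
    """
    wins = defaultdict(int)         # total wins per item
    pair_counts = defaultdict(int)  # comparisons per unordered pair
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    scores = {item: 1.0 for item in items}
    for _ in range(iterations):
        new_scores = {}
        for i in items:
            # MM update: score_i = wins_i / sum_j n_ij / (score_i + score_j)
            denom = 0.0
            for j in items:
                if i == j:
                    continue
                n = pair_counts.get(frozenset((i, j)), 0)
                if n:
                    denom += n / (scores[i] + scores[j])
            new_scores[i] = wins[i] / denom if denom else scores[i]
        # Normalize so the scores sum to 1 (they are only defined up to scale).
        total = sum(new_scores.values())
        scores = {i: s / total for i, s in new_scores.items()}
    return scores


# Hypothetical judgements on one criterion: each tuple is (preferred, other).
judgements = (
    [("video_a", "video_b")] * 3 + [("video_b", "video_a")]
    + [("video_b", "video_c")] * 3 + [("video_c", "video_b")]
    + [("video_a", "video_c")] * 3 + [("video_c", "video_a")]
)
scores = bradley_terry(judgements)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

A benchmark built on a real dataset could hold out some comparisons and measure how well the fitted scores predict them, which is one way to compare value learning techniques against each other.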

Such experimental data also seems crucial if we want to better understand the influence of data on different alignment schemes. Examining an actual massive dataset, which contains directly helpful data but also, inevitably, errors and attacks, might help design realistic assumptions to use when studying specific ML algorithms and alignment schemes.

What Can You Do To Help?

All of that is conditional on the success of Tournesol. So what can you do to help?

  • If you’re a programmer: they are looking for React and Django programmers to work on the website and an Android app. For Lê, this is the most important point to reach a lot of people.

  • If you’re a student/professor/researcher: you can sign up for Tournesol with your institutional email address, which means your judgements will be added to the database when you use Tournesol.

  • If you’re a researcher in AI Alignment: you can discuss this proposal and everything around it in the comments, so that the community doesn’t ignore this potential opportunity. There are also many open theoretical problems in the white paper. If you’re really excited about this project and want to collaborate, you can contact Lê by mail at


Tournesol aims to build a database of preference comparisons between YouTube videos, primarily in order to align YouTube’s algorithm. Even if there is no risk from the latter, such a database would have massive positive consequences for AI Alignment research.

I’m not saying that every researcher here should drop what they’re doing and go work on or around Tournesol. But the project seems sufficiently relevant to at least know about and acknowledge, if nothing else. And if you can lend a hand, or even criticize the proposal and discuss potential uses of the database, I think you’ll be doing AI Alignment a service.