Language models are computer programs made to estimate the likelihood of a piece of text. “Hello, how are you?” is likely. “Hello, fnarg horses” is unlikely.
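To make "estimating the likelihood of a piece of text" concrete, here is a minimal sketch that assumes a toy word-pair (bigram) counting model, not the deep-learning models discussed later. The training sentences, the function names, and the add-one smoothing are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Tiny made-up training corpus (an assumption for illustration only).
CORPUS = [
    "hello how are you",
    "hello how is it going",
    "how are you today",
    "hello there how are you",
]

def tokenize(text):
    # Lowercase, keep only words, and prepend a start-of-text marker.
    return ["<s>"] + re.findall(r"[a-z']+", text.lower())

vocab = set()
contexts = Counter()  # how often each word is followed by another word
bigrams = Counter()   # how often each (previous word, word) pair occurs
for sentence in CORPUS:
    tokens = tokenize(sentence)
    vocab.update(tokens)
    contexts.update(tokens[:-1])
    bigrams.update(zip(tokens, tokens[1:]))

VOCAB_SIZE = len(vocab) + 1  # +1 reserves one slot for any unseen word

def log_likelihood(text):
    """Sum of log P(word | previous word), using add-one smoothing."""
    tokens = tokenize(text)
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        probability = (bigrams[(prev, word)] + 1) / (contexts[prev] + VOCAB_SIZE)
        total += math.log(probability)
    return total

likely = log_likelihood("Hello, how are you?")
unlikely = log_likelihood("Hello, fnarg horses")
print(f"likely:   {likely:.2f}")
print(f"unlikely: {unlikely:.2f}")
```

Scores closer to zero mean more likely text. With this toy corpus the familiar greeting scores higher than the nonsense phrase, because its word pairs appeared in the training sentences.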

Language models can answer questions by estimating the likelihood of possible question-and-answer pairs and selecting the most likely one. “Q: How are you? A: Very well, thank you” is a likely question-and-answer pair. “Q: How are you? A: Correct horse battery staple” is an unlikely one.
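The selection step can be sketched as follows: score each candidate question-and-answer pair and keep the answer from the highest-scoring pair. The scorer here is a stand-in (a toy word-pair counter trained on three made-up dialogues), and the names `pair_log_likelihood` and `answer_question` are hypothetical.

```python
import math
import re
from collections import Counter

# Made-up dialogues standing in for training text (an assumption).
TRAINING_TEXT = [
    "Q: How are you? A: Very well, thank you",
    "Q: How are you? A: Fine, thank you",
    "Q: How is your day? A: Very good, thank you",
]

def tokenize(text):
    # Lowercase, keep only words, and prepend a start-of-text marker.
    return ["<s>"] + re.findall(r"[a-z']+", text.lower())

vocab = set()
contexts = Counter()
bigrams = Counter()
for line in TRAINING_TEXT:
    tokens = tokenize(line)
    vocab.update(tokens)
    contexts.update(tokens[:-1])
    bigrams.update(zip(tokens, tokens[1:]))

VOCAB_SIZE = len(vocab) + 1  # +1 reserves one slot for any unseen word

def pair_log_likelihood(question, answer):
    """Toy log-likelihood of the whole question-and-answer pair."""
    tokens = tokenize(f"Q: {question} A: {answer}")
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + VOCAB_SIZE))
        for prev, word in zip(tokens, tokens[1:])
    )

def answer_question(question, candidate_answers):
    # Pick the answer whose question-and-answer pair scores highest.
    return max(candidate_answers, key=lambda a: pair_log_likelihood(question, a))

best = answer_question(
    "How are you?",
    ["Very well, thank you", "Correct horse battery staple"],
)
print(best)  # prints: Very well, thank you
```

Listing every candidate answer is only practical in a toy setting like this one. Real systems typically build an answer one piece at a time instead, but the underlying idea of preferring likely text is the same.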

The language models most relevant to AI safety are those based on “deep learning”. Deep-learning-based language models can be “trained” to understand language better by exposing them to text written by humans. There is a lot of human-written text on the internet, which provides plenty of training material.

Deep-learning-based language models are getting bigger and better trained. As the models become more capable, they gain new skills. These skills include arithmetic, explaining jokes, programming, and solving math problems.

There is a risk that these models will develop dangerous capabilities as they grow larger and better trained. What additional skills will they develop given a few more years?

