(sorry, couldn’t resist)
This is the first post in an Alignment Forum sequence explaining the approaches both MIRI and OpenAI staff believe are the most promising means of auditing the cognition of very complex machine learning models. We will be discussing each approach in turn, with a focus on how they differ from one another.
The goal of this series is to provide a more complete picture of the various options for auditing AI systems than has been provided so far by any single person or organization. The hope is that it will help people make better-informed decisions about which approach to pursue.
We have tried to keep our discussion as objective as possible, but we recognize that there may well be disagreements among us on some points. If you think we’ve made an error, please let us know!
If you’re interested in reading more about the history of AI research and development, see:
1. What Is Artificial Intelligence? (Wikipedia)
2. How Does Machine Learning Work?
3. How Can We Create Trustworthy AI?
The first question we need to answer is: what do we mean by “artificial intelligence”?
The term “artificial intelligence” has been used to refer to a surprisingly broad range of things. The three most common uses are:
1. The study of how to create machines that can perceive, think, and act in ways that are typically only possible for humans.
2. The study of how to create machines that can learn, using data, in ways that are typically only possible for humans.
3. The study of how to create machines that can reason and solve problems in ways that are typically only possible for humans.
In this sequence, we will focus on the third definition. We believe that the first two are much less important for the purpose of AI safety research, and that they are also much less tractable.
Why is it so important to focus on the third definition?
The third definition is important because, as we will discuss in later posts, it is the one that creates the most risk. It is also the one that is most difficult to research, and so it requires the most attention.
:D
How many samples did you sift through to get this? Did you do any re-rolls? What was your stopping procedure?
This was literally the first output, with no rerolls in the middle! (Although after posting it, I did some other trials which weren’t as good, so I did get lucky on the first one. The randomness parameter, i.e. the sampling temperature, was set to 0.5.)
I cut it off there because the next paragraph just restated the previous one.
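For readers unfamiliar with the "randomness parameter": this is presumably the sampling temperature, which rescales the model's output distribution before each token is drawn. A minimal sketch of temperature sampling (an illustration of the general technique, not the actual code used to generate the post above; the function name and signature are made up for this example):

```python
import math
import random

def sample_with_temperature(logits, temperature=0.5, rng=None):
    """Sample one index from a list of logits after temperature scaling.

    Lower temperature sharpens the distribution (output is more
    deterministic); higher temperature flattens it (output is more
    random). temperature=1.0 leaves the model's distribution unchanged.
    """
    rng = rng or random.Random()
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

At temperature 0.5 the distribution is noticeably sharper than the model's raw distribution, which fits the reported behavior: mostly coherent continuations, with enough randomness left that later trials can come out worse.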