Welcome, new contributors!

Today is the day; we’re opening up this forum to allow contributions from more people! See our How to Contribute page for the details.

Now is a great time to say a bit about what the Intelligent Agent Foundations Forum is, and how it came about. The short answer is that the Machine Intelligence Research Institute (MIRI) helped build this forum in order to facilitate research discussion on the topics in its technical agenda and related subjects.

Many of the early users of this forum previously contributed to a closed email group on decision theory, or wrote relevant posts on the group blog Less Wrong. MIRI wanted to build a forum that could focus on these topics and support high-quality mathematical collaboration, while being transparent and allowing new contributors to find and join it directly.

Broadly speaking, the topics of this forum concern the difficulties of value alignment: the problem of how to ensure that machine intelligences of various levels adequately understand and pursue the goals that their developers actually intended, rather than getting stuck on some proxy for the real goal or failing in other unexpected (and possibly dangerous) ways. As these failure modes become more devastating the farther we advance in building machine intelligences, MIRI’s goal is to work today on the foundations of goal systems and architectures that would work even when the machine intelligence has general creative problem-solving ability beyond that of its developers, and has the ability to modify itself or build successors. (For more on the motivations for this work, see the Future of Life Institute’s research priorities letter or Nick Bostrom’s recent book Superintelligence.)

In that context, there are many interesting problems that come up; here are several from MIRI’s technical agenda page:

  • Decision theory: One class of topics comes from the distortions that arise when an agent predicts its environment, including its own future actions or the predictions of other agents, and tries to make decisions based on those. The tools of classical game theory and decision theory begin to make substandard recommendations on Newcomblike problems, blackmail problems, and other topics in this domain, and formal models of decision theories have brought up entirely unexpected self-referential failure modes. This has spurred the development of some new mathematical models of decision theory and counterfactual reasoning. (MIRI research agenda paper on decision theory)

  • Logical uncertainty: In the classical formalism of Bayesian agents, the agent updates on new evidence in a way that makes use of all logical consequences. In any interesting universe (even, say, the theory of arithmetic), this is actually an impossible assumption. Any bounded reasoner must have a satisfactory way of dealing with hypotheses whose truth may in fact be determined by the data, but which have not yet been deduced either way. There are some interesting analogous models of coherent (or locally coherent) probability distributions on the theory of arithmetic. (MIRI research agenda paper on logical uncertainty)

  • Reflective world-models: The distinction between an agent and its environment is a fuzzy one. Performing an action in the environment (e.g. sabotaging one’s own hardware) can predictably affect the agent’s future inferential processes. Furthermore, there are some models of intelligence and learning in which the correct hypotheses about the agent itself are not accessible to the agent. In both cases, there has been some progress on building mathematical models of systems that represent themselves more sensibly. (MIRI research paper on reflective world-models)

  • Corrigibility: Many goal systems, if they can reason reflectively and strategically, will seek to preserve themselves (because otherwise, their current goal state will be less likely to be reached). This gives rise to a potential problem with communicating human value to a machine intelligence: if the developers make a mistake in doing so, the machine intelligence may seek ways to avoid being corrected. There are several models of this, and a few proposals. (MIRI research paper on corrigibility)

  • Self-trust and Vingean reflection: Informally, if an agent self-modifies to become better at problem-solving or inference, it should be able to trust that its modified self will be better at achieving its goals. As it turns out, there is a self-referential obstacle with simple models of this (akin to Gödel’s second incompleteness theorem: a sufficiently expressive formal system can prove its own consistency only if it is in fact inconsistent), and one method of fixing it results in the possibility of indefinitely deferred actions or deductions. (MIRI research paper on Vingean reflection)

  • Value learning: Since human beings have not succeeded at specifying human values (citation: look at the lack of total philosophical consensus on ethics), we may in fact need the help of a machine intelligence itself to specify values for a machine intelligence. This sort of “indirect normativity” presents its own interesting challenges. (MIRI research paper on value learning)
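
To make the decision theory item above a little more concrete, here is a minimal sketch of Newcomb's problem, the standard example of a Newcomblike problem. Everything in it is an assumption chosen for illustration rather than something taken from MIRI's papers: the payoff amounts are the ones conventionally used when the problem is stated, and the names `payoff`, `expected_payoff_given_action`, and `dominance_gain` are hypothetical. The sketch only shows why two familiar lines of reasoning pull apart when a predictor's guess depends on the agent's own choice; it is not a model of any of the newer decision theories.

```python
# Illustrative sketch of Newcomb's problem. The payoffs and function names
# are assumptions made for this example only.

BIG_PRIZE = 1_000_000   # opaque box: filled only if one-boxing was predicted
SMALL_PRIZE = 1_000     # transparent box: always filled


def payoff(action: str, predicted: str) -> int:
    """Money received for `action`, given what the predictor predicted."""
    opaque = BIG_PRIZE if predicted == "one-box" else 0
    if action == "one-box":
        return opaque
    if action == "two-box":
        return opaque + SMALL_PRIZE
    raise ValueError(f"unknown action: {action!r}")


def expected_payoff_given_action(action: str, accuracy: float) -> float:
    """Expected payoff if the prediction matches the action with probability `accuracy`.

    This conditions on the agent's own action, as evidential-style
    reasoning would.
    """
    other = "two-box" if action == "one-box" else "one-box"
    return (accuracy * payoff(action, action)
            + (1 - accuracy) * payoff(action, other))


def dominance_gain(predicted: str) -> int:
    """Extra money from two-boxing when the prediction is held fixed.

    This treats the box contents as already settled, as causal-style
    dominance reasoning would.
    """
    return payoff("two-box", predicted) - payoff("one-box", predicted)


if __name__ == "__main__":
    for action in ("one-box", "two-box"):
        print(action, expected_payoff_given_action(action, accuracy=0.99))
    for predicted in ("one-box", "two-box"):
        print("gain from two-boxing if predicted", predicted, "=",
              dominance_gain(predicted))
```

With a 99% accurate predictor, conditioning on the action favors one-boxing by a wide margin (roughly 990,000 against roughly 11,000), while holding the prediction fixed says two-boxing is better by exactly 1,000 in either case. The gap between those two answers is the kind of distortion the decision theory item describes.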

This is not an exhaustive list of topics or of progress! In the next few days, several forum contributors plan to consolidate the work and discussions already on this forum, and produce summary posts with links for each group of topics (including some not listed above).

But the list does help us to point out what we consider to be on-topic in this forum. Besides the topics mentioned there, other relevant subjects include groundwork for self-modifying agents, abstract properties of goal systems, tractable theoretical or computational models of the topics above, and anything else that is directly connected to MIRI’s research mission.

It’s important for us to keep the forum focused, though; there are other good places to talk about subjects that are more indirectly related to MIRI’s research mission, and the moderators here may close down discussions on subjects that aren’t a good fit for this mission. Some examples of subjects that we would consider off-topic (unless directly applied to a more relevant area) include general advances in artificial intelligence and machine learning, general mathematical logic, general philosophy of mind, general futurism, existential risks, effective altruism, human rationality, and non-technical philosophizing.

As Benja said in the original welcome post, the software is still fairly minimal and a little rough around the edges (though we do have LaTeX support). We hope to improve quickly! If you want to help us, the code is on GitHub. And if you find bugs, we hope you’ll let us know!

We look forward to your contributions!