Top-down research aimed at high-assurance AGI tries to envision what we’ll need a high-assurance AGI to do, and starts playing with toy models to see whether they can help us build up insights into the general problem, even though we don’t yet know what an actual AGI implementation will look like. Past examples of this sort of top-down research in computer science more generally include:
Lampson’s original paper on the confinement problem (covert channels) used abstract models to describe a problem that wasn’t detected in the wild until roughly two decades after he wrote the paper. Nevertheless, this gave computer security researchers a head start on the problem, and covert channel research is now a fairly large and active field. Details here.
Shor’s quantum algorithm for integer factorization (1994) showed, several decades before we’re likely to get a large-scale quantum computer, that (for example) the NSA could be capturing and storing strongly encrypted communications today and breaking them later with a quantum computer. So if you want to guarantee that your current communications will remain private in the future, you’ll want to work on post-quantum cryptography and use it.
Hutter’s AIXI is the first fully-specified model of “universal” intelligence. It’s incomputable, but there are computable variants, and indeed tractable variants that can play arcade games successfully. The nice thing about AIXI is that you can use it to concretely illustrate certain AGI safety problems we don’t yet know how to solve even with infinite computing power, which means we must be very confused indeed. Not all AGI safety problems will be solved by first finding an incomputable solution, but that is one common way to make progress. I say more about this in a forthcoming paper with Bill Hibbard to be published in CACM.
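To give a flavor of what “fully-specified” buys you, here is AIXI’s action-selection rule, written roughly in Hutter’s standard notation (a, o, r are actions, observations, and rewards, m is the planning horizon, U is a universal Turing machine, and ℓ(q) is the length of program q). The sum over all world-programs q consistent with the agent’s history is what makes it incomputable:

```latex
% AIXI's choice of action at time t, with planning horizon m (Hutter's formulation):
a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \;\cdots\; \max_{a_m} \sum_{o_m r_m}
          \bigl( r_t + \cdots + r_m \bigr)
          \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Shorter programs that reproduce the observed history get exponentially more weight, so AIXI is essentially expectimax planning against a Solomonoff-style mixture over all computable environments.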
But now, here are some top-down research problems MIRI thinks might pay off later for AGI safety outcomes, some of which are within or on the borders of computer science:
Naturalized induction: “Build an algorithm for producing accurate generalizations and predictions from data sets, that treats itself, its data inputs, and its hypothesis outputs as reducible to its physical posits. More broadly, design a workable reasoning method that allows the reasoner to treat itself as fully embedded in the world it’s reasoning about.” (Agents built in the standard agent-environment framework are effectively Cartesian dualists, which has safety implications.)
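To make that parenthetical concrete, here is a minimal sketch of the standard (Cartesian) agent-environment loop; the class and function names are mine and purely illustrative. The agent sits outside the environment and touches it only through an observation/action channel, so nothing in the environment can, say, overwrite the agent’s own memory, which is exactly the assumption naturalized induction refuses to make.

```python
# Toy sketch of the standard (Cartesian) agent-environment framework.
# Class and function names are illustrative, not taken from any paper.

class Environment:
    def __init__(self) -> None:
        self.state = 3

    def step(self, action: int) -> tuple[int, float]:
        """Apply the agent's action and return (observation, reward)."""
        self.state += action
        return self.state, float(-abs(self.state))  # reward is highest at state 0


class Agent:
    def __init__(self) -> None:
        self.memory: list[tuple[int, float]] = []

    def act(self, observation: int) -> int:
        """Choose an action from observations alone. The agent's memory and
        reasoning live 'outside' Environment.state, so by construction the
        environment can never damage or rewrite them."""
        return -1 if observation > 0 else 1


def run_episode(steps: int = 5) -> None:
    env, agent = Environment(), Agent()
    observation, reward = env.step(0)
    for _ in range(steps):
        action = agent.act(observation)
        observation, reward = env.step(action)
        agent.memory.append((observation, reward))
    print(agent.memory)
    # A naturalized agent would instead be a subsystem of env.state itself:
    # its hypotheses would have to cover worlds in which the hardware running
    # this very loop gets modified, damaged, or copied.


if __name__ == "__main__":
    run_episode()
```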
Better AI cooperation: How can we get powerful agents to cooperate with each other where feasible? One line of research on this is called “program equilibrium”: in a setup where agents can read each other’s source code, they can recognize each other as cooperators more often than in a standard Prisoner’s Dilemma. However, the early approaches were brittle: agents couldn’t recognize each other for cooperation if, say, a variable name differed between them. We got around that problem via provability logic.
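Here is a toy sketch of that brittleness (the function names are mine, and the provability-logic fix is only described in a comment, not implemented): two bots that each cooperate only when the opponent’s source is literally identical to their own will defect against each other over something as trivial as a renamed parameter.

```python
# Toy illustration of why exact source matching is a brittle route to
# program equilibrium. Names are illustrative, not from the literature.
import inspect


def clique_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent's source text is character-for-character my own."""
    my_source = inspect.getsource(clique_bot)
    return "C" if opponent_source == my_source else "D"


def clique_bot_renamed(other_source: str) -> str:
    """The same strategy, but with the function and parameter renamed."""
    my_source = inspect.getsource(clique_bot_renamed)
    return "C" if other_source == my_source else "D"


if __name__ == "__main__":
    a_src = inspect.getsource(clique_bot)
    b_src = inspect.getsource(clique_bot_renamed)
    # Each bot reads the other's source, sees it isn't literally identical,
    # and defects, even though the two strategies are semantically the same.
    print(clique_bot(b_src), clique_bot_renamed(a_src))  # prints: D D
    # A provability-logic agent ("cooperate iff a formal system such as PA
    # proves the opponent cooperates with me") is robust to renaming, but it
    # needs proof search rather than string comparison, so it isn't shown here.
```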
Tiling agents: Like Bolander and others, we study self-reflection in computational agents, though for us it’s because we’re thinking ahead to the point when we’ve got AGIs that want to improve their own abilities, and we want to make sure they retain their original purposes as they rewrite their own code. We’ve built some toy models for this; they run into nicely crisp Gödelian difficulties, we throw a bunch of math at those difficulties, and in some cases they kind of go away. We hope this will lead to insight into the general challenge of self-reflective agents that don’t change their goals on self-modification round #412. See also the procrastination paradox and Fallenstein’s monster.
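The crispest of those Gödelian difficulties is the Löbian obstacle. In my own compressed restatement (not a quote from the tiling agents paper): a parent agent reasoning in a theory T would like to adopt the blanket principle “if my successor, which also reasons in T, proves an action is safe, then it is safe,” but Löb’s theorem blocks this:

```latex
% Löb's theorem, for a theory T with provability predicate \Box_T:
% if T proves "provable(phi) implies phi", then T already proves phi.
T \vdash \bigl( \Box_T\,\varphi \rightarrow \varphi \bigr)
\quad\Longrightarrow\quad
T \vdash \varphi
```

So a theory that wholesale trusted its own proofs of safety would have to prove every such safety claim outright. The obvious dodge, giving each successor a strictly weaker theory than its parent, means the agent’s proof strength degrades with every rewrite; the tiling agents work (and Fallenstein’s monster) is about doing better than that.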
Ontological crises in AI value systems: what should happen to an agent’s goals when it discovers that the ontology its utility function was originally defined over doesn’t match how the world actually works?
These are just a few examples; there are lots more. We aren’t happy yet with our descriptions of any of these problems, and we’re working with various people to explain ourselves better, and to make it easier for people to understand what we’re talking about and why we’re working on these problems and not others. But some people already seem to grok what we’re doing: e.g., I pointed Nik Weaver to the tiling agents work and, despite having no prior familiarity with MIRI, he just ran with it.