The research directions that people at MIRI prioritize are already pretty heavily informed by work that was first developed or written up by people like Paul Christiano (at OpenAI), Stuart Armstrong (a MIRI research associate, primary affiliation FHI), Wei Dai, and others. MIRI researchers work on the things that look most promising to them, and those things get added to our agenda if they aren’t already on the agenda.
Different researchers at MIRI have different ideas about what’s most promising; e.g., the AAMLS agenda incorporated a lot of problems from Paul Christiano’s agenda, and reflected the new intuitions and inside-view models brought to the table by the researchers who joined MIRI in early and mid-2015.
I’m guessing our primary disagreement is about how promising various object-level research directions at MIRI are. It might also be that you’re thinking that there’s less back-and-forth between MIRI and researchers at other institutions than actually occurs, or more viewpoint uniformity at MIRI than actually exists. Or you might be thinking that people at MIRI working on similar research problems together reflects top-down decisions by Eliezer, rather than reflecting (a) people with similar methodologies and intuitions wanting to work together, and (b) convergence happening faster between people who share the same physical space.
In this case, I think some of the relevant methodology/intuition differences are on questions like:
Are we currently confused on a fundamental level about what general-purpose good reasoning in physical environments is? Not just “how can we implement this in practice?”, but “what (at a sufficient level of precision) are we even talking about?”
Can we become much less confused, and develop good models of how AGI systems decompose problems into subproblems, allocate cognitive resources to different subproblems, etc.?
Is it a top priority for developers to go into large safety-critical software projects like this with as few fundamental confusions about what they’re doing as possible?
People who answer “yes” to those questions tend to cluster together and reach a lot of similar object-level conclusions, and people who answer “no” form other clusters. Resolving those very basic disagreements is therefore likely to be especially high-value.
“I don’t think Newcomb-like dilemmas are relevant for the reasoning of potentially dangerous AIs.”
The primary reason to try to get a better understanding of realistic counterfactual reasoning (e.g., what an agent’s counterfactuals should look like in a decision problem) is that AGI is in large part about counterfactual reasoning. A generating methodology for a lot of MIRI researchers’ work is that we want to ensure the developers of early AGI systems aren’t “flying blind” with respect to how and why their systems work; we want developers to be able to anticipate the consequences of many design choices before they make them.
The idea isn’t that AGI techniques will look like decision theory, any more than they’ll look like probability theory. The idea is rather that it’s essential to have a basic understanding of what decision-making and probabilistic reasoning are before you build a general-purpose probabilistic reasoner and decision-maker. Newcomb’s problem is important in that context primarily because it’s one of the biggest anomalies in our current understanding of counterfactual reasoning. Zeroing in on anomalies in established theories and paradigms, and tugging on loose threads until we get a sense of why our theories break down at this particular point, is a pretty standard and productive approach in the sciences.
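To make the anomaly concrete, here’s a minimal sketch (mine, not anything from MIRI’s formal work) of the standard version of the problem: a highly reliable predictor puts $1M in an opaque box only if it predicted you’d take just that box, while a transparent box always holds $1K. The specific payoffs and the 0.99 accuracy figure below are just conventional illustrative numbers; the point is that the evidential and causal ways of computing expected utility, which agree almost everywhere else, come apart here.

```python
# Toy illustration of the Newcomb anomaly: evidential and causal
# expected-utility calculations recommend different actions.
# Payoffs and predictor accuracy are illustrative assumptions.

PREDICTOR_ACCURACY = 0.99   # probability the predictor correctly predicts your choice
BOX_B = 1_000_000           # opaque box contents if one-boxing was predicted
BOX_A = 1_000               # transparent box always contains this

def evidential_eu(action):
    """Expected utility, treating the action as evidence about the prediction."""
    if action == "one-box":
        return PREDICTOR_ACCURACY * BOX_B + (1 - PREDICTOR_ACCURACY) * 0
    else:  # two-box
        return PREDICTOR_ACCURACY * BOX_A + (1 - PREDICTOR_ACCURACY) * (BOX_B + BOX_A)

def causal_eu(action, p_predicted_one_box):
    """Expected utility, holding the already-made prediction causally fixed."""
    expected_b = p_predicted_one_box * BOX_B
    return expected_b if action == "one-box" else expected_b + BOX_A

if __name__ == "__main__":
    print("Evidential:", {a: evidential_eu(a) for a in ("one-box", "two-box")})
    # Whatever credence you assign to having been predicted to one-box,
    # two-boxing causally dominates by exactly BOX_A:
    for p in (0.0, 0.5, 1.0):
        print(f"Causal (p={p}):", {a: causal_eu(a, p) for a in ("one-box", "two-box")})
```

Two-boxing causally dominates by exactly the $1K no matter what you think was predicted, while one-boxing has far higher expected value once your choice is treated as evidence about the prediction. That divergence is the loose thread worth tugging on.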
All that said, Newcomblike scenarios are ubiquitous in real life, and would probably be much more so for AGI systems. I’ll say more about this in a second comment.
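As a rough pointer at the kind of scenario I have in mind (the payoff numbers here are purely illustrative, and this isn’t meant as anyone’s formal model): an agent interacting with a copy of itself, or with another agent running very similar reasoning, can’t treat the other party’s decision as causally independent of its own, which reproduces the Newcomb structure.

```python
# Hedged sketch: a "twin" prisoner's dilemma, one way Newcomblike structure
# shows up for software agents. An agent playing against an exact copy of
# itself knows the copy's move is the output of the same program.

PAYOFF = {  # (my_move, twin_move) -> my payoff
    ("C", "C"): 3, ("C", "D"): 0,
    ("D", "C"): 5, ("D", "D"): 1,
}

def my_payoff(my_move):
    twin_move = my_move  # the copy runs the same code, so its move matches mine
    return PAYOFF[(my_move, twin_move)]

# Treating the twin's move as fixed, "D" dominates (5 > 3 and 1 > 0), yet both
# copies choosing "C" outscores both choosing "D" -- the same tension as in
# Newcomb's problem.
print({m: my_payoff(m) for m in ("C", "D")})  # {'C': 3, 'D': 1}
```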