Technical AGI safety research outside AI
I think there are many questions whose answers would be useful for technical AGI safety research, but which will probably require expertise outside AI to answer. In this post I list 30 of them, divided into four categories. Feel free to get in touch if you’d like to discuss these questions and why I think they’re important in more detail. I personally think that making progress on the ones in the first category is particularly vital, and plausibly tractable for researchers from a wide range of academic backgrounds.
Studying and understanding safety problems
How strong are the economic or technological pressures towards building very general AI systems, as opposed to narrow ones? How plausible is the CAIS model of advanced AI capabilities arising from the combination of many narrow services?
What are the most compelling arguments for and against discontinuous versus continuous takeoffs? In particular, how should we think about the analogy from human evolution, and the scalability of intelligence with compute?
Through which tasks is narrow AI most likely to have a destabilising impact on society? What might cybercrime look like when many important jobs have been automated?
How plausible are safety concerns about economic dominance by influence-seeking agents, as well as structural loss of control scenarios? Can these be reformulated in terms of standard economic ideas, such as principal-agent problems and the effects of automation?
How can we make the concepts of agency and goal-directed behaviour more specific and useful in the context of AI (e.g. building on Dennett’s work on the intentional stance)? How do they relate to intelligence and the ability to generalise across widely different domains?
What are the strongest arguments that have been made about why advanced AI might pose an existential threat, stated as clearly as possible? How do the different claims relate to each other, and which inferences or assumptions are weakest?
Solving safety problems
What techniques used in studying animal brains and behaviour will be most helpful for analysing AI systems and their behaviour, particularly with the goal of rendering them interpretable?
What is the most important information about deployed AI that decision-makers will need to track, and how can we create interfaces which communicate this effectively, making it visible and salient?
What are the most effective ways to gather huge numbers of human judgments about potential AI behaviour, and how can we ensure that such data is high-quality?
How can we empirically test the debate and factored cognition hypotheses? How plausible are the assumptions about the decomposability of cognitive work via language which underlie debate and iterated distillation and amplification?
How can we distinguish between AIs helping us better understand what we want and AIs changing what we want (both as individuals and as a civilisation)? How easy is the latter to do, and how easy is it for us to identify?
Various questions in decision theory, logical uncertainty and game theory relevant to agent foundations.
How can we create secure containment and supervision protocols to use on AI, which are also robust to external interference?
What are the best communication channels for conveying goals to AI agents? In particular, which ones are most likely to incentivise optimisation of the goal specified through the channel, rather than modification of the communication channel itself?
How closely linked is the human motivational system to our intellectual capabilities—to what extent does the orthogonality thesis apply to human-like brains? What can we learn from the range of variation in human motivational systems (e.g. induced by brain disorders)?
What were the features of the human ancestral environment and evolutionary “training process” that contributed the most to our empathy and altruism? What are the analogues of these in our current AI training setups, and how can we increase them?
What are the features of our current cultural environments that contribute the most to altruistic and cooperative behaviour, and how can we replicate these while training AI?
Forecasting AI
What are the most likely pathways to AGI and the milestones and timelines involved?
How do our best systems so far compare to animals and humans, both in terms of performance and in terms of brain size? What do we know from animals about how cognitive abilities scale with brain size, learning time, environmental complexity, etc?
What are the economics and logistics of building microchips and datacenters? How will the availability of compute change under different demand scenarios?
In what ways is AI usefully analogous or disanalogous to the industrial revolution, electricity, and nuclear weapons?
How will the progression of narrow AI shape public and government opinions and narratives towards it, and how will that influence the directions of AI research?
Which tasks will there be most economic pressure to automate, and how much money might realistically be involved? What are the biggest social or legal barriers to automation?
What are the most salient features of the history of AI, and how should they affect our understanding of the field today?
Meta
How can we best grow the field of AI safety? See OpenPhil’s notes on the topic.
How can we spread norms in favour of careful, robust testing and other safety measures in machine learning? What can we learn from other engineering disciplines with strict standards, such as aerospace engineering?
How can we create infrastructure to improve our ability to accurately predict future development of AI? What are the bottlenecks facing tools like Foretold.io and Metaculus, and what is preventing effective prediction markets from existing?
How can we best increase communication and coordination within the AI safety community? What are the major constraints on sharing information that the safety community faces (in particular ones which other fields don’t face), and how can we overcome them?
What norms and institutions should the field of AI safety import from other disciplines? Are there predictable problems that we will face as a research community, or systemic biases which are making us overlook things?
What are the biggest disagreements between safety researchers? What’s the distribution of opinions, and what are the key cruxes?
Particular thanks to Beth Barnes and a discussion group at the CHAI retreat for helping me compile this list.
[copying from my comment on the EA Forum x-post]
For reference, some other lists of AI safety problems that can be tackled by non-AI people:
Luke Muehlhauser’s big (but somewhat old) list: “How to study superintelligence strategy”
AI Impacts has made several lists of research problems
Wei Dai’s “Problems in AI Alignment that philosophers could potentially contribute to”
Kaj Sotala’s case for the relevance of psychology/cog sci to AI safety (I would add that Ought is currently testing the feasibility of IDA/Debate by doing psychological research)
Also relevant is Geoffrey Irving and Amanda Askell’s “AI Safety Needs Social Scientists.”
I think systems engineering is a candidate for this, at least as far as the safety and meta sections go.
There is a program at MIT for expanding systems engineering to account for post-design variations in the environment, including specific reasoning about a broader notion of safety:
Systems Engineering Advancement Research Initiative
There was also a DARPA program for speeding up the delivery of new military vehicles, which seems to have the most direct applications to CAIS:
Systems Engineering and the META Program
Among other things, systems engineering has the virtue of making hardware an explicit feature of the model.