Rohin Shah comments on AXRP Episode 10 - AI’s Future and Impacts with Katja Grace

Rohin Shah 28 Jul 2021 17:56 UTC
LW: 6 AF: 4
AF
Planned summary for the Alignment Newsletter:
This podcast goes over various strands of research from [AI Impacts](https://aiimpacts.org/), including lots of work that I either haven’t covered or have covered only briefly in this newsletter:
**AI Impacts’ methodology.** AI Impacts aims to advance the state of knowledge about AI and AI risk by recursively decomposing important high-level questions and claims into subquestions and subclaims, until reaching a question that can be relatively easily answered by gathering data. They generally aim to provide new facts or arguments that people haven’t considered before, rather than arguing about how existing arguments should be interpreted or weighted.
**Timelines.** AI Impacts is perhaps most famous for its [survey of AI experts](https://arxiv.org/abs/1705.08807) on timelines till high-level machine intelligence (HLMI). The author’s main takeaway is that people give very inconsistent answers and there are huge effects based on how you frame the question. For example:
1. If you estimate timelines by asking questions like “when will there be a 50% chance of HLMI”, you’ll get timelines a decade earlier than if you estimate by asking questions like “what is the chance of HLMI in 2030”.
2. If you ask about when AI will outperform humans at all tasks, you get an estimate of ~2061, but if you ask when all occupations will be automated, you get an estimate of ~2136.
3. People whose undergraduate studies were in Asia estimated ~2046, while those in North America estimated ~2090.
The survey also found that the median probability of outcomes approximately as bad as extinction was 5%, which the author found surprisingly high for people working in the field.
**Takeoff speeds.** A common disagreement in the AI alignment community is whether there will be a discontinuous “jump” in capabilities at some point. AI Impacts has three lines of work investigating this topic:
1. Checking how long it typically takes to go from “amateur human” to “expert human”. For example, it took about [3 years](https://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-imagenet-image-classification/) for image classification on ImageNet, [38 years](https://aiimpacts.org/time-for-ai-to-cross-the-human-range-in-english-draughts/) on checkers, [21 years](https://aiimpacts.org/time-for-ai-to-cross-the-human-range-in-starcraft/) for StarCraft, [30 years](https://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-go/) for Go, [30 years](https://aiimpacts.org/time-for-ai-to-cross-the-human-performance-range-in-chess/) for chess, and ~3000 years for clock stability (how well you can measure the passage of time).
2. Checking <@how often particular technologies have undergone discontinuities in the past@>(@Discontinuous progress in history: an update@). A (still uncertain) takeaway would be that discontinuities are the kind of thing that legitimately happen sometimes, but they don’t happen so frequently that you should expect them, and you should have a pretty low prior on a discontinuity happening at some specific level of progress.
3. Detailing [arguments](https://aiimpacts.org/likelihood-of-discontinuous-progress-around-the-development-of-agi/) for and against discontinuous progress in AI.
**Arguments for AI risk, and counterarguments.** The author has also spent some time thinking about how strong the arguments for AI risk are, and has focused on a few areas:
1. Will superhuman AI systems actually be able to far outpace humans, such that they could take over the world? In particular, it seems like humans can use non-agentic tools to help keep up.
2. Maybe the AI systems we build won’t have goals, and so the argument from [instrumental subgoals](https://selfawaresystems.files.wordpress.com/2008/01/ai_drives_final.pdf) won’t apply.
3. Even if the AI systems do have goals, they may have human-compatible goals (especially since people will be explicitly trying to do this).
4. The AI systems may not destroy everything: for example, they might instead simply trade with humans, and use their own resources to pursue their goals while leaving humans alone.