Wei Dai comments on Disentangling arguments for the importance of AI safety

Wei Dai 23 Jan 2019 2:29 UTC
LW: 42 AF: 16
Here’s another argument that I’ve been pushing since the early days (apparently not very successfully, since it didn’t make it to this list :) which might be called the “argument from philosophical difficulty”. It appears that achieving a good long-term future requires getting right a lot of philosophical questions that are hard for us to answer. Given this, I initially thought there were only three ways for AI to go right in this regard (assuming everything else goes well with the AI):
1. We solve all the important philosophical problems ahead of time and program the solutions into the AI.
2. We solve metaphilosophy (i.e., understand philosophical reasoning as well as we understand mathematical reasoning) and program that into the AI so it can solve philosophical problems on its own.
3. We program the AI to learn philosophical reasoning from humans or use human simulations to solve philosophical problems.
Since then people have come up with a couple more scenarios (which did make me slightly more optimistic about this problem):
4. We all coordinate to stop technological progress some time after AI but before space colonization, and have a period of long reflection where humans, maybe with help from AIs, spend thousands or millions of years solving philosophical problems.
5. We program AIs to be corrigible to their users. Some users care about getting philosophy correct, so the AIs help keep them safe and secure their “fair share” of the universe until philosophical problems are eventually solved; enough users care about this that we end up with a mostly good future; and lack of philosophical knowledge doesn’t cause disaster in the meantime. (My writings on “human safety problems” were in part a response to this suggestion, outlining how hard it would be to keep humans “safe” in this scenario.)
The overall argument is that, given human safety problems, realistic competitive pressures, difficulties with coordination, etc., it seems hard to end up in any of these scenarios without something going wrong along the way. Maybe another way to put this is that, given philosophical difficulties, the target we’d have to hit with AI is even smaller than it might otherwise appear.