The Argument from Philosophical Difficulty

(I’m reposting this comment as a top-level post, for ease of future reference. The context here is a discussion about the different lines of argument for the importance of AI safety.)

Here’s another argument that I’ve been pushing since the early days (apparently not very successfully, since it didn’t make it to this list :) which might be called the “argument from philosophical difficulty”. It appears that achieving a good long-term future requires getting a lot of philosophical questions right that are hard for us to answer. Given this, I initially thought there were only three ways for AI to go right in this regard (assuming everything else goes well with the AI):

  1. We solve all the important philosophical problems ahead of time and program the solutions into the AI.

  2. We solve metaphilosophy (i.e., understand philosophical reasoning as well as we understand mathematical reasoning) and program that into the AI so it can solve philosophical problems on its own.

  3. We program the AI to learn philosophical reasoning from humans or use human simulations to solve philosophical problems.

Since then, people have come up with a couple more scenarios (which did make me slightly more optimistic about this problem):

  4. We all coordinate to stop technological progress some time after AI but before space colonization, and have a period of long reflection in which humans, maybe with help from AIs, spend thousands or millions of years solving philosophical problems.

  5. We program AIs to be corrigible to their users; some users care about getting philosophy correct, so the AIs help keep them safe and get their “fair share” of the universe until philosophical problems are eventually solved; enough users care about this that we end up with a mostly good future; and lack of philosophical knowledge doesn’t cause disaster in the meantime. (My writings on “human safety problems” were in part a response to this suggestion, outlining how hard it would be to keep humans “safe” in this scenario.)

The overall argument is that, given human safety problems, realistic competitive pressures, difficulties with coordination, etc., it seems hard to end up in any of these scenarios and not have something go wrong along the way. Maybe another way to put this is: given philosophical difficulties, the target we’d have to hit with AI is even smaller than it might otherwise appear.