Hello,
I appreciate the thoughtful response. I plan to respond at greater length in the future, both to this post and to some other content posted by SI representatives and commenters. For now, I wanted to take a shot at clarifying the discussion of “tool-AI” by discussing AIXI. One of the issues I’ve found with the debate over FAI in general is that I haven’t seen much in the way of formal precision about the challenge of Friendliness (I recognize that I have also provided little formal precision, though I feel the burden of formalization is on SI here). It occurred to me that AIXI might provide a good opportunity for a more precise discussion, if in fact it is believed to represent a case of “a rare exception who specified his AGI in such unambiguous mathematical terms that he actually succeeded at realizing, after some discussion with SIAI personnel, that AIXI would kill off its users and seize control of its reward button.”
So here’s my characterization of how one might work toward a safe and useful version of AIXI, using the “tool-AI” framework, if one could in fact develop an efficient enough approximation of AIXI to qualify as a powerful AGI. Of course, this is just a rough outline of what I have in mind (a toy code sketch follows the outline below), but hopefully it adds some clarity to the discussion.
A. Write a program that:
1. Computes an optimal policy, using some implementation of equation (20) on page 22 of http://www.hutter1.net/ai/aixigentle.pdf
2. “Prints” the policy in a human-readable format (using some fixed algorithm for “printing” that is not driven by a utility function)
3. Provides tools for answering user questions about the policy, e.g., “What will be its effect on ___?” (using some fixed algorithm for answering user questions that makes use of AIXI’s probability function, and is not driven by a utility function)
4. Does not contain any procedures for “implementing” the policy, only for displaying it and its implications in human-readable form
B. Run the program; examine its output using the tools described above (#2 and #3); if, upon such examination, the policy appears potentially destructive, continue tweaking the program (for example, by adjusting the utility function it is selecting a policy to maximize) until the policy appears safe and desirable
C. Implement the policy using tools other than the AIXI agent
D. Repeat (B) and (C) until one has confidence that the AIXI agent reliably produces safe and desirable policies, at which point more automation may be called for
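To make the intended separation of concerns concrete, here is a minimal, purely illustrative sketch of the program structure described in (A). Everything in it is hypothetical: no efficient approximation of AIXI exists, so `approximate_aixi_policy`, `answer_question`, and the action-table representation of a policy are placeholders I am introducing for illustration, not a claim about how equation (20) would actually be implemented. The point is only the structure: the program computes, displays, and answers questions about a policy, and contains no code path for executing it.

```python
"""Toy sketch of the "tool-AIXI" wrapper described in steps A-D.

All names and representations here are hypothetical stand-ins; they are
not drawn from Hutter's paper or any existing implementation.
"""

from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Policy:
    """A policy represented as an explicit action table (hypothetical format)."""
    action_table: Dict[str, str]  # maps observation history -> recommended action


def approximate_aixi_policy(utility: Callable[[str], float], horizon: int) -> Policy:
    """(A1) Stand-in for an implementation of equation (20): would return the
    policy maximizing expected utility under the universal mixture. Here it
    just returns an empty table."""
    return Policy(action_table={})


def print_policy(policy: Policy) -> None:
    """(A2) Fixed, non-utility-driven rendering of the policy for human review."""
    for history, action in sorted(policy.action_table.items()):
        print(f"after observing {history!r}: take action {action!r}")


def answer_question(policy: Policy, question: str) -> str:
    """(A3) Stand-in for a fixed question-answering routine that would query
    AIXI's probability function about the policy's consequences."""
    return f"[prediction for {question!r} would go here]"


# (A4) Note what is deliberately absent: nothing in this module executes the
# policy. Implementation (step C) happens outside, with other tools, and only
# after human review (step B).

if __name__ == "__main__":
    policy = approximate_aixi_policy(utility=lambda outcome: 0.0, horizon=10)
    print_policy(policy)
    print(answer_question(policy, "What will be its effect on ___?"))
```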
My claim is that this approach would be superior to that of trying to develop “Friendliness theory” in advance of having any working AGI, because it would allow experiment- rather than theory-based development. Eliezer, I’m interested in your thoughts about my claim. Do you agree? If not, where is our disagreement?
Hi, here are the details of whom I spoke with and why:
I originally emailed Michael Vassar, letting him know I was going to be in the Bay Area and asking whether there was anyone appropriate for me to meet with. He set me up with Jasen Murray.
Justin Shovelain and an SIAI donor were also present when I spoke with Jasen. There may have been one or two others; I don’t recall.
After we met, I sent the notes to Jasen for review. He sent back comments and also asked me to run them by Amy Willey and Michael Vassar, who each provided some corrections via email that I incorporated.
A couple of other comments:
If SIAI wants to set up another “room for more funding” discussion, I’d be happy to do that and to post new notes.
In general, we’re always happy to post corrections or updates on any content we post, including how that content is framed and presented. The best way to get our attention is to email us at info@givewell.org.
And a tangential comment/question for Louie: I do not understand why you link to my two LW posts using the anchor text you use. These posts are not about GiveWell’s process. They both argue that standard Bayesian inference weighs against the literal use of non-robust expected-value estimates, particularly in “Pascal’s Mugging” type scenarios. Michael Vassar’s response to the first of these was that I was attacking a straw man. There are unresolved disagreements about some of the specific modeling assumptions and implications of these posts, but I don’t see any way in which they imply a “limited process” or “blinding to the possibility of SIAI’s being a good giving opportunity.” I do agree that SIAI hasn’t been a fit for our standard process (and is more suited to GiveWell Labs), but I don’t see anything in these posts that illustrates that. What do you have in mind here?