HoldenKarnofsky comments on Reply to Holden on The Singularity Institute

HoldenKarnofsky 1 Aug 2012 14:16 UTC
17 points
I greatly appreciate the response to my post, particularly the highly thoughtful responses of Luke (original post), Eliezer, and many commenters.

Broad response to Luke’s and Eliezer’s points:

As I see it, there are a few possible visions of SI’s mission:
- M1. SI is attempting to create a team to build a “Friendly” AGI.
- M2. SI is developing “Friendliness theory,” which addresses how to develop a provably safe/useful/benign utility function without needing iterative/experimental development; this theory could be integrated into an AGI developed by another team, in order to ensure that its actions are beneficial.
- M3. SI is broadly committed to reducing AGI-related risks, and works on whatever will advance that goal, including potentially M1 and M2.
My view is that the broader SI’s mission, the higher the bar should be for the overall impressiveness of the organization and team. An organization with a very narrow, specific mission—such as “analyzing how to develop a provably safe/useful/benign utility function without needing iterative/experimental development”—can, relatively easily, establish which other organizations (if any) are trying to provide what it does and what the relative qualifications are; it can set clear expectations for deliverables over time and be held accountable to them; its actions and outputs are relatively easy to criticize and debate. By contrast, an organization with broader aims and less clearly relevant deliverables—such as “broadly aiming to reduce risks from AGI, with activities currently focused on community-building”—is giving a donor (or evaluator) less to go on in terms of what the space looks like, what the specific qualifications are and what the specific deliverables are. In this case it becomes more important that a donor be highly confident in the exceptional effectiveness of the organization and team as a whole.

Many of the responses to my criticisms (points #1 and #4 in Eliezer’s response; “SI’s mission assumes a scenario that is far less conjunctive than it initially appears” and “SI’s goals and activities” section of Luke’s response) correctly point out that they have less force, as criticisms, when one views SI’s mission as relatively broad. However, I believe that evaluating SI by a broader mission raises the burden of affirmative arguments for SI’s impressiveness. The primary such arguments I see in the responses are in Luke’s list:

(1) The Sequences, the best tool I know for creating aspiring rationalists, (2) Harry Potter and the Methods of Rationality, a surprisingly successful tool for grabbing the attention of mathematicians and computer scientists around the world, and (3) the Singularity Summit, a mainstream-aimed conference that brings in people who end up making significant contributions to the movement — e.g. Tomer Kagan (an SI donor and board member) and David Chalmers (author of The Singularity: A Philosophical Analysis and The Singularity: A Reply).

I’ve been a consumer of all three of these, and while I’ve found them enjoyable, I don’t find them sufficient for the purpose at hand. Others may reach a different conclusion. And of course, I continue to follow SI’s progress, as I understand that it may submit more impressive achievements in the future.

Both Luke and Eliezer seem to disagree with the basic approach I’m taking here. They seem to believe that it is sufficient to establish that (a) AGI risk is an overwhelmingly important issue and that (b) SI compares favorably to other organizations that explicitly focus on this issue. For my part, I (a) disagree with the statement: “the loss in expected value resulting from an existential catastrophe is so enormous that the objective of reducing existential risks should be a dominant consideration whenever we act out of an impersonal concern for humankind as a whole”; (b) do not find Luke’s argument that AI, specifically, is the most important existential risk to be compelling (it discusses only how beneficial it would be to address the issue well, not how likely a donor is to be able to help do so); (c) believe it is appropriate to compare the overall organizational impressiveness of the Singularity Institute to that of all other donation-soliciting organizations, not just to that of other existential-risk- or AGI-focused organizations. I would guess that these disagreements, particularly (a) and (c), come down to relatively deep worldview differences (related to the debate over “Pascal’s Mugging”) that I will probably write more about in the future.

On tool AI:

Most of my disagreements with SI representatives seem to be over how broad a mission is appropriate for SI, and how high a standard SI as an organization should be held to. However, the debate over “tool AI” is different, with both sides making relatively strong claims. Here SI is putting forth a specific point as an underappreciated insight and thus as a potential contribution/accomplishment; my view is that SI’s suggested approach to AGI development is more dangerous than the “traditional” approach to software development, and thus that SI is advocating for an approach that would worsen risks from AGI.

My latest thoughts on this disagreement were posted separately in a comment response to Eliezer’s post on the subject.

A few smaller points:
- I disagree with Luke’s claim that “objection #1 punts to objection #2.” Objection #2 (regarding “tool AI”) points out one possible approach to AGI that I believe is both consonant with traditional software development and significantly safer than the approach advocated by SI. But even if the “tool AI” approach is not in fact safer, there may be safer approaches that SI hasn’t thought of. SI does not just emphasize the general problem that AGI may be dangerous (something that I believe is a fairly common view), but emphasizes a particular approach to AGI safety, one that seems to me to be highly dangerous. If SI’s approach is dangerous relative to other approaches that others are taking/advocating, or even approaches that have yet to be developed (and will be enabled by future tools and progress on AGI), this is a problem for SI.
- Luke states that rationality is “only a ceteris paribus predictor of success” and that it is a “weak one.” I wish to register that I believe rationality is a strong (though not perfect) predictor of success, within the population of people who are as privileged (in terms of having basic needs met, access to education, etc.) as most SI supporters/advocates/representatives. So while I understand that success is not part of the definition of rationality, I stand by my statement that it is “the best evidence of superior general rationality (or of insight into it).”
- Regarding donor-advised funds: opening an account with Vanguard, Schwab or Fidelity is a simple process, and I doubt any of these institutions would overrule a recommendation to donate to an organization such as SI (in any case, this is easily testable).
- Wei Dai 2 Aug 2012 10:29 UTC
  18 points
  Parent
  
  My view is that the broader SI’s mission, the higher the bar should be for the overall impressiveness of the organization and team.
  
  Can you describe a hypothetical organization and some examples of the impressive achievements it might have, which would pass the bar for handling mission M3? What is your estimate of the probability of such an organization coming into existence in the next five or ten years, if a large fraction of current SI donors were to put their money into donor-advised funds instead?
- DaFranker 1 Aug 2012 15:29 UTC
  11 points
  Parent
  I’m very much an outsider to this discussion, and by no means a “professional researcher”, but I believe those to be the primary reasons why I’m actually qualified to make the following point. I’m sure it’s been made before, but a rapid scan revealed no statement of this argument quite as direct and explicit.
  
  HoldenKarnofsky: (...) my view is that SI’s suggested approach to AGI development is more dangerous than the “traditional” approach to software development, and thus that SI is advocating for an approach that would worsen risks from AGI.
  
  I’ve always understood SI’s position on this matter not as one of “We should not focus on building Tool AI! Fully reflectively self-modifying AGIs are the only way to go!”, but rather that it is extremely unlikely that we can prevent everyone else from building one.
  
  To my understanding, the logic goes: If any programmer with relevant skills is sufficiently convinced, by whatever means and for whatever causes, that building a full traditional AGI is more efficient and will more “lazily” achieve his goals with fewer resources or achieve them faster, the programmer will build it whether you think it’s a good idea or not. As such, SI’s “Moral Imperative” is to account for this scenario, as there is a non-negligible probability of it actually happening; if they do not, they effectively become hypocritical in claiming to work towards reducing existential AI risk.
  
  To reiterate with silly scare-formatting: It is completely irrelevant, in practice, what SI “advocates” or “promotes” as a preferred approach to building safe AI, because the probability that someone, somewhere, some day is going to use the worst possible approach is definitely non-negligible. If there is not already a sufficiently advanced Friendly AI in place to counter such a threat, we are then effectively defenseless.
  
  To metaphorize, this is a case of: “It doesn’t matter if you think only using remote-controlled battle robots would be a better way to resolve international disputes. At some point, someone somewhere is going to be convinced that killing all of you is going to be faster and cheaper and more certain of achieving their goals, so they’ll build one giant bomb and throw it at you without first making sure they won’t kill themselves in the process.”
  - John_Maxwell 3 Aug 2012 5:28 UTC
    7 points
    Parent
    This looks similar to this point Kaj Sotala made. My own restatement: As the body of narrow AI research devoted to making tools grows larger and larger, building agent AGI gets easier and easier, and there will always be a few Shane Legg types who are crazy enough to try it.
    
    I sometimes suspect that Holden’s true rejection to endorsing SI is that the optimal philanthropy movement is fringe enough already, and he doesn’t want to associate it with nutty-seeming beliefs related to near-inevitable doom from superintelligence. Sometimes I wish SI would market themselves as being similar to nuclear risk organizations like the Bulletin of the Atomic Scientists. After all, EY was an AI researcher who quit and started working on Friendliness when he saw the risks, right? I think you could make a pretty good case for SI’s usefulness just working based on analogies from nuclear risk, without any mention of FOOM or astronomical waste or paperclip maximizers.
    
    Ideally we’d have wanted to know about nuclear weapon risks before having built them, not afterwards, right?
    - DaFranker 3 Aug 2012 12:58 UTC
      1 point
      Parent
      Personally, I highly doubt that to be Holden’s true rejection, though it is most likely one of the emotional considerations that cannot be ignored in a strategic perspective. Holden claims to have gone through most of the relevant LessWrong sequences and SIAI public presentation material, which makes the likelihood of deceptive (or self-deceptive) argumentation lower, I believe.
      
      No, what I believe to be the real issue is that Holden and (most of) SIAI have disagreements over many specific claims used to justify broader claims. If the specific claims are granted in principle, both seem to generally agree in good Bayesian fashion on the broader or more general claim. Much of the disagreement on those specifics also appears to stem from different priors in ethical and moral values, as well as differences in their evaluations and models of human population behaviors and specific (but often unspecified) “best guess” probabilities.
      
      For a generalized example, one strong claim for existential risk reduction being the optimal effort is that even a minimal decrease in risk provides immense expected value simply from the sheer magnitude of what could most likely be achieved by humanity throughout the rest of its course of existence. Many experts and scientists outright reject this on the grounds that “future, intangible, merely hypothetical other humans” should not be assigned value on the same order-of-magnitude as current humans, or even one order of magnitude lower.
  - [deleted] 1 Aug 2012 16:01 UTC
    1 point
    Parent
    
    but rather that it is extremely unlikely that we can prevent everyone else from building one.
    
    Well, SI’s mission makes sense on the premise that the best way to prevent a badly built AGI from being developed or deployed is to build a friendly AGI which has that as one of its goals. ‘Best way’ here is a compromise between, on the one hand, the effectiveness of the FAI relative to other approaches, and on the other, the danger presented by the FAI itself as opposed to other approaches.
    
    So I think Holden’s position is that the ratio of danger vs. effectiveness does not weigh favorably for FAI as opposed to tool AI. So to argue against Holden, we would have to argue either that FAI will be less dangerous than he thinks, or that tool AI will be less effective than he thinks.
    
    I take it the latter is the more plausible.
    - DaFranker 2 Aug 2012 18:27 UTC
      2 points
      Parent
      Indeed, we would have to argue that to argue against Holden.
      
      My initial reaction was to counter this with a claim that we should not be arguing against anyone in the first place, but rather looking for probable truth (concentrate anticipations). And then I realized how stupid that was: Arguments Are Soldiers. If SI (and by the Blue vs Green principle, any SI-supporter) can’t even defend a few claims and defeat its opponents, it is obviously stupid and not worth paying attention to.
      
      SI needs some amount of support, yet support-maximization strategies carry a very high risk of introducing highly dangerous intellectual contamination through various forms (including self-reinforcing biases in the minds of researchers and future supporters) that could turn out to cause even more existential risk. Yet, at the same time, not gathering enough support quickly enough dramatically augments the risk that someone, somewhere, is going to trip on a power cable and poof, all humans are just gone.
      
      I am definitely not masterful enough in mathematics and bayescraft to calculate the optimal route through this differential probabilistic maze, but I suspect others could provide a very good estimate.
      
      Also, it’s very much worth noting that these very considerations, on a meta level, are an integral part of SI’s mission, so figuring out whether that premise you stated is true or not, and whether there are better solutions or not actually is SI’s objective. Basically, while I might understand some of the cognitive causes for it, I am still very much rationally confused when someone questions SI’s usefulness by questioning the efficiency of subgoal X, while SI’s original and (to my understanding) primary mission is precisely to calculate the efficiency of subgoal X.
- lukeprog 3 Aug 2012 4:23 UTC
  9 points
  Parent
  Just a few thoughts for now:
  - I agree that some of our disagreements “come down to relatively deep worldview differences (related to the debate over ‘Pascal’s Mugging’).” The forthcoming post on this subject by Steven Kaas may be a good place to engage further on this matter.
  - I retain the claim that Holden’s “objection #1 punts to objection #2.” For the moment, we seem to be talking past each other on this point. The reply Eliezer and I gave on Tool AI was not just that Tool AI has its own safety concerns, but also that understanding the tool AI approach and other possible approaches to the AGI safety problem is part of what an “FAI Programmer” does. We understand why people have gotten the impression that SI’s FAI team is specifically about building a “self-improving CEV-maximizing agent”, but that’s just one approach under consideration, and figuring out which approach is best requires the kind of expertise that SI aims to host.
  - The evidence suggesting that rationality is a weak predictor of success comes from studies on privileged Westerners. Perhaps Holden has a different notion of what counts as a measure of rationality than the ones currently used by psychologists?
  - I’ve looked further into donor-advised funds and now agree that the institutions named by Holden are unlikely to overrule their clients’ wishes.
  - I, too, would be curious to hear Holden’s response to Wei Dai’s question.
  - aaronsw 4 Aug 2012 11:18 UTC
    25 points
    Parent
    On the question of the impact of rationality, my guess is that:
    
    Luke, Holden, and most psychologists agree that rationality means something roughly like the ability to make optimal decisions given evidence and goals.
    
    The main strand of rationality research followed by both psychologists and LWers has been focused on fairly obvious cognitive biases. (For short, let’s call these “cognitive biases”.)
    
    Cognitive biases cause people to make choices that are most obviously irrational, but not most importantly irrational. For example, it’s very clear that spinning a wheel should not affect people’s estimates of how many African countries are in the UN. But do you know anyone for whom this sort of thing is really their biggest problem?
    
    Since cognitive biases are the primary focus of research into rationality, rationality tests mostly measure how good you are at avoiding them. These are the tests used in the studies psychologists have done on whether rationality predicts success.
    
    LW readers tend to be fairly good at avoiding cognitive biases (and will be even better if CFAR takes off).
    
    But there are a whole series of much more important irrationalities that LWers suffer from. (Let’s call them “practical biases” as opposed to “cognitive biases”, even though both are ultimately practical and cognitive.)
    
    Holden is unusually good at avoiding these sorts of practical biases. (I’ve found Ray Dalio’s “Principles”, written by Holden’s former employer, an interesting document on practical biases, although it also has a lot of stuff I disagree with or find silly.)
    
    Holden’s superiority at avoiding practical biases is a big part of why GiveWell has tended to be more successful than SIAI. (Givewell.org has around 30x the traffic of Singularity.org according to Compete.com, and my impression is that it moves several times as much money, although I can’t find a 2011 fundraising total for SIAI.)
    
    lukeprog has been better at avoiding practical biases than previous SIAI leadership and this is a big part of why SIAI is improving. (See, e.g., lukeprog’s debate with EY about simply reading Nonprofit Kit for Dummies.)
    
    Rationality, properly understood, is in fact a predictor of success. Perhaps if LWers used success as their metric (as opposed to getting better at avoiding obvious mistakes), they might focus on their most important irrationalities (instead of their most obvious ones), which would lead them to be more rational and more successful.
    - lukeprog 19 Feb 2013 4:03 UTC
      5 points
      Parent
      For the record, I basically agree with all this.
- John_Maxwell 3 Aug 2012 19:03 UTC
  6 points
  Parent
  
  I would guess that these disagreements, particularly (a) and (c), come down to relatively deep worldview differences (related to the debate over “Pascal’s Mugging”) that I will probably write more about in the future.
  
  How does Givewell plan to deal with the possibility that people who come to Givewell looking for charity advice may have a variety of worldviews that impact their thinking on this?