To have any operationalization of how some specific model of self-modification provably maintains some invariant would be a large step forward, the existence of other models of self-modification notwithstanding. Safety cannot be proven for all approaches, because not all approaches are safe.
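To make concrete the kind of thing I mean by an "operationalization," here is a minimal toy sketch (my own hypothetical illustration, not any published proposal), in which an exhaustive check over a finite state space stands in for an actual proof, and all names are made up:

```python
# Toy sketch: a "self-modifying" program that only adopts a proposed rewrite
# when a checker verifies that a chosen invariant still holds.
# (Illustrative only; the exhaustive check is a crude stand-in for a proof.)

SAFE_ACTIONS = {"cooperate", "wait"}  # the invariant: only these actions are ever emitted

def respects_invariant(policy, test_states):
    """Check the invariant over a finite set of states."""
    return all(policy(s) in SAFE_ACTIONS for s in test_states)

def self_modify(current_policy, proposed_policy, test_states):
    """Adopt the proposed rewrite only if the invariant is verifiably preserved."""
    if respects_invariant(proposed_policy, test_states):
        return proposed_policy
    return current_policy  # reject rewrites that can't be verified

# Usage: a rewrite that would violate the invariant is rejected.
states = range(10)
base_policy = lambda s: "wait"
bad_rewrite = lambda s: "defect" if s == 3 else "cooperate"
assert self_modify(base_policy, bad_rewrite, states) is base_policy
```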
My claim is that there are sufficiently many possible models for AI that, given what we (the Less Wrong community, not necessarily AI researchers) know now, the probability of any given model being the one that gets developed is tiny.
The actionable safety issues that would come up if the AI is like the Chinese economy would be very different from those that would come up if it is like a self-improving chess-playing program, which in turn would be very different from those that would come up if the AI is of the type that Eliezer’s publication describes.
Given the paucity of information available about the design of the first AI, I don’t think the probability that safety research on any particular model will turn out to be actionable is high enough to warrant such research (relative to other available activities).
Even if that were so, that’s not MIRI’s (or EY’s) most salient comparative advantage (also: CFAR).
Eliezer made a major contribution to increasing rationality with his How To Actually Change Your Mind sequence, which improved the rationality of many people I know, including myself.
MIRI could engage in other AI safety activities, such as improving future forecasting.
If an organization doesn’t have a cost-effective activity to engage in, and its employees recognize this, then they can leave and do something else. Here I’m not claiming that this is in fact the case for MIRI; rather, I’m just responding to your argument.
MIRI’s staff could migrate to CFAR.
Out of all of the high-impact activities that MIRI staff could do, it’s not clear to me that Friendly AI research is their comparative advantage.
Also, even if we accept that MIRI’s comparative advantage has to do with having a clearer view of the Friendliness vs. UnFriendliness distinction, why wouldn’t it be more effective for them to try to insure against an UnFriendly outcome by addressing the UnFriendliness already in the world today? For instance, corporate governance. Corporations’ optimization powers are a tremendous source of human happiness, but their UnFriendly tendencies are clear. For now, corporations have only parasitic intelligence, and don’t look particularly foomy, but if I had to bet on whether MIRI or Google/TenCent/Palantir/whatever was more likely to foom, there would be no contest.
[There are a bunch of assumptions embedded there. The principal ones are:
1. If a corporation, as currently constituted, somehow went foom, it would be likely to be UnFriendly.
2. If we were able to make corporations appear more Friendly in their day-to-day actions, they would also become less likely to rush headlong into an UnFriendly foom.
I think 1 is pretty undeniable, but I could understand it if someone disagreed with 2.]