Reading this was an interesting experience, because the post started by saying (roughly) “AI alignment is useless” (and I work on AI alignment), yet I found myself agreeing with nearly every point, at least qualitatively if not quantitatively. (Less so for the “keeping the West ahead” section, where I’m more uncertain because I just don’t know much about geopolitics.)
I’d recommend reading through AN #80, which summarizes the views of four people (including three AI alignment researchers) who are optimistic about solving AI alignment. I think they’re pretty similar to the things you say here. I’d also recommend reading What failure looks like for an abstract description of the sorts of problems I am trying to avoid via AI alignment research. I’m also happy to answer questions like “what are you doing if not encoding human values”, “if you agree with these arguments why do you worry about AI x-risk”, etc.
Given that you emphasize hardware-bound agents: have you seen AI and Compute? A reasonably large fraction of the AI alignment community takes it quite seriously.
I’ll do my own version of your credences, though I should note I’m not actually offering to bet (I find it not worth the hassle). It’s more like “if the bets were automatically tracked and paid out, and also I ignore ‘sucker bet’ effects, what bets would I be willing to make?” There’s a short sketch after the list converting these odds into implied probabilities.
Foom vs Moof:
Conditioned on AGI by 2050, I’d bet 20:1 against foom the way you seem to be imagining it. Unconditionally, I’d bet 10:1 against foom. Maybe I’d accept a 1:1000 bet for foom?
When will AGI happen?
I’d bet at 1.5:1 odds that it won’t happen before 2030, and at 1:2 odds that it will happen before 2060.
AI alignment:
The way you define it, I’d take a bet at 100:1 odds against having “aligned AI”. (Though I’d want to clarify that your definition means what I think it means first.)
Positive outcome to the singularity:
If “China develops AI” is sufficient to count as a non-positive outcome, then I’m not really sure what this phrase means, so I’ll abstain here.
Tesla vs Google:
This is one place where I disagree with you; I’d guess that Google has better technology (both now and in the future), though I could still see Tesla producing a mass-market self-driving car before Google, because (1) it’s closer to Tesla’s core business (Google plausibly builds a rideshare service instead), and (2) Google cares more about its brand. Still, I think I’d take your 5:1 bet.
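To make these odds concrete, here’s a minimal sketch converting them into implied probabilities, assuming the usual convention that betting at a:b odds means risking a to win b (the comment doesn’t spell out a convention, so this is my assumption):

```python
def implied_prob(stake, payout):
    """Betting `stake` to win `payout` on event X has positive expected
    value iff P(X) > stake / (stake + payout)."""
    return stake / (stake + payout)

# 20:1 against foom (conditional on AGI by 2050): risking 20 to win 1 on
# "no foom" pays iff P(no foom) > 20/21, i.e. P(foom) < 1/21.
print(1 - implied_prob(20, 1))   # ~0.048, upper bound on P(foom | AGI by 2050)

# 10:1 against foom, unconditional: P(foom) < 1/11.
print(1 - implied_prob(10, 1))   # ~0.091

# 1:1000 for foom: risking 1 to win 1000 pays iff P(foom) > 1/1001.
print(implied_prob(1, 1000))     # ~0.001, lower bound on P(foom)

# 1.5:1 that AGI won't happen before 2030: P(AGI before 2030) < 1/2.5.
print(1 - implied_prob(1.5, 1))  # 0.4

# 1:2 that AGI will happen before 2060: P(AGI before 2060) > 1/3.
print(implied_prob(1, 2))        # ~0.333
```

Read this way, the foom odds bracket P(foom) between roughly 0.1% and 9% unconditionally, and below roughly 5% conditional on AGI by 2050.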
Given that you emphasize hardware-bound agents: have you seen AI and Compute? A reasonably large fraction of the AI alignment community takes it quite seriously.
This trend is going to run into Moore’s law as an upper ceiling very soon (within a year, staying on the trend line would mean a single training run consuming a year of the world’s most powerful computer). What do you predict will happen then?
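A rough version of the arithmetic behind “very soon” (illustrative numbers only: the 3.4-month doubling time reported in AI and Compute, the ~1,800 petaflop/s-day AlphaGo Zero estimate from that post as a late-2017 anchor, and ~200 petaflop/s as a stand-in for the peak throughput of the fastest supercomputer):

```python
import math

DOUBLING_MONTHS = 3.4    # doubling time reported in "AI and Compute"
ANCHOR_PFS_DAYS = 1.8e3  # ~AlphaGo Zero (late 2017), in petaflop/s-days
TOP_MACHINE_PFS = 200    # illustrative peak petaflop/s of the top supercomputer

# One year of the top machine, in petaflop/s-days:
ceiling = TOP_MACHINE_PFS * 365  # ~73,000

# Doublings for the trend line to reach that ceiling, and the time they take:
doublings = math.log2(ceiling / ANCHOR_PFS_DAYS)
print(f"{doublings:.1f} doublings ~ {doublings * DOUBLING_MONTHS:.0f} months")
# ~5.3 doublings ~ 18 months after the late-2017 anchor, i.e. around mid-2019.
# The exact crossover depends on which run and which machine you anchor to,
# but it lands very close to the present either way.
```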
“what are you doing if not encoding human values”
Interested in the answer to this, and how much it looks like/disagrees with my proposal: building free trade, respect for individual autonomy, and censorship resistance into the core infrastructure and social institutions our world runs on.
I don’t know; I do expect the line to slow down, though I’m not sure when. (See e.g. here and here for other people’s analysis of this point.)
Interested in the answer to this, and how much it looks like/disagrees with my proposal
It’s of a different type signature than your proposal. I agree that “how should infrastructure and institutions be changed” is an important question; it’s just not what I focus on. I think that there is still a technical question that needs to be answered: how do you build AI systems that do what you want them to do?
In particular, nearly all AI algorithms that have ever been developed assume a known goal / specification, and then figure out how to achieve that goal. If this were to continue all the way to superintelligent AI systems, I’d be very worried, because of convergent instrumental subgoals: almost any fixed objective is better served by an agent that acquires resources, preserves itself, and resists correction. I don’t think this will continue all the way to superintelligent AI systems, but that’s because I expect people (including myself) to figure out how to build AI systems in a different way, so that they optimize for our goals instead of their own goals.
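As a deliberately toy illustration of that “known specification, then optimize” pattern (generic black-box optimization; the reward function and numbers here are made up, not any particular library’s API):

```python
import random

# The designer fixes a goal up front; the algorithm's only job is to
# maximize it as given, however imperfectly it captures what was wanted.
def reward(state):
    return -abs(state - 42)  # fixed specification, never questioned

def optimize(reward_fn, steps=10_000):
    """Generic hill climbing: propose a small change, keep it if the
    specified reward doesn't decrease."""
    state = 0
    for _ in range(steps):
        candidate = state + random.choice([-1, 1])
        if reward_fn(candidate) >= reward_fn(state):
            state = candidate
    return state

print(optimize(reward))  # converges near 42: whatever the spec says, it pursues
```

The worry is this same shape at vastly greater capability, where “pursue the spec as given” starts to include instrumental strategies the designer never intended.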
Of course, one way to do this would be to encode a perfect representation of human values into the system, but like you I think this is unlikely to work (see also Chapter 1 of the Value Learning sequence). I usually think of the goal as “figure out how to build an AI system that is trying to help us”, where part of helpful behavior is clarifying our preferences / values with us, ensuring that we have accurate information, etc. (See Clarifying AI Alignment and my comment on it.) Think of this as trying to figure out how to embed the skills of a great personal assistant into an AI system.
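For contrast, a toy sketch of that “trying to help us” shape, in the spirit of the reward-uncertainty ideas in the Value Learning sequence (the scenario, hypotheses, and threshold are all made up for illustration):

```python
# Toy assistant: uncertain about the human's objective, and treating
# clarifying questions as part of helpful behavior.

# Candidate hypotheses about what the human wants, with a prior over them.
hypotheses = {
    "book the cheapest flight": 0.40,
    "book the shortest flight": 0.35,
    "book a refundable flight": 0.25,
}

CONFIDENCE_THRESHOLD = 0.9  # made-up bar for acting without asking

def act(hypotheses):
    best, p = max(hypotheses.items(), key=lambda kv: kv[1])
    if p >= CONFIDENCE_THRESHOLD:
        return f"Acting on: {best}"
    # Under uncertainty, clarify rather than optimizing the current
    # best guess as if it were the true goal.
    return f"Asking: did you want me to {best}, or something else?"

print(act(hypotheses))
# A fuller version would update the distribution on the human's answer
# (e.g. by Bayes' rule) before committing to a plan.
```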