Rob Bensinger comments on Question: MIRI Corrigbility Agenda

Rob Bensinger 15 Mar 2019 5:36 UTC
14 points
The only major changes we’ve made to the MIRI research guide since mid-2015 are to replace Koller and Friedman’s Probabilistic Graphical Models with Pearl’s Probabilistic Inference; replace Rosen’s Discrete Mathematics with Lehman et al.’s Mathematics for CS; add Taylor et al.’s “Alignment for Advanced Machine Learning Systems”, Wasserman’s All of Statistics, Shalev-Shwartz and Ben-David’s Understanding Machine Learning, and Yudkowsky’s Inadequate Equilibria; and remove the Global Catastrophic Risks anthology. So the guide is missing a lot of new material. I’ve now updated the guide to add the following note at the top:
This research guide has been only lightly updated since 2015. Our new recommendation for people who want to work on the AI alignment problem is:
1. If you have a computer science or software engineering background: Apply to attend our new workshops on AI risk and to work as an engineer at MIRI. For this purpose, you don’t need any prior familiarity with our research.
If you aren’t sure whether you’d be a good fit for an AI risk workshop, or for an engineer position, shoot us an email and we can talk about whether it makes sense.
You can find out more about our engineering program in our 2018 strategy update.
2. If you’d like to learn more about the problems we’re working on (regardless of your answer to the above): See “Embedded Agency” for an introduction to our agent foundations research, and see our Alignment Research Field Guide for general recommendations on how to get started in AI safety.
After checking out those two resources, you can use the links and references in “Embedded Agency” and on this page to learn more about the topics you want to drill down on. If you want a particular problem set to focus on, we suggest Scott Garrabrant’s “Fixed Point Exercises.”
If you want people to collaborate and discuss with, we suggest starting or joining a MIRIx group, posting on LessWrong, applying for our AI Risk for Computer Scientists workshops, or otherwise letting us know you’re out there.
- Rob Bensinger 15 Mar 2019 5:44 UTC
  5 points
  For corrigibility in particular, some good material that’s not discussed in “Embedded Agency” or the reading guide is Arbital’s “Corrigibility” and “Problem of Fully Updated Deference” articles.
  - Wei Dai 15 Mar 2019 19:00 UTC
    6 points
    Is Jessica Taylor’s “A first look at the hard problem of corrigibility” still a good reference, or is it outdated?
    - Rob Bensinger 16 Mar 2019 14:31 UTC
      4 points
      I’d expect Jessica/Stuart/Scott/Abram/Sam/Tsvi to have a better sense of that than me. I didn’t spot any obvious signs that it’s no longer a good reference.
- Rob Bensinger 16 Mar 2019 18:28 UTC
  4 points
  I’ve now also highlighted Scott’s tip from “Fixed Point Exercises”:
  Sometimes people ask me what math they should study in order to get into agent foundations. My first answer is that I have found the introductory class in every subfield to be helpful, but I have found the later classes to be much less helpful. My second answer is to learn enough math to understand all fixed point theorems.
  These two answers are actually very similar. Fixed point theorems span all across mathematics, and are central to (my way of) thinking about agent foundations.