Thoughts on Ben Garfinkel’s “How sure are we about this AI stuff?”

I liked this talk by Ben.

I think it raises some very important points. OTTMH, I think the most important one is: We have no good critics. There is nobody I’m aware of who is seriously invested in knocking down AI-Xrisk arguments and qualified to do so. For many critics in machine learning (like Andrew Ng and Yann LeCun), the arguments seem obviously wrong or misguided, so they don’t think it’s worth their time to engage beyond stating that.

A related point, which is also important: We need to clarify and strengthen the case for AI-Xrisk. Personally, I think I have a very good internal map of the paths arguments about AI-Xrisk can take, and the types of objections one encounters. It would be good to have this as some form of flow-chart. Let me know if you’re interested in helping make one.

Regarding machine learning, I think he made some very good points about how the way ML actually works doesn’t fit with the paperclip story. I think it’s worth exploring the disanalogies more and seeing how they affect various Xrisk arguments.

As I reflect on what’s missing from the conversation, I always feel the need to make sure it hasn’t been covered in Superintelligence. When I read it several years ago, I found Superintelligence to be remarkably thorough. For example, I’d like to point out that FOOM isn’t necessary for a unilateral AI-takeover, since an AI could be progressing gradually in a box, and then break out of the box already superintelligent; I don’t remember if Bostrom discussed that.

The point about justification drift is quite apt. For instance, I think the case for MIRI’s viewpoint increasingly relies on:

1) optimization daemons (aka “inner optimizers”)

2) adversarial examples (i.e. current ML systems seem to learn superficially similar but deeply flawed versions of our concepts)

TBC, I think these are quite good arguments, and I’ve personally come to appreciate them much more over the last several years. But I consider them far from conclusive, given our current lack of knowledge/understanding.

One thing I didn’t quite agree with in the talk: I think he makes a fairly general case against trying to impact the far future. I think the sheer magnitude of the potential impact and our uncertainty about its direction mostly cancel each other out, so even if we are highly uncertain about what effects our actions will have, it’s often still worth making guesses and using them to inform our decisions. He basically acknowledges this.