Your argument about Solomonoff induction is clever but I feel like it’s missing the point.
I agree it’s missing the point. I do get the point, and I disagree with it: I wanted to say “all three cases will build self-models,” but I couldn’t, because that may not be true for Solomonoff induction for an unrelated reason which, as you note, misses the point. I did claim that the other two cases would be self-aware as you define it.
(I agree that Solomonoff induction might build an approximate model of itself, idk.)
Maybe if we do it right, the best model would not be self-reflective: it wouldn’t know what it was doing as it did its predictive thing, and thus would be unable to reason about its internal processes or recognize causal connections between them and the world it sees (even if such connections are blatant).
My claim is that we have no idea how to do this, and I think the examples in your post would not do this.
One intuition is: An oracle is supposed to just answer questions. It’s not supposed to think through how its outputs will ultimately affect the world. So, one way of ensuring that it does what it’s supposed to do, is to design the oracle to not know that it is a thing that can affect the world.
I’m not disagreeing that if we could build a self-unaware oracle then we would be safe. That seems reasonably likely to fix agency issues (though I’d want to think about it more). My disagreement is with the premise of the argument, i.e., whether we can build self-unaware oracles at all.
I think we’re on the same page! As I noted at the top, this is a brainstorming post, and I don’t think my definitions are quite right, or that my arguments are airtight. The feedback from you and others has been super-helpful, and I’m taking it forward as I search for a more rigorous version of this, if it exists!! :-)
On further reflection, you’re right, the Solomonoff induction example is not obvious. I put a correction in my post, thanks again.