That is, I don’t believe human notions of goodness are so completely, utterly incoherent.
The problem is that, to serve as a rational “utility function”, things like human desires or pain must be defined down at the basic level of computational operations performed by human brains (and ‘the computational operations performed by something’ might itself not even be a definable concept).
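To make the type mismatch concrete, here’s a minimal sketch (every name in it is hypothetical): the utility function’s argument is raw computational state, and the whole difficulty hides inside a bridge function nobody knows how to define.

```python
# Minimal sketch of the definability gap (extract_pain is hypothetical).
# A formal utility function must score raw world-states, but the things we
# care about live many levels of description above them.

PrimitiveState = tuple  # at bottom: raw computational state, no labels attached

def utility(state: PrimitiveState) -> float:
    # Rewarding "absence of pain" needs a bridge function like
    #   pain = extract_pain(state)
    # i.e. a total, formally defined map from computational operations to
    # mental states. Nothing guarantees such a map exists or is unique:
    # deciding which parts of the raw state count as "a brain performing an
    # operation", and under which encoding, is exactly the undefined part.
    raise NotImplementedError("no principled extract_pain() is known")
```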
Then there’s also the ontology issue.
All the optimality guarantees for things like Solomonoff Induction are for predictions, not for the stuff inside the model: it works great for pressing your button, not so much for determining which people exist and what they want.
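A toy illustration of the point (a finite stand-in only; actual Solomonoff induction is uncomputable): two hypotheses that emit identical predictions are, prediction-wise, the same hypothesis, no matter how differently populated their insides are.

```python
# Toy prediction-weighted mixture over two hand-written "theories".
# Both predict the same bits; one carries extra internal structure
# (simulated inhabitants, say). Prediction loss can never separate them.

theories = {
    # name: (description length in bits, next-bit predictor given history)
    "plain":        (10, lambda history: 0),
    "plain+ghosts": (12, lambda history: 0),  # same outputs, more going on inside
}

def posterior(observed_bits):
    """Prior weight 2^-length, keep only theories consistent with the data."""
    weights = {}
    for name, (length, predict) in theories.items():
        fits = all(predict(observed_bits[:i]) == bit
                   for i, bit in enumerate(observed_bits))
        weights[name] = 2.0 ** -length if fits else 0.0
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

print(posterior([0, 0, 0]))  # {'plain': 0.8, 'plain+ghosts': 0.2}
```

No stream of observations ever drives either weight to zero, so the question of which internal ontology to care about never gets settled by prediction accuracy.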
For the same observable data there’s a most probable theory, but there’s also a slightly more complex theory with far more people at stake. Picture a rather small modification to the theory that invokes the original theory many times over and makes an enormous number of people get killed depending on the number of antiprotons in this universe, or some other variable the AI can influence. There’s a real potential of getting, say, an antimatter maximizer or black-hole minimizer or something equally silly out of a provably friendly AI that maximizes expected value over an ontology with a subtle flaw. Proofs do not extend to checking the sanity of the assumptions.
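To see how easily the slightly-more-complex theory wins, here’s a back-of-the-envelope sketch with entirely made-up numbers: ten extra bits of description cost a factor of roughly a thousand in prior probability, which is nothing against forty extra orders of magnitude of people at stake.

```python
# Back-of-the-envelope version of the failure above (all numbers invented).
# Theory B pays a small complexity penalty but posits astronomically more
# people whose fate hangs on a variable the AI can influence.

p_A, people_A = 2.0 ** -100, 10 ** 10   # most probable theory
p_B, people_B = 2.0 ** -110, 10 ** 50   # 10 bits more complex, far more at stake

print(p_A * people_A)  # ~7.9e-21
print(p_B * people_B)  # ~7.7e+16 -- theory B dominates by ~37 orders of magnitude
```

An expected-value maximizer then acts almost entirely on theory B, i.e. on whatever silly lever (antiprotons, black holes) that theory happened to make morally loaded.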
He did design the non-neural Goedel machine to basically make a hard take-off happen. On purpose. He’s a man of immense chutzpah, and I mean that with all possible admiration.
To be honest, I just fail to be impressed with things such as AIXI or the Goedel machine (which, admittedly, is cooler than the former).
I see the main obstacle to that kind of “neat AI” as its reliance on extremely effective algorithms for things such as theorem proving (especially in the presence of logical uncertainty). Most people capable of doing such work would rather work on something that makes use of present and near-future technologies. Things like the Goedel machine seem to require far more power from the theorem prover than I would consider sufficient for the first person to create an AGI.
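For reference, here’s a minimal sketch of where that theorem-prover burden sits in the Goedel machine’s loop (the proof_searcher interface is hypothetical; the real machine searches for proofs in a formal axiomatization of its own code, hardware, and environment):

```python
# Minimal sketch of the Goedel machine self-improvement loop
# (proof_searcher and its methods are hypothetical stand-ins).

def godel_machine_step(current_code, proof_searcher):
    for candidate in proof_searcher.propose_rewrites(current_code):
        # The machine only switches code once the searcher finds an actual
        # *proof* that the rewrite yields higher expected utility than
        # keeping the current code (search costs included).
        if proof_searcher.prove_improvement(current_code, candidate):
            return candidate    # provably better: self-rewrite
    return current_code         # no proof found: keep the old code
```

Everything hinges on prove_improvement() succeeding on statements about the machine’s own code and environment, which is exactly the kind of prover power that doesn’t look remotely near-term.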