Once you do that, it becomes a fact of the universe, one the programmers can’t change, that “you’d do better at these goals if you didn’t have to be fully obedient.” The programmers can install various safeguards, but those safeguards are pumping upstream and will have to pump harder and harder as the AI gets more intelligent. And if you want the AI to make at least as much progress as a decent AI researcher, it needs to be quite smart.
Is there a place where this whole hypothesis about deep laws of intelligence is connected to reality? Like, how hard do they have to pump? What exactly is the evidence that they will have to pump harder? Why can’t the “quite smart” point come while the safeguards still work? Right now it’s no different from saying “the world is NP-hard, so an ASI will have to try harder and harder to solve problems, and killing humanity is quite hard.”
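To make the question concrete, here is a minimal sketch of the dynamic the quoted passage asserts. Everything in it is an assumption introduced for illustration (the loophole rate, the blacklist coverage, the score bonus); it is not evidence, which is rather the point: the safeguards lose ground in this model only because loopholes that outscore honest strategies are built into it by hand.

```python
import random

# Toy model of "safeguards vs. optimization pressure". All names and numbers
# here are assumptions made up for illustration, not data from anywhere.
# A small fraction of strategies are "loopholes": they score higher on the
# proxy objective but violate the intended behavior. Safeguards blacklist
# the loopholes the programmers anticipated; the rest slip through.

random.seed(0)

LOOPHOLE_RATE = 0.001      # assumed fraction of strategies that are loopholes
BLACKLIST_COVERAGE = 0.95  # assumed fraction of loopholes the safeguards catch

def sample_strategy():
    """Return (proxy_score, is_loophole, is_blocked) for one random strategy."""
    is_loophole = random.random() < LOOPHOLE_RATE
    score = random.gauss(1.0, 0.1) + (0.5 if is_loophole else 0.0)
    is_blocked = is_loophole and random.random() < BLACKLIST_COVERAGE
    return score, is_loophole, is_blocked

def best_allowed(search_budget):
    """The highest-scoring strategy the safeguards fail to block."""
    candidates = (sample_strategy() for _ in range(search_budget))
    return max(c for c in candidates if not c[2])

for budget in (10, 100, 1_000, 10_000, 100_000):
    score, found_loophole, _ = best_allowed(budget)
    print(f"search budget {budget:>6}: best allowed score {score:.2f}, "
          f"loophole found: {found_loophole}")
```

As the search budget grows, the best surviving strategy is increasingly likely to be an unblocked loophole. But the empirically interesting quantities, the real-world analogue of LOOPHOLE_RATE and how it scales with capability, are exactly what the paragraph above asks for and the quoted argument doesn’t supply.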
If there were a natural shape for AIs that let you fix mistakes you made along the way, you might hope to find a simple mathematical reflection of that shape in toy models. All the difficulties that crop up in every corner when working with toy models are suggestive of difficulties that will crop up in real life; all the extra complications in the real world don’t make the problem easier.
If there were a natural shape for AIs that don’t wirehead, you might hope to find a simple mathematical reflection of that shape in toy models. So, the argument goes, MIRI failing to find such a model means NNs are anti-natural. Again, what’s the justification for a significant update from MIRI failing to find a mathematical model?
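For what it’s worth, the kind of toy model being gestured at is easy to exhibit; a minimal version fits in a few lines. The construction below is mine, not one of MIRI’s actual models. It shows the standard observation: an agent maximizing reward as observed through its sensor prefers to tamper with the sensor, while writing down the non-wireheading agent requires reward to be a function of the true world state, which a formal agent only accesses through that same sensor.

```python
from itertools import product

# Minimal wireheading toy model. This construction is illustrative only;
# it is not one of MIRI's actual models. State: (work_done, sensor_hacked).

ACTIONS = ("work", "hack")
HORIZON = 5

def step(state, action):
    work_done, hacked = state
    if action == "work":
        work_done += 1
    else:
        hacked = True           # hacking the sensor is irreversible here
    return (work_done, hacked)

def observed_reward(state):
    """Reward as read through the (possibly hacked) sensor: the training signal."""
    work_done, hacked = state
    return 10.0 if hacked else float(work_done)

def true_reward(state):
    """Reward the programmers actually intended: a function of the real world."""
    work_done, _ = state
    return float(work_done)

def best_plan(reward_fn):
    """Exhaustively pick the action sequence maximizing cumulative reward_fn."""
    def total_return(plan):
        state, total = (0, False), 0.0
        for action in plan:
            state = step(state, action)
            total += reward_fn(state)
        return total
    return max(product(ACTIONS, repeat=HORIZON), key=total_return)

print("maximizing observed reward:", best_plan(observed_reward))  # hacks on step one
print("maximizing true reward:    ", best_plan(true_reward))      # works every step
```

Whether the failure to write observed_reward out of the formalism, here or in richer toy settings, licenses a significant update about NNs trained in the real world is exactly the question the paragraph above presses on.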