I don’t like to be negative, but I’m almost completely certain that provably safe AI is impossible, by which I mean that any such system must contain a contradiction.
The problem is already unsolvable for humans: safety versus freedom, defining “good” and “evil”, giving someone the ability to protect themselves while ensuring they won’t harm others, building a system that prevents its own corruption (protects itself against itself). And wouldn’t world domination look exactly the same as an AI enforcing peace and safety? Both would require the same amount of power and control; they’re essentially the same thing seen from two different perspectives.
As far as I know, the Three Laws of Robotics aren’t perfect, nor has anyone yet come up with a workable (by which I mean failproof) set of rules. There seems to be a contradiction here, as if you’re trying to make a system stronger than itself. Even if you can prove that a system has a certain property, shouldn’t you first be able to define that property?
It’s my belief that we have never found a good solution to any instance of this problem: philosophical, moral, legal, political, you name it. Even the “paradox of tolerance”, an idea I don’t consider very well thought out, seems to point at a fundamental problem. There’s no unique origin/root/beginning we can rely on (base things on, compare things against), no universal system; there’s only relativity/perspectivism.
This generalizes. What are the fundamental axioms of mathematics? No unique answer exists. What’s the beginning of everything? What’s the center of the universe? As usual, the question itself is wrong. The alignment problem seems to attempt to answer a global problem from a local perspective, which is like solving a subjective problem objectively, asking a formal system to prove its own consistency, or expecting a human being to escape their own programming and biases. You could argue for a proof by construction: if everything acts as one object, there can’t be any disagreements between objects. But even within individual people there are many internal conflicts; we’re not even aligned with ourselves. And even if we managed to steer humanity into a fixed point, I think we’d be stuck there forever, with a provable inability to exit again (which I think would itself make it unsafe). And if you only bound the AI, you can’t prove that malicious actors can’t modify it and make it unsafe, for that would also mean that good actors couldn’t modify a malicious system and make it safe.
I realize this reads like the comment of somebody who is mentally unwell, but I’m confident there’s a solid argument in there somewhere.