Your “The Doctrine of Logical Infallibility” seems to be a twisted strawman. “No sanity checks”: that part is sort of true. There will be sanity checks if and only if you decide to include them. Do you have a piece of code that is a sanity check? What are we sanity checking, and how do we tell whether it is sane? Do we sanity check the raw actions? Those could just be making a network connection and sending encrypted files to various people across the internet. Do we sanity check the predicted results of these actions? Then the sanity checker would need to know how the results were stored, what kind of world is described by the binary data 100110...?
but if the system does come to a conclusion (perhaps with a degree-of-certainty number attached), the assumption seems to be that it will then be totally incapable of then allowing context to matter.
That’s because they are taking any extra parts that allow context to matter, putting them in a big box, and calling the whole thing the system. The system’s decisions are final and absolute, not because there are no double checks, but because the double checks are part of the system. Granted, at the moment there is a lack of context-adding algorithms; what you seem to want is humanlike common sense.
The AI can sometimes execute a reasoning process, then come to a conclusion and then, when it is faced with empirical evidence that its conclusion may be unsound, it is incapable of considering the hypothesis that its own reasoning engine may not have taken it to a sensible place.
Again, at the moment we have no algorithm for checking sensibleness, so any algorithm must either go round in endless circles of self-doubt and never do anything, or plow on regardless. Even if you do put 10% probability on the hypothesis that humans don’t exist, that you are a fictional character in a story written by a mermaid, and that the maths and science you know are entirely made up (so there is no such thing as rationality or probability), what would you do? My best guess is that you would carry on breathing, eating and acting roughly like a normal human. You need a core of not-totally-insane for a sanity check to bootstrap from.
But it gets worse. Those who assume the doctrine of logical infallibility often say that if the system comes to a conclusion, and if some humans (like the engineers who built the system) protest that there are manifest reasons to think that the reasoning that led to this conclusion was faulty, then there is a sense in which the AGI’s intransigence is correct, or appropriate, or perfectly consistent with “intelligence.”
There are designs of AI, files of programming code, that will hear your shouts, your screams, your protests of “that’s not what I meant” and then kill you anyway. There are designs that will kill you with a super-weapon they invented themselves, and then fill the universe with molecular smiley faces. This is not logically contradictory behavior; there exist pieces of code that will do this. You could argue that such code is a rare and complicated thing, that it’s nothing like any system that humans might try to build, that you’re less likely to write code that does this when trying to make an FAI than you are to write a great novel when trying to write a shopping list.

I would disagree. I would say that such behavior is the default: most simple AI designs don’t see screaming programmers as a reason to stop, because most AI designs see screaming humans as no more important or special than pissing rats. It’s just another biological process that doesn’t seriously affect the AI’s ability to reach its goal. Most AI designs have no special reason to care about humans. An AI might know that the process of its creation involved humans, keyboards and a bunch of other objects, and, if you look back far enough, the whole earth. It might know that if a hypothetical human were put in a room with the question “do you want a universe full of smiley faces?” and buttons labeled Yes and No, the human would press the No button. The AI thinks this is no more relevant than a hypothetical wombat being offered a choice between two types of cheese.
It will understand that many of its more abstract logical atoms have a less than clear denotation or extension in the world (if the AGI comes to a conclusion involving the atom [infelicity], say, can it then point to an instance of an infelicity and be sure that this is a true instance, given the impreciseness and subtlety of the concept?).
If the concept is too fuzzy, the AI can just discard it as useless (e.g. soul, qualia). If it isn’t sure whether something is a real instance (and an ideal agent will never be 100% sure of any real-world fact), it can put a probability on it and use expected utility maximisation. But all that is part of the process of coming to a conclusion.
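Here is a minimal sketch of that idea, assuming a hypothetical utility table and a made-up probability (none of these names or numbers come from the original post):

```python
# Toy expected utility maximisation: the agent is only 70% sure the thing in
# front of it is a real instance of the fuzzy concept, so it weights each
# action's utility by that probability instead of demanding certainty.

def expected_utility(action, p_instance, utility):
    """Expected utility of `action`, given probability p_instance that the
    observed thing really is an instance of the concept."""
    return (p_instance * utility(action, True)
            + (1 - p_instance) * utility(action, False))

def best_action(actions, p_instance, utility):
    # Pick the action with the highest expected utility.
    return max(actions, key=lambda a: expected_utility(a, p_instance, utility))

# Hypothetical utility table for two actions.
utilities = {("act", True): 10, ("act", False): -3,
             ("wait", True): 0, ("wait", False): 0}
u = lambda action, is_instance: utilities[(action, is_instance)]

print(best_action(["act", "wait"], p_instance=0.7, utility=u))  # -> "act"
```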
It will understand that knowledge can always be updated in the light of new information. Today’s true may be tomorrow’s false.
The AIXI formalism can do this. “My calendar clock says Tuesday on the front” is a fact that is true today and false tomorrow. AIXI “understands” this by simulating the clock and the rest of the universe in excessive detail. If you give it a quiz about what the clock will show on which day, and incentivize it to get the answers right, it will answer.
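To make the “simulate the clock” point a little more concrete, here is a toy stand-in for that idea: a finite, hand-written mixture of world-models in place of AIXI’s mixture over all programs. The models and weights are invented purely for illustration:

```python
# Two hand-written world-models of the calendar clock, weighted by a made-up
# prior, standing in for AIXI's mixture over environments.
models = {
    "working_clock": lambda day: ["Mon", "Tue", "Wed", "Thu",
                                  "Fri", "Sat", "Sun"][day % 7],
    "stuck_clock":   lambda day: "Tue",   # a clock frozen on Tuesday
}
weights = {"working_clock": 0.9, "stuck_clock": 0.1}

def p_display(day, display):
    """Mixture probability that the clock shows `display` on day `day`."""
    return sum(w for name, w in weights.items() if models[name](day) == display)

# "True today, false tomorrow" is just two different predictions:
print(p_display(1, "Tue"))  # day 1: 1.0 (both models say it shows Tuesday)
print(p_display(2, "Tue"))  # day 2: 0.1 (only the stuck clock still says so)
```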
The other potential meaning is that it can accept that it was wrong and adapt. Suppose that over the last week it has watched, through its camera, the sun moving and shadows changing. It assigns a 95% probability to “the sun goes round the earth”. You give it an astronomy quiz, and it gets the answer wrong. It still refuses your bet that the earth goes round the sun at 100 to 1 odds, because it operates on probabilities rather than certainties. You then show it an astronomy textbook and a bunch more data. It updates on that data, and gets the next quiz right.
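For concreteness, here is the bet arithmetic under one reading of those odds (my assumption: the AI is being asked to stake 100 against 1 on its own 95%-confident conclusion; only the 95% figure comes from the example above):

```python
# Why a 95%-confident agent declines to stake 100 against 1 on its conclusion.
p_geocentric = 0.95          # credence in "the sun goes round the earth"

stake, win = 100, 1          # risk 100, win 1 if geocentrism turns out true
expected_value = p_geocentric * win - (1 - p_geocentric) * stake
print(expected_value)        # 0.95 - 5.0 = -4.05 < 0, so it refuses the bet

# After the textbook and the extra data, the credence changes and the very
# same arithmetic gives a different answer; nothing has to be "overridden".
```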
It will understand that probabilities used in the reasoning engine can be subject to many types of unavoidable errors.
And that coherence theorems say that you can take all the errors into account to get a new probability.
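A minimal sketch of what folding the errors in can look like, using nothing fancier than the law of total probability (all numbers are invented for illustration):

```python
# Fold "my reasoning engine may have erred" back into an ordinary probability.
p_engine_ok = 0.9        # credence that the reasoning episode ran correctly
p_true_if_ok = 0.99      # probability the conclusion holds if it did
p_true_if_broken = 0.5   # if the engine failed, treat the conclusion as a coin flip

p_conclusion = (p_engine_ok * p_true_if_ok
                + (1 - p_engine_ok) * p_true_if_broken)
print(p_conclusion)      # 0.941: one number the agent can act on as usual
```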
It will understand that the techniques used to build its own reasoning engine may be under constant review, and updates may have unexpected effects on conclusions (especially in very abstract or lengthy reasoning episodes).
It predicts that a bunch of monkeys are looking at its source code and tampering with its thoughts. It might not like this situation and might plot to change it.
It will understand that resource limitations often force it to truncate search procedures within its reasoning engine, leading to conclusions that can sometimes be sensitive to the exact point at which the truncation occurred.
It will also understand that its processors do floating-point arithmetic. So what? What implied connotation about its behavior are you trying to sneak in?
most AI designs see screaming humans as no more important or special than pissing rats.
No AI design that we currently have can even conceive of humans. They’re in a don’t-know state, not a don’t-care state. They are safe because they are too dumb to be dangerous. Danger is a combination of high intelligence and misalignment.
Or you might be talking about abstract, theoretical AGI and ASI. It is true that most possible ASI designs don’t care about humans, but that fact is not very useful, because AI design is not a random potshot into design space. AI designers don’t want AIs that do random stuff: they are always trying to solve some sort of control or alignment problem in parallel with achieving intelligence. Since danger is a combination of high intelligence and misalignment, a dangerous ASI would require efforts at creating intelligence to suddenly outstrip efforts at aligning it. The key word is “suddenly”. If progress continues to be incremental, there is not much to worry about.
It might not like this situation and might plot to change it.
Or it might not care.