There are a huge number of possible designs of AI, and most of them are not well understood. So researchers look at agents like AIXI, a formal specification of an agent that would in some sense behave intelligently, given infinite compute. It does display the take-over-the-world failure mode. Suppose you give the AI a utility function of maximising the number of dopamine molecules within 1μm of a strand of human DNA (defined as a strand of DNA agreeing with THIS 4GB file in at least 99.9% of locations). This is a utility function that could easily be specified in terms of atoms. You could write a function that takes in a description of the universe in terms of the coordinates of each atom, or a discrete approximation to the quantum wave function, or whatever, and returns a number representing utility. It would be fairly straightforward to design an agent that, given infinite compute, would act to maximise this function. It seems somewhat harder, but not necessarily impossible, to make a system that can approximate the same behavior given a reasonable amount of compute. Nowhere in this potential AI design is anything as nebulous, anything as hard to specify in terms of atom positions, as human preferences or consent. The system does understand humans in a sense: it can simulate them atom by atom and predict exactly how they will panic and try to stop it. But there is no object in its memory that corresponds to human consent, or preferences, or well-being, or humans at all. There is no checker code. This particular design of AI would make vats full of human DNA and dopamine.
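To make that concrete, here is a minimal sketch of what such an atom-level utility function might look like. Everything here (the Molecule type, the helper names, the toy list-of-molecules world state) is invented for illustration; a real version would run over a full physical description of the universe rather than a tidy data structure. The point is only that the function counts molecules and never mentions humans, preferences, or consent.

```python
# Toy sketch of an atom-level utility function (illustrative names only).
from dataclasses import dataclass

@dataclass
class Molecule:
    kind: str            # e.g. "dopamine" or "dna_strand"
    position: tuple      # (x, y, z) coordinates in metres
    sequence: str = ""   # base sequence; only meaningful for DNA strands

def matches_reference(seq: str, reference: str, threshold: float = 0.999) -> bool:
    """True if the strand agrees with the reference at >= 99.9% of locations."""
    if not seq or len(seq) != len(reference):
        return False
    agreements = sum(a == b for a, b in zip(seq, reference))
    return agreements / len(reference) >= threshold

def distance(a: tuple, b: tuple) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def utility(world: list, reference_dna: str) -> int:
    """Count dopamine molecules within 1 micrometre of a qualifying DNA strand."""
    strands = [m for m in world if m.kind == "dna_strand"
               and matches_reference(m.sequence, reference_dna)]
    dopamine = [m for m in world if m.kind == "dopamine"]
    return sum(any(distance(d.position, s.position) <= 1e-6 for s in strands)
               for d in dopamine)
```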
Now this design was simplistic, and a smart AI designer should know not to do that, but the process of warning potential AI designers not to do that involves a lot of shouting about what would happen if you did. We also don’t know how far this sort of behavior reaches; we don’t understand the less simplistic designs well enough to say what they would do. This makes them not known to be deadly, which is different from known to be not deadly.
“Canonical Logical AI” is an umbrella term designed to capture a class of AI architectures that are widely assumed in the AI community to be the only meaningful class of AI worth discussing.
A lot of this is a “looking where the light is” effect. CLAI-type designs are often the designs we can reason about best. If we intend to build an AI that is known to be good, we had better pick it from a class of AIs that we understand well enough to know things about, rather than taking a shot in the dark.
There are cases where we know the right way of doing things. We know that probability is the right way of handling uncertain beliefs, and any agent will succeed to the extent that what it is doing approximates probability theory, and fail to the extent that it doesn’t. There are all sorts of approximations and ways to obfuscate the probabilities, but agents that reason using explicit probabilities seem like a good place to start.
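As a minimal illustration of what “reasoning with explicit probabilities” means, here is a toy Bayesian update; the hypotheses and numbers are made up for the example, and a real agent would of course work over a far richer hypothesis space.

```python
# Toy sketch: an agent keeps explicit probabilities over hypotheses
# and updates them with Bayes' rule after an observation.
def bayes_update(prior: dict, likelihood: dict) -> dict:
    """prior[h] = P(h); likelihood[h] = P(observation | h)."""
    unnormalised = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalised.values())
    return {h: p / total for h, p in unnormalised.items()}

# Two hypotheses about a coin; then we observe a single heads.
beliefs = {"fair coin": 0.5, "double-headed coin": 0.5}
beliefs = bayes_update(beliefs, {"fair coin": 0.5, "double-headed coin": 1.0})
print(beliefs)  # {'fair coin': 0.333..., 'double-headed coin': 0.666...}
```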
Much of your discussion of “Logical vs. Swarm AI” sounds like “Logical vs. Connectionist AI”. The same criticisms apply: at best it’s two options out of a vast space of possible designs. At worst, the logical AI is a huge pile of suggestively named Lisp tokens, and the swarm AI is a bag of ad hoc heuristics manually created by the programmer. The resemblance between modern neural nets and the human (or earthworm) brain is about as close as the resemblance between airplanes and birds. Neural nets have their own reasons for working, and they can be mathematically analyzed. They also suffer from mesa optimization (the trained network can itself contain an optimizer pursuing a goal other than the training objective), which would make it hard for a powerful neural-net-based system to be safe.