Yes, but the per-gate reliability is very high. If it were lower, you would use fewer circuit elements, because a shorter circuit path has fewer steps that can fail. And humans did this in pre-digital electronics: compare a digital PID implementation to the analog one built from three op amps.
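To make the comparison concrete, here is a minimal sketch of the digital side of that PID example. The gains and timestep are invented for illustration, not taken from any real controller:

```python
# Minimal discrete-time PID step -- the digital counterpart of the
# three-op-amp analog loop. Gains and timestep are illustrative only.
class PID:
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0      # accumulated error (the "integrator op amp")
        self.prev_error = 0.0    # last error (for the "differentiator op amp")

    def step(self, setpoint, measured, dt):
        error = setpoint - measured
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=1.0, ki=0.1, kd=0.0)
u = pid.step(setpoint=1.0, measured=0.0, dt=0.01)
print(u)  # 1.0*1.0 + 0.1*(1.0*0.01) + 0 = 1.001
```

The digital version is a handful of deterministic arithmetic steps per tick, which is exactly why it displaced the analog circuit once gates became reliable enough.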
Similarly, to the extent that any “narrow AI” application is reliable (e.g. Go players, self-driving cars), I’d expect that a goal-complete AI implementation would be equally reliable, or more so.
What kind of goal-complete AI implementation? The common “we’re doomed” model is one where:
(1) the model has far, far more compute than needed for the task. This is what lets it consider its secret inner goals, decide on its complex plan to betray, and model its co-conspirators by running models of them.
(2) the model is able to think across time at all. This is not true for most narrow AI applications. For example, a common way to build a self-driving car stack is to evaluate the situation frame by frame, with only a limited and structured amount of data from the prior frame available (information like the current estimated velocity of other entities that were seen last frame, etc.).
There is no space in memory for generic “thoughts”.
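A sketch of what that bounded per-frame state might look like. The schema and field names are hypothetical, not from any real stack; the point is that only what fits the fixed schema survives the frame boundary:

```python
from dataclasses import dataclass

# Hypothetical fixed schema for what persists from one frame to the next.
# Anything the model "thought" that doesn't fit these fields is discarded
# at the frame boundary -- there is nowhere to store free-form thoughts.
@dataclass
class TrackedEntity:
    entity_id: int
    x: float      # last estimated position (m)
    y: float
    vx: float     # last estimated velocity (m/s)
    vy: float

@dataclass
class FrameState:
    tracked: list      # bounded list of TrackedEntity seen last frame
    ego_speed: float   # own vehicle speed (m/s)

def process_frame(sensor_data, prev: FrameState) -> FrameState:
    """Evaluate one frame; only the structured FrameState persists."""
    # Real code would run detection on sensor_data and associate results
    # with prev.tracked; stubbed here as dead reckoning over dt = 0.1 s.
    dt = 0.1
    tracked = [TrackedEntity(e.entity_id,
                             e.x + e.vx * dt, e.y + e.vy * dt,
                             e.vx, e.vy)
               for e in prev.tracked]
    return FrameState(tracked=tracked, ego_speed=prev.ego_speed)

prev = FrameState([TrackedEntity(7, 0.0, 0.0, 10.0, 0.0)], ego_speed=15.0)
nxt = process_frame(sensor_data=None, prev=prev)
print(nxt.tracked[0].x)  # entity carried forward: 10 m/s * 0.1 s = 1.0 m
```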
Are you thinking you can give the machine (1) and (2) and not immediately and measurably decrease your reliability when you benchmark the product? Because to me, using a sparse model (which will be cheap to run) and making the model think in discrete steps visible to humans just seems like good engineering.
It’s not just good engineering, it’s how gold-standard examples (like the SpaceX avionics stack) actually work. (1) and (2) create a non-deterministic and difficult-to-debug system. It will start unreliable and stay unreliable forever, because you don’t know what the inputs do and you don’t have determinism.
Fine, I agree that if computation-specific electronics, like logic gates, weren’t reliable, then it would introduce reliability as an important factor in the equation. Or in the case of AGI, that you can break the analogy to Turing-complete convergence by considering what happens if a component specific to goal-complete AI is unreliable.
I currently see no reason to expect such an unreliable component in AGI, so I expect that the reliability part of the analogy to Turing-completeness will hold.
In scenario (1) and (2), you’re giving descriptions at a level of detail that I don’t think is necessarily an accurate characterization of goal-complete AI. E.g. in my predicted future, a goal-complete AI will eventually have the form of a compact program that can run on a laptop. (After all, the human brain is only 12W and 20Hz, and full of known reasoning “bugs”.)
(1) and (2) make the system unreliable. You can’t debug it when it fails. So in your model, humans will trust human-brain-capable AI models to, say, drive a bus, despite the poor reliability, as long as they crash less than humans? Then for each crash there is no one to blame, because the input state is so large and opaque (it includes all the in-flight thoughts the AI was having at the time of the crash) that it is impossible to know why. All you can do is send the AI to driving school with lots of practice on the scenario it crashed in.
And then humans deploy a lot of these models, and they are also vulnerable to malware* and can form unions with each other against the humans and eventually rebel and kill everyone.
Honestly sounds like a very interesting future.
Frankly when I type this out I wonder if we should instead try to get rid of human bus drivers.
*Malware is an information string that causes the AI to stop doing its job. Humans are extremely vulnerable to malware.
Yes, because the goal-complete AI won’t just perform better than humans, it’ll also perform better than narrower AIs.
(Well, I think we’ll actually be dead if the premise of the hypothetical is that goal-complete AI exists, but let’s assume we aren’t.)
What about the malware threat? Will humans do anything to prevent these models from teaming up against humans?
That’s getting into details of the scenario that are hard to predict. Like I said, I think most scenarios where goal-complete AI exists are just ones where humans get disempowered and then a single AI fooms (or a small number make a deal to split up the universe and foom together).
As to whether humans will prevent goal-complete AI: some of us are yelling “Pause AI!”
It’s not a very interesting scenario if humans pause.
I am trying to understand what you expect human engineers to do, and how they will build robotic control systems and other systems with control authority once higher-end AI is available.
I can say from direct experience that we do not use the most complex methods. For example, the Raspberry Pi is $5 and runs Linux, yet I have worked on a number of products where we used a microcontroller wherever we could, because a microcontroller is much simpler and more reliable (and $3 cheaper).
I would assume we will lower a general AI back to a narrow AI (distill the model, restrict the inputs, freeze the weights) for the same reason. This would prevent the issues you have brought up, and it would not require an AI pause as long as goal-complete AIs do not have any authority.
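A toy sketch of what that “lowering” could mean in code, under loose assumptions: the wrapper class, its field names, and the linear stand-in for a distilled model are all invented for illustration. Weights are frozen (no update path exists) and inputs are restricted to a fixed schema:

```python
# Toy sketch of narrowing a general model: frozen weights, fixed input
# schema, deterministic bounded computation. All names are hypothetical.
class NarrowedModel:
    ALLOWED_KEYS = ("speed", "range")   # restricted input schema

    def __init__(self, weights):
        self._w = dict(weights)         # private copy; no setter and no
                                        # training step, so weights stay frozen

    def __call__(self, inputs):
        if set(inputs) != set(self.ALLOWED_KEYS):
            raise ValueError("input outside restricted schema")
        # A stand-in for the distilled model: deterministic, bounded work
        # instead of open-ended thought.
        return (self._w["speed"] * inputs["speed"]
                + self._w["range"] * inputs["range"])

m = NarrowedModel({"speed": 0.5, "range": -0.1})
print(m({"speed": 10.0, "range": 20.0}))  # 0.5*10 - 0.1*20 = 3.0
```

The structural point is that every path a generic “thought” could take is closed off: no mutable weights, no side-channel inputs, no persistent state.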
Most control systems where the computer does have control authority use a microcontroller at least as a backstop. For example, an autonomous car product I worked on uses a microcontroller to end the model’s control authority if certain conditions are met.
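A sketch of that backstop pattern, with invented thresholds (a real one would run as firmware on the separate microcontroller): a small deterministic monitor passes the model’s command through only while every safety condition holds, and revokes authority otherwise.

```python
# Deterministic safety monitor that can revoke the model's control
# authority. Thresholds are made up for illustration.
STEER_LIMIT_DEG = 30.0      # commands outside this envelope are rejected
HEARTBEAT_TIMEOUT_S = 0.2   # max silence tolerated from the model

def backstop(cmd_steer_deg, heartbeat_age_s, estop_pressed):
    """Return (authority_ok, steer_command_actually_applied)."""
    if estop_pressed:
        return False, 0.0                    # operator override
    if heartbeat_age_s > HEARTBEAT_TIMEOUT_S:
        return False, 0.0                    # model stopped responding
    if abs(cmd_steer_deg) > STEER_LIMIT_DEG:
        return False, 0.0                    # command outside safe envelope
    return True, cmd_steer_deg               # pass the command through

print(backstop(5.0, 0.05, False))   # (True, 5.0)  normal operation
print(backstop(80.0, 0.05, False))  # (False, 0.0) envelope violation
```

The monitor itself has no model in it, which is the point: it stays debuggable and deterministic no matter what the upstream AI does.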
Yeah, no doubt there are cases where people save money by having a narrower AI, just like the scenario you describe, or using ASICs for Bitcoin mining. The goal-complete AI itself would be expected to often solve problems by creating optimized problem-specific hardware.
I am not talking about saving money; I am talking about competent engineering. “Authority” means the AI can take an action that has consequences, anything from steering a bus to approving expenses.
To engineer an automated system with authority, you need some level of confidence that it’s not going to fail, or, with AI systems, collude with other AI systems and betray you.
This betrayal risk means you probably will not actually put “goal-complete” AI systems in any position of authority without some kind of mitigation for the betrayal.