For example, if our function measures the probability that some particular glass is filled with water, the space near the maximum is full of worlds like “take over the galaxy and find the location least likely to be affected by astronomical phenomena, then build a megastructure around the glass designed to keep it full of water”.
If the function is ‘fill it and see it is filled forever’ then strange things may be required to accomplish that (to us) strange goal.
Idea:
Don’t specify our goals to AI using functions.
Flaw:
Current deep learning methods use functions to measure error, and AI learns by minimizing that error in an environment of training data. This has replaced the old paradigm of symbolic AI, which didn’t work very well. If progress continues in this direction, the first powerful AI will operate on the principles of deep learning.
Even if we build AI that doesn’t maximize a function, it won’t be competitive with AI that does, assuming present trends hold. Building weaker, safer AI doesn’t stop others from building stronger, less safe AI.
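To make "minimizing an error function" concrete, here is a minimal sketch. It assumes a one-parameter model and plain gradient descent on invented data; the names `error`, `error_gradient`, and `train` are hypothetical, and real deep learning does the same kind of thing with vastly more parameters.

```python
# Toy data, invented for illustration: the targets follow y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def error(w):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def error_gradient(w):
    """Derivative of the mean squared error with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def train(w=0.0, learning_rate=0.05, steps=200):
    """Repeatedly nudge w in the direction that reduces the error."""
    for _ in range(steps):
        w -= learning_rate * error_gradient(w)
    return w


w = train()
print(w, error(w))
```

The learner here is never told "the answer is 2"; it is only handed a function that scores how wrong it is and a rule for reducing that score.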
Do you have any idea how to do “Don’t specify our goals to AI using functions.”? How are you judging “if we build AI that doesn’t maximize a function, it won’t be competitive with AI that does”?
Idea:
Get multiple AIs to prevent each other from maximizing their goal functions.
Flaw:
The global maximum of any set of functions like this still doesn’t include human civilization. Either a single AI will win, or some subset will compete among themselves with just as little regard for preserving humanity as the single AI would have.
Maybe this list should be numbered.
This one is worse than it looks (though it seems underspecified). Goal 1: some notion of human flourishing. Goal 2: prevent goal 1 from being maximized. (If this is the opposite of 1, you may have just asked to be nuked.)
Idea:
Don’t build powerful AI.
Flaw:
For all the ‘a plan that handles filling a glass of water, generated using time t’ ‘is flawed’ - this could actually work. Now, one might object that some particular entity will try to create powerful AI anyway. While there might be incentives to do so, it is still worth trying to set limits, or to see safeguards deployed (if the AI managing air conditioning isn’t part of your AGI research, add these safeguards now).
This isn’t meant as a pure ‘this will solve the problem’ approach, but that doesn’t mean it can’t work (for example, by ensuring that AIs handling cooling/whatever at data centers meet certain criteria).
Once it exists, powerful AI is likely to be much easier to generate or copy than historical examples of dangerous technologies like nuclear weapons.
There are a number of assumptions here which may be correct, but are worth pointing out.
How big a file do you think an AI is?
1 MB?
1 TB?
That’s not to say that compression doesn’t exist, but also, what hardware can run this program/software you are imagining (and how fast)?
undesirable worlds near the global maximum.
There’s a lot of stuff in here about maximums. It seems like your belief that ‘functions won’t do’ stems from a belief that maximization is occurring. Maximizing a function isn’t always easy, even at the level of ‘find the maximum of this function mathematically’. That’s not to say that what you’re saying is necessarily wrong, but suppose some goal is ‘find out how this protein folds’. It might be a solvable problem, but that doesn’t mean it is an easy problem. It also seems like, if the goal is to fill a glass with water, then the goal is achieved when the glass is filled with water.
Thanks for the responses, I’ll try to address them individually.
If the function is ‘fill it and see it is filled forever’ then strange things may be required to accomplish that (to us) strange goal.
I agree that this doesn’t adequately represent our goal, but I think the problem persists even when we add lots of qualifications like “make sure the glass is filled with water for the next five minutes and then lose interest”. The maximum of that function might not include a large-scale plan due to limited time, but it could include destroying everything within range except for the facility to prevent interference. It’s possible that adding enough qualifications would solve this, but it wouldn’t be easy to verify.
Do you have any idea how to do “Don’t specify our goals to AI using functions.”? How are you judging “if we build AI that doesn’t maximize a function, it won’t be competitive with AI that does”?
I don’t know how to achieve the same capabilities as current or future machine learning without specifying goals using functions. In that sense, I think it would be hard to match something like GPT without deep learning, and so more legible alternatives wouldn’t be competitive. (I might be understating this. It seems like function-based learning is the only method we have that works.)
This one is worse than it looks (though it seems underspecified). Goal 1: some notion of human flourishing. Goal 2: prevent goal 1 from being maximized. (If this is the opposite of 1, you may have just asked to be nuked.)
I was thinking of Robin Hanson’s idea that the competitive market of many AIs would prevent any individual AI from taking over. I don’t think that would work either, but I agree that intentionally designing opposing AIs would be even worse.
For all the ‘a plan that handles filling a glass of water, generated using time t’ ‘is flawed’ - this could actually work.
It seems like humans are often kept safe from each other by limited resources and limited thinking time, so I agree that this could be a promising approach. But we would have to prevent a limited AI from increasing its own capabilities.
How big a file do you think an AI is?
Maybe it’s not as easy as copying a piece of software, but probably easier than building a nuclear weapon in terms of resources. If running it requires an uncommon amount of computing, then you’re right, it would be hard to copy.
Maximizing a function isn’t always easy, even at the level of ‘find the maximum of this function mathematically’.
You’re right, achieving the global maximum for many functions would be infeasible. The risk comes when the space of high-value bad outcomes overlaps with the space of feasible strategies for the AI. This is not necessarily at or even near the global maximum. This way of framing the problem might be more accurate.
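To make that framing concrete, here is a toy sketch. Every outcome, value, and flag in it is invented for illustration, and the `Outcome` type and `best_feasible` helper are hypothetical names. It only shows that the best outcome the AI can actually reach may be a bad one even when the global maximum is out of reach.

```python
from typing import NamedTuple


class Outcome(NamedTuple):
    description: str
    value: float      # score under the goal function (made-up numbers)
    feasible: bool    # can the AI actually bring this about?
    acceptable: bool  # would humans be fine with it?


# Invented examples, loosely following the glass-of-water discussion.
outcomes = [
    Outcome("galaxy-scale megastructure around the glass", 0.999999, False, False),
    Outcome("destroy everything near the facility", 0.999, True, False),
    Outcome("fill the glass and stop", 0.95, True, True),
]


def best_feasible(outcomes):
    """Return the highest-value outcome among those the AI can reach."""
    reachable = [o for o in outcomes if o.feasible]
    return max(reachable, key=lambda o: o.value)


global_max = max(outcomes, key=lambda o: o.value)
chosen = best_feasible(outcomes)

print("global maximum:", global_max.description)
print("best feasible: ", chosen.description, "| acceptable:", chosen.acceptable)
```

In this made-up table the global maximum is infeasible, yet the best feasible outcome is still unacceptable, which is the overlap being described.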
It also seems like, if the goal is to fill a glass with water, then the goal is achieved when the glass is filled with water.
This is true unless the AI is trying to maximize the probability of success, or the proximity to some exact amount of fullness, or some other precise goal. If it works by satisfying goals without maximizing anything, then the problem might be solved. But I don’t think we know how to build powerful AI that satisfies goals without maximizing anything.
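The distinction can be sketched with a toy example. The plans and their numbers are invented, and `maximizer` and `satisficer` are hypothetical helpers rather than a description of how any real system works. This is not a proposal for building such an AI, only an illustration of the difference in selection rules.

```python
# Invented plans for "fill the glass", with made-up success probabilities.
plans = [
    {"name": "pour water from the tap", "p_success": 0.95},
    {"name": "pour, then guard the glass", "p_success": 0.99},
    {"name": "build a megastructure around the glass", "p_success": 0.999999},
]


def maximizer(plans):
    """Pick whichever plan has the highest success probability."""
    return max(plans, key=lambda plan: plan["p_success"])


def satisficer(plans, threshold):
    """Pick the first listed plan that is good enough, or None if none is."""
    for plan in plans:
        if plan["p_success"] >= threshold:
            return plan
    return None


print("maximizer: ", maximizer(plans)["name"])
print("satisficer:", satisficer(plans, threshold=0.9)["name"])
```

The satisficer's choice depends on the order in which plans are considered and on the threshold, so the sketch moves the hard part into those two choices rather than removing it.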