I’ll send this in a direct message. It isn’t groundbreaking or anything, but it is a capabilities idea.
I think this is a good description of what has been happening so far, in image classification, language modeling, game playing, etc. Do you agree?
Hm, I guess there’s a spectrum of how messy things are: in how wide a net is cast, how wide the solution space is for the optimized criterion, and how much pressure there is toward the criterion you want and toward resource-bounded solutions. In the extreme case where you simulate evolution of artificial agents, you’re not even optimizing for what you want (you don’t care if an agent is good at replicating), there are a huge number of policies that accomplish this well, and in an extreme version of this, there isn’t much pressure to spawn resource-bounded solutions. In current systems, things are decently less messy.
The solution space is much smaller for supervised learning than for reinforcement learning/agent design, because it has to output something that matches the training distribution. I worry I’m butchering the term solution space when I make this distinction, so let me try to be more precise. What I mean by solution space here is the size of the set of things you see when you look at a solution. For an evolved policy, you see the policy, but you don’t have to look at the internals. In other terms, the policy affects the world, but the internals don’t. If you’re looking at an evolved sequence predictor or function approximator, the output affects the world, but again, the internals don’t. (I suppose that’s what “internals” means). So from the set of solutions to the problem, the size of the set of the ways those solutions affect the world is large for evolved agents (because the policies affect the world, and they have great diversity) and small for evolved sequence predictors (because only the predictions affect the world, which have to be close to the truth). When the solution space is smaller, the well-defined objective matters more than the chaos of the initial search, so things seem less “messy” to me. So actually there’s a reason why sequence prediction might be less messy than AGI (or safer at the same messiness, depending on your definition of messy).
In modern neural networks, there is strong regularization toward tighter resource bounds, mostly because they are only so wide/deep. Within that width/depth, there isn’t much further regularization toward resource-boundedness, but dropout sort of does this, and we could do better without too much difficulty.
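As a minimal sketch of what “further regularization toward resource-boundedness” within a fixed width/depth could look like (a toy NumPy regression; the penalty weight `lam` and the use of an L1 term as a crude proxy for resources used are my assumptions, not a claim about what any real system does):

```python
import numpy as np

# Toy regression where only the first feature actually matters.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0]  # true function uses one feature

w = np.zeros(5)
lam = 0.1  # strength of the resource penalty (assumed value)

for _ in range(2000):
    pred = X @ w
    grad = X.T @ (pred - y) / len(y)  # gradient of mean squared error
    grad += lam * np.sign(w)          # L1 penalty: crude stand-in for "resources used"
    w -= 0.05 * grad

# The penalty keeps the unused weights pinned near zero,
# on top of the implicit bound from having only 5 weights at all.
print(np.round(w, 2))
```

The point of the sketch is just that the penalty selects, among models that fit equally well, the one that “uses” less of its capacity; dropout pushes in a loosely similar direction by discouraging reliance on any one unit.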
I do agree with you of course that current state-of-the-art is somewhat messy, but not in a way that concerns me quite as much, especially for supervised learning/sequence prediction. There are some formal guarantees that reassure me—a local minimum in a neural net for sequence prediction with even a minimal penalty for resource-profligacy does strike me as a pure sequence predictor. And of course, SGD finds local minima.
This might not directly bear on your last comment, but I think I might be more optimistic than you that strong optimization + resource penalties (like a speed prior, as we’ve discussed elsewhere) will cull mesa-optimizers, as long as we only ever see the final product of the optimization.
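A toy illustration of that intuition (the names and the particular per-step charge are mine, not an actual speed prior): among candidate predictors that fit the data equally well, a compute charge selects the cheap direct predictor over a resource-profligate one that runs a pointless inner search.

```python
def fast_predictor(xs):
    """Directly outputs the next element; one 'step' per call."""
    return xs[-1] + 1, 1  # (prediction, steps used)

def slow_predictor(xs):
    """Searches over many internal hypotheses before answering --
    a stand-in for a costly inner optimizer."""
    steps, best = 0, None
    for guess in range(1000):  # pointless inner search
        steps += 1
        if guess == xs[-1] + 1:
            best = guess
    return best, steps

def score(predictor, data, lam=0.01):
    """Prediction loss plus a speed-prior-style charge per compute step."""
    loss = 0.0
    for i in range(1, len(data)):
        pred, steps = predictor(data[:i])
        loss += (pred - data[i]) ** 2 + lam * steps
    return loss

data = list(range(10))  # the sequence 0, 1, 2, ...
print(score(fast_predictor, data))  # correct and cheap
print(score(slow_predictor, data))  # equally correct, but heavily penalized
```

Both predictors are perfect on the sequence, so only the compute charge separates them; whether real training dynamics behave like this argmin-over-candidates picture is exactly the open question.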
On a completely different note,
Yeah, I think it’s one reason for my general pessimism regarding AI safety.
For general arguments against any AI Safety proposals, technical researchers might as well condition on their falsehood. If your intuition is correct, AI policy people can work on prepping the world to wait for more resources to run safe algorithms, even when the resources are available to run dangerous ones. Of course, we should be doing this anyway.
The solution space is much smaller for supervised learning than for reinforcement learning/agent design
I feel like there are probably some good insights in this paragraph but I’m having trouble understanding them except in a vague way. “What I mean by solution space here is the size of the set of things you see when you look at a solution.” is the first thing that confuses me. If it seems worthwhile to you, maybe think about how to explain it more clearly and write a post on the topic to get your ideas into wider circulation?
So actually there’s a reason why sequence prediction might be less messy than AGI (or safer at the same messiness, depending on your definition of messy).
It does seem like supervised learning / sequence prediction is safer than reinforcement learning / AGI (though perhaps less capable?). I’d like to better understand your thoughts on this.
There are some formal guarantees that reassure me—a local minimum in a neural net for sequence prediction with even a minimal penalty for resource-profligacy does strike me as a pure sequence predictor. And of course, SGD finds local minima.
I wonder if it’s safe to rely on the fact that some optimization technique only finds local minima. What if future advances allow people to do better than this? How will “safe” techniques that only find local minima compete?
This might not directly bear on your last comment, but I think I might be more optimistic than you that strong optimization + resource penalties (like a speed prior, as we’ve discussed elsewhere) will cull mesa-optimizers, as long as we only ever see the final product of the optimization.
This might be worth writing up into a full post as well, as I’d like to better understand your reasons for optimism.
For general arguments against any AI Safety proposals, technical researchers might as well condition on their falsehood. If your intuition is correct, AI policy people can work on prepping the world to wait for more resources to run safe algorithms, even when the resources are available to run dangerous ones.
I see a number of reasons not to do this:
The general argument might not be fully general, so we should re-consider it against every new proposal to see if it still applies.
The general argument might have a general flaw, so we should re-consider it once in a while to see if it can be invalidated.
If the argument is indeed false, it seems like we’ll never find out why it’s false if we just condition on its falsity, and knowing why might be really useful to guide future research.
If technical researchers rarely talk about the general argument among themselves, it will look to policy/strategy people like we’re not concerned about the argument, and as a result they will be more optimistic about AI safety than they should be.
If the argument is actually fully general and water-tight, then it implies that resources going into technical AI safety might be better spent elsewhere, and even individual technical researchers might want to reconsider how to spend their time.
Of course, we should be doing this anyway.
Waiting for more resources to run safe algorithms, when the resources are available to run dangerous ones, can seem extremely costly (e.g., billions of people could die while we wait). Even preparing for this can be extremely costly (e.g., it requires setting up mechanisms for global coordination, which can be dangerous in themselves and will use up huge amounts of political and diplomatic capital). And even talking about “global coordination to stop dangerous AI” or “letting billions of people die while we wait for safe algorithms” can be a political non-starter unless people are convinced that AI safety is really really hard.
To me, one of the main purposes of technical AI safety research is to generate convincing evidence that AI safety is really really hard (if indeed that is the case). Hopefully my position makes sense at this point?
Those all seem reasonable. 3 was one I considered, and this is maybe a bit pedantic, but if you’re conditioning on something being false, it’s still worthwhile to figure out how it’s false and use that information for other purposes. The key relevance of conditioning on its being false is what you do in other areas while that analysis is pending. Regarding some other points, I didn’t mean to shut down discussion on this issue, only highlight its possible independence from this idea.
I’ll do some more thinking about the couple of posts you’re requesting. Thanks for your interest. At the very least, if the first one doesn’t become its own post, I’ll respond more fully here.