A lot of the difficulty comes from the fact that AI safety is a problem we don’t understand; the field is pre-paradigmatic. We don’t know how best to frame the problem. We don’t know what questions to ask, what approximations to make, how to break the problem into good subproblems, what to pay attention to or what to ignore. All of these issues are themselves major open problems.
That makes a lot of the usual scaling-up approaches hard.
- We can’t write a textbook, because we don’t know what needs to go in it. The one thing we know for sure is that the things we might currently think to write down are not sufficient; we do not yet have all the pieces.
- We can’t scale up existing training programs, because we don’t quite know what skills/knowledge are crucial for AI safety research. We do know that no current program trains quite the right mix of skills/knowledge; otherwise AI safety would already fit neatly into that paradigm.
- Existing organizations have limited ability to absorb more people, because they don’t understand the problem well enough to effectively break it into pieces which can be pursued in parallel. Figuring that out is part of what existing orgs are trying to do.
- The previous bullet also applies to people who have some legible achievements and could found a new org.
- Finally, nobody currently knows how to formulate the core problems of the field in terms of highly legible objectives. Again, that’s a major open problem.
I learned about the abundance of available resources this past spring. My own approach to leveraging more resources is to try to scale up the meta-level skills of specializing in problems we don’t understand. That’s largely what the framing practicum material is for—this is what a “textbook” looks like for fields where we don’t yet know what the textbook should contain, because figuring out the right framing tools is itself part of the problem.
I think if you’re in the early stages of a big project, like founding a pre-paradigmatic field, it often makes sense to be very breadth-first. You can save a lot of time trying to understand the broad contours of solution space before you get too deeply invested in a particular approach.
I think this can even be seen at the microscale (e.g. I was coaching someone on how to solve leetcode problems the other day, and he said my most valuable tip was to brainstorm several different approaches before exploring any one approach in depth). But it really shines at the macroscale (“you built entirely the wrong product because you didn’t spend enough time talking to customers and exploring the space of potential offerings in a breadth-first way”).
One caveat is that breadth-first works best if you have a good heuristic. For example, if someone with less than a year of programming experience were practicing leetcode problems, I wouldn’t emphasize the importance of brainstorming multiple approaches as much, because I wouldn’t expect them to have a well-developed intuition for which approaches will work best. For someone like that, I might recommend going depth-first almost at random until their intuition is developed (random rollouts in the context of Monte Carlo tree search are a related notion). I think there is actually some psych research showing that more experienced engineers spend more time going breadth-first at the beginning of a project.
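To make the Monte Carlo tree search analogy concrete, here is a toy Python sketch of the two regimes, with made-up “approaches” and payoffs (nothing here is a real model of leetcode practice): without a reliable heuristic you pick a few approaches more or less at random and grind on them, while with a trained heuristic you cheaply score every approach breadth-first before committing.

```python
import random

# Hypothetical "approaches" with hidden quality scores the solver can't see directly.
APPROACHES = {"hash map": 0.9, "two pointers": 0.7, "brute force": 0.3, "sort first": 0.6}

def random_rollout(approach, noise=0.4):
    """Cheap, noisy evaluation of one approach: 'just try it and see how far you get'."""
    return APPROACHES[approach] + random.uniform(-noise, noise)

def novice_depth_first(n_attempts=3):
    """No heuristic: pick a few approaches at random and grind on each one.
    Much of the payoff is the intuition gained, not the answer."""
    tried = random.sample(list(APPROACHES), k=n_attempts)
    return max(tried, key=random_rollout)

def experienced_breadth_first(rollouts_per_approach=5):
    """With a decent heuristic (approximated here by averaging a few cheap rollouts),
    survey every approach shallowly before committing to one."""
    def score(approach):
        return sum(random_rollout(approach) for _ in range(rollouts_per_approach)) / rollouts_per_approach
    return max(APPROACHES, key=score)

if __name__ == "__main__":
    random.seed(0)
    print("novice picks:", novice_depth_first())
    print("experienced picks:", experienced_breadth_first())
```

The point of the sketch is that the novice’s near-random choice is still useful, because those depth-first rollouts are exactly what trains the heuristic in the first place.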
A synthesis of the above is: if AI safety is pre-paradigmatic, we want lots of people exploring a lot of different directions. That lets us understand the broad contours better, and also collects data to help refine our intuitions.
IMO the AI safety community has historically not been great at going breadth-first, e.g. investing a lot of effort in the early days into decision theory stuff which has lately become less fashionable. I also think people are overconfident in their intuitions about what will work, relative to the amount of time which has been spent going depth-first and trying to work out details related to “random” proposals.
In terms of turning money into AI safety, this strategy is “embarrassingly parallel” in the sense that it doesn’t require anyone to wait for a standard textbook or training program, or get supervision from some critical person. In fact, having a standard curriculum or a standard supervisor could be counterproductive, since it anchors people on a particular frame, which means a narrower area gets explored. If there has to be central coordination, it seems better to make a giant list of literatures which could provide insight, then assign each literature to a particular researcher who acquires expertise in it.
After doing parallel exploration, we could do a reduction tree. Imagine if we ran an AI safety tournament where you could sign up as “red team”, “blue team”, or “judge”. At each stage, we generate tuples of (red player, blue player, judge) at random and put them in a video call or a Google Doc. The blue player tries to make a proposal, the red player tries to break it, and the judge tries to figure out who won. Select the strongest players on each team at each stage and have them advance to the next stage, until you’re left with the very best proposals and the most difficult-to-solve issues. Then focus attention on breaking those proposals / solving those issues.
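A minimal Python sketch of the tournament mechanics, assuming hypothetical player pools and a placeholder debate() function standing in for the actual video call or Google Doc (the real evaluation is obviously the hard part): form random (blue, red, judge) triples each round, have the judge score the exchange, keep the top half of each pool, and repeat.

```python
import random

def debate(blue, red, judge):
    """Placeholder for the real interaction: blue proposes, red attacks, judge scores.
    Returns (blue_score, red_score); random here just so the pipeline runs end to end."""
    return random.random(), random.random()

def run_round(blues, reds, judges):
    """Randomly form (blue, red, judge) triples and collect each player's score."""
    random.shuffle(blues); random.shuffle(reds); random.shuffle(judges)
    blue_scores, red_scores = {}, {}
    for blue, red, judge in zip(blues, reds, judges):
        b, r = debate(blue, red, judge)
        blue_scores[blue] = b
        red_scores[red] = r
    return blue_scores, red_scores

def top_half(scores):
    """Advance the higher-scoring half of a pool (at least one player)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[: max(1, len(ranked) // 2)]

def tournament(blues, reds, judges, rounds=3):
    for _ in range(rounds):
        blue_scores, red_scores = run_round(blues, reds, judges)
        blues, reds = top_half(blue_scores), top_half(red_scores)
    return blues, reds  # strongest surviving proposals and attackers

if __name__ == "__main__":
    random.seed(0)
    blues = [f"blue_{i}" for i in range(8)]
    reds = [f"red_{i}" for i in range(8)]
    judges = [f"judge_{i}" for i in range(8)]
    print(tournament(blues, reds, judges))
```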
I’m curious what this is referring to.
There’s apparently a lot of funding looking for useful ways to reduce AI X-risk right now.
Yes, I agree, but I think people still have lots of ideas about local actions that will help us make progress. For example, I have empirical questions about GPT-2/3 that I don’t have the time to test right now. So I could supervise maybe one person’s worth of work that just consisted of telling them what to do (though this hypothetical intern should also come up with some of their own ideas). I could not lay out a cohesive vision for other people to follow long-term (at least not very well), but as per my paragraph on cohesive visions, I think it suffices for training to merely have spare ideas lying around, and it suffices for forming an org to merely be fruitful to talk to.
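The specific GPT-2/3 questions aren’t stated, so purely to illustrate the scale of work meant (“one intern, one self-contained experiment”), here is a hypothetical example using the Hugging Face transformers library: compare the average per-token loss GPT-2 assigns to two phrasings of the same fact. This is not the author’s actual question, just the flavor of small empirical check such an intern could be handed.

```python
# pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_token_loss(text):
    """Average negative log-likelihood GPT-2 assigns to the text's tokens."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, labels=inputs["input_ids"])
    return out.loss.item()

# Hypothetical micro-question: does a small rewording change how "surprising" the text is to GPT-2?
print(avg_token_loss("The capital of France is Paris."))
print(avg_token_loss("Paris is the capital of France."))
```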
I agree with the bit in the post about how it makes sense to invest in a lot of different approaches by different small teams. Similarly with hiring people to work on various smaller/specific questions. This makes sense at small scale, and there’s probably still room to scale it up more at current margins. The problem comes when one tries to pour a lot of money into that sort of approach: spending a lot of money on something is applying optimization pressure, whether we intend to or not, and if we don’t know what we’re optimizing for then the default thing which happens is that we Goodhart on people trying to look good to whoever’s making the funding decisions.
So, yes at small scale and probably at current margins, but this is a strategy which can only scale so far before breaking down.
My Gordon Worley impression: If we don’t have a fraud problem, we’re not throwing around enough money :P
Fraud also seems like the kind of problem you can address as it comes up. And I suspect just requiring people to take a salary cut is a fairly effective way to filter for idealism.
All you have to do to distract fraudsters is put a list of poorly run software companies where you can get paid more money to work less hard at the top of the application ;-) How many fraudsters would be silly enough to bother with a fraud opportunity that wasn’t on the Pareto frontier?
lol this does sound exactly like something I would say!
>The problem comes when one tries to pour a lot of money into that sort of approach

It seems to me that the Goodhart effect is actually stronger if you’re granting less money.
Suppose that we have a population of people who are keen to work on AI safety. Suppose every time a person from that population gets an application for funding rejected, they lose a bit of the idealism which initially drew them to the area and they start having a few more cynical thoughts like “my guess is that grantmakers want to fund X, maybe I should try to be more like X even though I don’t personally think X is a great idea.”
In that case, the level of Goodharting seems to be pretty much directly proportional to the number of rejections—and the less funding available, the greater the quantity of rejections.
On the other hand, if the United Nations got together tomorrow and decided to fund a worldwide UBI, there’d be no optimization pressure at all, and people would just do whatever seemed best to them personally.
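A toy numerical version of this argument, with every parameter invented for illustration: model each applicant’s research direction as a number between 0 (their own idea) and 1 (the perceived grantmaker preference), let each rejection nudge them toward 1, and vary how many grants get made per cycle. It only demonstrates the claimed proportionality between rejections and drift, not anything about real grantmaking.

```python
def expected_direction(n_applicants, funded_per_cycle, cycles, drift_per_rejection=0.1,
                       own_idea=0.0, perceived_preference=1.0):
    """Toy model of Goodhart drift from rejections.

    Each cycle, every still-unfunded applicant applies; `funded_per_cycle` of them get
    funded and stop drifting, and the rest are rejected and move `drift_per_rejection`
    of the way from their current direction toward the perceived grantmaker preference.
    Returns the population's average direction after `cycles` cycles
    (0.0 = everyone pursuing their own idea, 1.0 = everyone imitating the perceived preference).
    """
    positions = [own_idea] * n_applicants
    unfunded = list(range(n_applicants))
    for _ in range(cycles):
        unfunded = unfunded[funded_per_cycle:]  # the lucky few are funded and stop drifting
        for i in unfunded:  # every rejection nudges the applicant toward the perceived target
            positions[i] += drift_per_rejection * (perceived_preference - positions[i])
    return sum(positions) / n_applicants

# More funding -> fewer rejections -> less drift toward the perceived preference.
for slots in (2, 10, 50):
    print(f"{slots:>2} grants/cycle -> average drift {expected_direction(100, slots, cycles=5):.2f}")
```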
EDIT: This appears to be a concrete example of what I’m describing
Another implication John didn’t list is that a certain kind of illegible talent, the kind that can make progress in pre-paradigmatic fields, is crucial. This seems to strongly conflict with the statement in your post:
>Of the bottlenecks I listed above, I am going to mostly ignore talent. IMO, talented people aren’t the bottleneck right now, and the other problems we have are more interesting. We need to be able to train people in the details of an area of cutting-edge research. We need a larger number of research groups that can employ those people to work on specific agendas. And perhaps trickiest, we need to do this within a network of reputation and vetting that makes it possible to selectively spend money on good research without warping or stifling the very research it’s trying to select for.
Do you think that special sort of talent doesn’t exist? Or is abundant? Or isn’t the right way to understand the situation? Or what?