Yes, I agree, but I think people still have lots of ideas about local actions that will help us make progress. For example, I have empirical questions about GPT-2 / 3 that I don’t have the time to test right now. So I could supervise maybe one person worth of work that just consisted of telling them what to do (though this hypothetical intern should also come up with some of their own ideas). I could not lay out a cohesive vision for other people to follow long-term (at least not very well), but as per my paragraph on cohesive visions, I think it suffices for training to merely have spare ideas lying around, and it suffices for forming an org to merely be fruitful to talk to.
I agree with the bit in the post about how it makes sense to invest in a lot of different approaches by different small teams. Similarly with hiring people to work on various smaller/specific questions. This makes sense at small scale, and there’s probably still room to scale it up more at current margins. The problem comes when one tries to pour a lot of money into that sort of approach: spending a lot of money on something is applying optimization pressure, whether we intend to or not, and if we don’t know what we’re optimizing for then the default thing which happens is that we Goodhart on people trying to look good to whoever’s making the funding decisions.
So, yes at small scale and probably at current margins, but this is a strategy which can only scale so far before breaking down.
My Gordon Worley impression: If we don’t have a fraud problem, we’re not throwing around enough money :P
Fraud also seems like the kind of problem you can address as it comes up. And I suspect just requiring people to take a salary cut is a fairly effective way to filter for idealism.
All you have to do to distract fraudsters is put a list of poorly run software companies where you can get paid more money to work less hard at the top of the application ;-) How many fraudsters would be silly enough to bother with a fraud opportunity that wasn’t on the Pareto frontier?
lol this does sound exactly like something I would say!
>The problem comes when one tries to pour a lot of money into that sort of approach
It seems to me that the Goodhart effect is actually stronger if you’re granting less money.
Suppose that we have a population of people who are keen to work on AI safety. Suppose every time a person from that population gets an application for funding rejected, they lose a bit of the idealism which initially drew them to the area and they start having a few more cynical thoughts like “my guess is that grantmakers want to fund X, maybe I should try to be more like X even though I don’t personally think X is a great idea.”
In that case, the level of Goodharting seems to be pretty much directly proportional to the number of rejections—and the less funding available, the greater the quantity of rejections.
On the other hand, if the United Nations got together tomorrow and decided to fund a worldwide UBI, there’d be no optimization pressure at all, and people would just do whatever seemed best to them personally.
EDIT: This appears to be a concrete example of what I’m describing
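To make the claim above concrete, here is a toy sketch in Python. It encodes only the assumptions stated in this comment (a fixed pool of applicants, and a fixed dose of cynicism per rejection). It does not model the opposing worry that more money attracts more people optimizing to look good. The function names, the grant size, and the `cynicism_per_rejection` parameter are all hypothetical, chosen purely for illustration.

```python
def rejections(applicants: int, budget: float, grant_size: float) -> int:
    """Applicants turned away when every grant costs `grant_size`.

    Assumption (from the comment, not from data): the applicant pool is fixed
    and does not grow or shrink with the budget.
    """
    funded = min(applicants, int(budget // grant_size))
    return applicants - funded


def goodhart_pressure(
    applicants: int,
    budget: float,
    grant_size: float,
    cynicism_per_rejection: float = 1.0,  # hypothetical unit of lost idealism
) -> float:
    """Goodharting modeled as directly proportional to the number of rejections."""
    return cynicism_per_rejection * rejections(applicants, budget, grant_size)


if __name__ == "__main__":
    # 100 applicants, each asking for 100k; the budget grows from scarce to ample.
    for budget in (1_000_000, 5_000_000, 10_000_000):
        print(budget, goodhart_pressure(100, budget, 100_000))
```

Under these assumptions the pressure falls as the budget rises and reaches zero once everyone can be funded, which is the UBI limit described above. Whether the fixed-pool assumption holds is exactly what is in dispute.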
Another implication, which John didn’t list, is that a certain kind of illegible talent, the kind that can make progress in pre-paradigmatic fields, is crucial. This seems to strongly conflict with the statement in your post:
>Of the bottlenecks I listed above, I am going to mostly ignore talent. IMO, talented people aren’t the bottleneck right now, and the other problems we have are more interesting. We need to be able to train people in the details of an area of cutting-edge research. We need a larger number of research groups that can employ those people to work on specific agendas. And perhaps trickiest, we need to do this within a network of reputation and vetting that makes it possible to selectively spend money on good research without warping or stifling the very research it’s trying to select for.
Do you think that special sort of talent doesn’t exist? Or is abundant? Or isn’t the right way to understand the situation? Or what?