I agree with the bit in the post about how it makes sense to invest in a lot of different approaches by different small teams. The same goes for hiring people to work on various smaller, more specific questions. This makes sense at small scale, and there's probably still room to scale it up more at current margins. The problem comes when one tries to pour a lot of money into that sort of approach: spending a lot of money on something is applying optimization pressure, whether we intend to or not, and if we don't know what we're optimizing for, then the default outcome is that we Goodhart on people trying to look good to whoever's making the funding decisions.
So: yes at small scale, and probably at current margins, but this is a strategy that can only scale so far before it breaks down.
My Gordon Worley impression: If we don’t have a fraud problem, we’re not throwing around enough money :P
Fraud also seems like the kind of problem you can address as it comes up. And I suspect just requiring people to take a salary cut is a fairly effective way to filter for idealism.
All you have to do to distract fraudsters is put, at the top of the application, a list of poorly run software companies where you can get paid more money to work less hard ;-) How many fraudsters would be silly enough to bother with a fraud opportunity that wasn't on the Pareto frontier?
lol this does sound exactly like something I would say!
The problem comes when one tries to pour a lot of money into that sort of approach
It seems to me that the Goodhart effect is actually stronger if you're granting less money.
Suppose that we have a population of people who are keen to work on AI safety. Suppose every time a person from that population gets an application for funding rejected, they lose a bit of the idealism which initially drew them to the area and they start having a few more cynical thoughts like “my guess is that grantmakers want to fund X, maybe I should try to be more like X even though I don’t personally think X is a great idea.”
In that case, the level of Goodharting seems to be pretty much directly proportional to the number of rejections, and for a fixed pool of applicants, the less funding is available, the more rejections there will be.
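To make that toy model concrete, here's a minimal Python sketch. Everything in it is a made-up illustrative assumption, not data from any real funder: a fixed pool of 100 applicants, equal-sized grants of $100k, a handful of hypothetical budgets, and a hypothetical `cynicism_per_rejection` constant standing in for "each rejection costs a bit of idealism."

```python
def expected_rejections(applicants: int, budget: float, grant_size: float) -> int:
    """Rejections when a fixed budget is split into equal-sized grants (toy assumption)."""
    # Can't fund more people than actually applied.
    funded = min(applicants, int(budget // grant_size))
    return applicants - funded


def goodhart_pressure(applicants: int, budget: float, grant_size: float,
                      cynicism_per_rejection: float = 1.0) -> float:
    """Total 'Goodhart pressure', assumed (hypothetically) to be linear in rejections."""
    return cynicism_per_rejection * expected_rejections(applicants, budget, grant_size)


if __name__ == "__main__":
    # Hypothetical numbers: 100 applicants, $100k per grant, increasing budgets.
    for budget in (1_000_000, 5_000_000, 10_000_000, 20_000_000):
        print(budget,
              expected_rejections(100, budget, 100_000),
              goodhart_pressure(100, budget, 100_000))
```

Holding the applicant pool fixed, rejections (and so, under this assumption, Goodhart pressure) fall linearly as the budget grows and hit zero once everyone can be funded, which is the worldwide-UBI limit.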
On the other hand, if the United Nations got together tomorrow and decided to fund a worldwide UBI, there’d be no optimization pressure at all, and people would just do whatever seemed best to them personally.
EDIT: This appears to be a concrete example of what I’m describing