this is important because it opens up some positive sum trades: capabilities people really hate it when their release dates get set back by e.g. safety reviews (which is unfortunate), but they mind a lot less if you’re taking 5% of compute (which is a huge amount of absolute compute that you should value a lot!), and they mind even less if you have an entire team doing good work somewhere out of their way (which you should also value a lot!).
If the safety people never actually slow down or materially dent the model release schedule (which in your model is what triggers conflict), what you are describing is not a sequence of positive sum trades between the capabilities and safety teams. Rather, it’s a model of how companies can devote relatively small amounts of resources to throw up a safety-coated PR shield around their core capabilities research effort. IRL we’ve also seen safety teams get dissolved after securing big commitments of resources that could conceivably dent the release schedule (the 20% of compute for superalignment being a good example)
why is it not positive sum if you never slow down the model release schedule? positive sum doesn’t mean “nobody compromises on anything.” the entire point of a positive sum trade is that you find things that are compromises for the other party but ultimately not very costly for them, and very beneficial to you; in exchange, you do something you would rather not do, but that is not that costly for you and is beneficial for the other party. on net, both parties get something they value more than the thing they sacrificed. so the potential trade here is (a toy version of the arithmetic is sketched after the list):
capabilities teams sacrifice x% of their compute and $y of salary money. they grumble a bit because on the margin another x% of compute could have made them very slightly faster, but at the end of the day it’s not make or break for them.
alignment teams hire lots of really good alignment researchers and do a huge amount of good work. in return, they let the lab get some good PR, and maybe even let their alignment work have the side effect of making the models very slightly better in ways that don’t shorten serial time to more capable models by much (though i think this requires more care than the PR does).
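to make the arithmetic behind “positive sum” concrete, here’s a toy payoff sketch in python. every number below is invented purely for illustration (nothing comes from actual lab budgets); the only point is that both sides make a real concession and still come out ahead:

```python
# toy payoff arithmetic for the trade above. all numbers are made up
# purely for illustration; the only thing that matters is the sign of the nets.

# capabilities team: gives up x% of compute (a small marginal cost to them),
# gains PR cover and slight model improvements from the alignment work.
capabilities_cost = 1.0   # utility lost by handing over x% of compute
capabilities_gain = 3.0   # utility from good PR + slightly better models

# alignment team: spends some effort on lab-legible wins (PR, model tweaks),
# gains compute, salaries, and headcount for actual alignment research.
alignment_cost = 1.0      # utility lost on work they'd otherwise skip
alignment_gain = 5.0      # utility from compute, money, and good research

capabilities_net = capabilities_gain - capabilities_cost  # +2.0
alignment_net = alignment_gain - alignment_cost           # +4.0

# positive sum: both nets are > 0 even though each side made a real
# (if cheap) concession -- nobody needed to get everything they wanted.
assert capabilities_net > 0 and alignment_net > 0
print(f"capabilities net: {capabilities_net:+}, alignment net: {alignment_net:+}")
```

note that the release schedule never shows up in this ledger at all, which is the point: the trade routes around the one thing that would trigger conflict.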
of course this is all assuming that you actually do good work with the resources, which is pretty hard. if you’re doing useless work then it doesn’t matter whether you’re spending philanthropic money or lab money or whatever: you should stop doing bad work and do good work instead.
the superalignment situation is very unfortunate and reflects poorly on openai, but i think the entire situation was also net negative from a perfectly spherical openai’s perspective (the negative PR from breaking the commitment far outweighed any benefit from making it in the first place; a perfectly spherical rational openai should have either not made the commitment at all, made a smaller commitment in the beginning, or upheld the commitment. in reality, the shift in position is of course easily explained by the board situation.)