I realize I’m being a little pedantic here, but on the “joke” calculation: the problem is that PR is a binary function of whether the k utilitarians join or not, right? For instance, let −s_PR be the effective safety premium from k safety-minded utilitarians joining (negative, since joining presumably accelerates the company), and suppose that each utilitarian who joins contributes −s_PR/k of that acceleration. Then a rational utilitarian would demand a premium of (ϵ + s_PR)/k, which is not negligible.
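To make the arithmetic concrete, here is a minimal numeric sketch; ϵ, s_PR, and k are placeholder values I picked purely for illustration, not numbers from the original calculation:

```python
# Minimal sketch of the premium arithmetic above; all numbers are made up
# for illustration and do not come from the original post.
epsilon = 1.0   # total value the k utilitarians capture by joining (the "joke" figure)
s_PR = 100.0    # total acceleration cost if all k safety-minded utilitarians join
k = 50          # number of safety-minded utilitarians

per_person_value = epsilon / k           # what the joke calculation offers each joiner
per_person_harm = s_PR / k               # per-joiner share of the acceleration cost
required_premium = (epsilon + s_PR) / k  # what a rational utilitarian would demand

print(f"joke-calculation value per joiner: {per_person_value:.2f}")
print(f"per-joiner acceleration cost:      {per_person_harm:.2f}")
print(f"premium a rational joiner demands: {required_premium:.2f}")
```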
Going back to the joke calculation, it implies that the bottleneck to preventing defection is coordination: k utilitarians acting together would not join for ϵ/k of value, since that is against their interests, but individually each has zero counterfactual impact, so they all join. In the real world, coordination is plausibly relevant, but the bigger problem seems to be the object-level beliefs of those joining: how impactful your safety work is versus how much it accelerates capabilities, along with vaguer non-utilitarian motives like status that bias people towards joining the big labs.
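And a toy version of the coordination story, under the strong (made-up) assumption that the acceleration cost behaves like a threshold function of how many utilitarians join, so that one abstention changes nothing but a coordinated refusal by all k would:

```python
# Toy model of the coordination point, not from the original post: harm(n) is
# the acceleration cost when n of the k utilitarians join, and the threshold m
# is a hypothetical stand-in for "one abstention changes nothing, but a
# coordinated refusal by all k would". All numbers are made up.
epsilon = 1.0
s_PR = 100.0
k = 50
m = 10  # hypothetical threshold below which the acceleration cost goes away

def harm(n: int) -> float:
    return s_PR if n >= m else 0.0

individual_impact = harm(k) - harm(k - 1)  # one person abstains, the others still join
collective_impact = harm(k) - harm(0)      # all k refuse together

print(f"one person's counterfactual impact: {individual_impact:.1f}")  # 0.0
print(f"the group's counterfactual impact:  {collective_impact:.1f}")  # 100.0
print(f"value each captures by joining:     {epsilon / k:.2f}")        # 0.02
```

Under that assumption each individual's marginal impact is zero while the group's is the full s_PR, which is exactly the structure that makes joining individually rational and collectively bad.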