Could you explain which AI companies[1] you considered and how joining such a company could increase x-risk compared to the options where you are not on a frontier lab safety team? My baseline extinction scenario is that OAI/Anthropic/GDM/a union thereof creates an unaligned AI which escapes before deployment, escapes after deployment, or is trusted to create a successor. The safety team’s role is to develop interpretability techniques and to find evidence of misalignment before the AI becomes capable of harming mankind. I don’t see a way for you to decrease P(finding evidence) unless your potential rival for the role was more competent. Alas, I also struggle to understand how one can estimate P(rival was more competent).
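Put in symbols (just a rough framing, treating the hire as displacing exactly one alternative candidate), the quantity in question is something like

\[
\Delta P \;=\; P(\text{evidence of misalignment found} \mid \text{you on the team}) \;-\; P(\text{evidence found} \mid \text{counterfactual hire on the team}),
\]

which is negative only if the displaced candidate would have been more competent at the job, and that is the term I don’t know how to estimate.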
[1] Unless, of course, they are xAI or Meta, which I’d rather see destroyed.
The specific company I was considering in this example was GDM, but I see Anthropic as broadly similar and OAI as somewhat worse. That’s why I didn’t name it—there was nothing in my logic that was specific to GDM.
If I put on my pessimist hat, here are the ways this could counterfactually increase extinction risk:
1. The work I do winds up being used for capabilities, and advances capabilities more than it advances safety.
2. The work I do is largely safetywashing, or focused on near-term risks that affect company reputation and/or profit, which has a much smaller or negligible effect on extinction risk. This isn’t strictly worse than doing nothing, but my choices aren’t just “Do nothing” or “Join a frontier lab”.
3. I end up getting my epistemics changed, falsely decide that alignment isn’t that hard after all, and then probably end up doing one of the other two points. (Especially re: Point 1: the harder you think alignment is, the less value you should put on frontier company work, imo. I can go into more detail on why I think this if it is non-obvious.)
By contrast, the counterfactual is that I do other useful work that might reduce extinction risk more. My alternate path, the one I currently estimate I’m very likely to take (just waiting on security clearance), is to join the new Australian AISI and try to shape their research direction in a way that helps with extinction risk. The expected value of this is well above zero in my book, so it’s not enough just to avoid increasing extinction risk; I also have to decrease extinction risk more at a frontier lab than anywhere else I could otherwise be working.
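To make that bar explicit (a rough sketch, writing \(R(\cdot)\) for my expected reduction in extinction risk from working somewhere), the frontier lab option only wins if

\[
R(\text{frontier lab}) \;>\; R(\text{best alternative, e.g. Oz AISI}),
\]

not merely \(R(\text{frontier lab}) > 0\).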
And since that’s really hard to calculate, it’s easy to wind up doing the things that follow money and status gradients as long as they have a plausible path to impact. Committing to giving away any extra money is a good way of taking away one of those incentives. The money could do a lot of good if donated, but that’s an abstract good; it doesn’t emotionally move me the way a large salary clearly does. I don’t endorse this and I am not proud of it, but it would be far worse to deny it to myself.
To state the obvious (which I don’t think you’ve stated explicitly so far): if your judgment is that there are money-bottlenecked initiatives that you could un-bottleneck with an amount like $[GDM AI safety minus Oz AISI] (and that will likely remain bottlenecked if you don’t), and your estimated counterfactual impact is comparable in both cases, this might tip the decision strongly in favor of GDM AI safety.
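Spelling that out in the same rough terms as the sketch above (an illustrative decomposition only), the comparison becomes

\[
R(\text{GDM work}) + R\big(\text{donating } \$[\text{GDM AI safety} - \text{Oz AISI}]\big) \;\text{ vs. }\; R(\text{Oz AISI work}),
\]

where the donation term only counts if the bottlenecked initiative would otherwise stay unfunded. If the two direct work terms are comparable, the donation term can dominate the decision.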