I think decision-making relies on counterfactuals, so it also seems good to have a sense of which counterfactuals we're considering here, i.e. what the underlying model of AGI production is.
I think the main thing I'm hoping for is a separation of 'purity ethics' from 'expected value maximization', or at least for the purity ethics to be explicitly grounded in an FDT-style argument ("if people running good decision theory simply decline to coordinate with evil systems, those evil systems will be disadvantaged") rather than just a "this seems icky." For example, you might imagine researchers or engineers signing something like an "adequacy pledge" based on Eliezer's guidelines, where they commit to only working at orgs that are adequate along all six dimensions (or at the top current org if no orgs are adequate, though that version seems worse), and now orgs can see how much talent cares about that sort of thing.
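To make the pledge mechanism concrete, here is a minimal toy sketch (my own illustration; the `Org`/`Researcher` fields, the `available_talent` helper, and the encoding of adequacy as a count of six dimensions are all assumptions for exposition, not anything from the guidelines): pledge-signers decline any org that isn't adequate on all six dimensions, so each org can read off how much of the talent pool its adequacy level makes available.

```python
# Toy model of an "adequacy pledge" (hypothetical names and encoding):
# signers only join orgs adequate on all six dimensions, so an org can
# see how much of the talent pool its adequacy level unlocks.
from dataclasses import dataclass

@dataclass
class Org:
    name: str
    adequate_dims: int  # how many of the six dimensions the org satisfies

@dataclass
class Researcher:
    name: str
    signed_pledge: bool  # pledge-signers only join fully adequate orgs

def available_talent(org: Org, pool: list[Researcher]) -> list[Researcher]:
    """Researchers willing to work at `org` under the pledge rule."""
    if org.adequate_dims >= 6:
        return pool  # fully adequate: everyone is willing
    return [r for r in pool if not r.signed_pledge]

pool = [Researcher("a", True), Researcher("b", True), Researcher("c", False)]
for org in [Org("FullyAdequate", 6), Org("PartiallyAdequate", 4)]:
    hires = available_talent(org, pool)
    print(f"{org.name}: {len(hires)}/{len(pool)} of the pool willing to join")
```

The point of the toy model is just the visibility effect: the gap between the two printed numbers is exactly the signal orgs would receive about how much talent cares.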
I wrote some basic thoughts about this a while ago in response to a similar question; roughly speaking, I think the 'total effects' of interventions are not obvious even when the 'direct effects' are obvious. I think there's value in trying to figure out what motivates people and what sorts of 'gains from trade' are possible.
As a specific example, DeepMind has made public claims of the form "once we get close enough to AGI, we'll slow down and move carefully." But imagine being the person at DeepMind in charge of building the lever such that DeepMind leadership can pull it to slow down, without immediately shedding all of their employees who want to keep moving at full speed, or other sorts of evaporative cooling. What work can be done now to set that up? What work can be done now to figure out the benefits and costs of pulling that lever at various times? [And, as a check on whether people's words line up with their behaviors: at a company of DeepMind's size there really needs to be at least one FTE who is preparing for a project of that size. Does such a person exist? If not, can we help DeepMind hire them?]
As a specific example, I want to separate out something like "working at OpenAI" and "founding OpenAI"; my sense is that EY and others think that founding OpenAI in the first place dealt a major blow to the feasibility of coordination between AGI projects, and thus was pretty tragic from the "will human civilization make it" perspective. But it's not obvious to me that choosing to get a job at OpenAI in 2022 is primarily connected to the question of whether OpenAI should have been founded in 2015; it seems primarily connected to the question of what projects they will do in 2022-2025. [The linked writing in the parent comment is about OpenAI in 2020, before ARC and Anthropic split out of it; I don't have a great sense of the social value (or disvalue) of working at those three orgs today.]