People concerned about AI safety sometimes withhold information or even mislead people in order to prevent ideas from spreading that might accelerate AI capabilities. While I think this may often be necessary, this mindset sometimes feels counterproductive or icky to me. Trying to distill some pieces of this intuition...
AI safety people are all trying to achieve the same goal (stopping AI from destroying the world), whereas individual AI capabilities researchers largely benefit from keeping their work secret until it’s published to avoid being “scooped.” People working on safety benefit more from sharing information amongst themselves than people working on capabilities do, so we should take advantage of that asymmetry.
Capabilities researchers are mostly trying to come up with technical solutions that locally improve AI performance—often, capabilities advances will come down to one researcher’s simple technique. But people working on safety must think about the myriad effects of their actions on the world.
Publicizing the idea that “capability X could exist” might inadvertently accelerate capabilities, because it prompts researchers to look for ways to implement X. The techniques they find might sometimes be complementary, but they are more likely to be orthogonal to, mutually exclusive with, or duplicates of one another.
But the positive impact of the idea on safety is likely larger. X may have many strategic implications, affecting the whole causal graph of AI’s impact on the world. If people working on AI safety were aware of these implications, they might realize that their work is more (or less) important than they previously believed, and many of them might make small or large changes to their plans in order to prepare for X.
It’s difficult for one person to think through the implications of their ideas, but an entire community thinking about them will do a lot better. E.g. someone might think that their idea is infohazardous, but if they posted about it publicly, someone else might:
Show that it isn’t hazardous after all
Notice an additional effect that flips the sign of the impact
Find a way to defuse the hazard, so that when a capabilities researcher implements the scary thing, we already have a fix
Information-hiding and deception reduce trust within the AI safety community and make it look less trustworthy to the rest of the world. They give off the vibe that everything AI safety people say is heavily biased by their agenda, that they treat people outside their insular group as the enemy, or that they paternalistically hide information from anyone who doesn’t share their beliefs because they “can’t handle the truth.”
I want to show off my interesting ideas and be perceived as smart, and it feels nicer to cooperate with people than to mislead them. I try to separate these motivations from the more important considerations above.
The tradeoff between secrecy and openness partly depends on how many people hanging out in places like LessWrong work on capabilities vs. safety. My impression is that there are a lot more safety people, but I’m very uncertain about this.