Epistemic status: not a lawyer, but I’ve worked with a lot of them.
As I understand it, an NDA isn’t enforceable against a subpoena (though the former employer can seek a protective order for the testimony). Someone should really encourage law enforcement or Congress to subpoena the OpenAI resigners...
A subpoena for what?
I think one thing that is poorly understood by many folks outside of DC is just how baseline the assumption is that China is a by-default faithless negotiating partner and that by-default China will want to pick a war with America in 2027 or later.
(I am reporting, not endorsing. For example, it is deeply unclear to me why we should take another country’s statements about the year they’re gonna do a war at surface level)
“want to pick a war with America” is really strange wording because China’s strategic goals are not “win a war against nuclear-armed America”, but things like “be able to control its claims in the South China Sea including invading Taiwan without American interference”. Likewise Russia doesn’t want to “pick a war with the EU” but rather annex Ukraine; if they were stupid enough to want the former they would have just bombed Paris. I don’t know whether national security people relate to the phrasing the same way but they do understand this.
I totally understand your point, agree that many folks would use your phrasing, and nonetheless think there is something uniquely descriptively true about the phrasing I chose and I stand by it.
Has China made a statement about starting a war in 2027 or later? Who exactly is the belief that “by-default China will want to pick a war with America in 2027 or later” held by, and how confident are you that they hold it?
It is supposedly their goal for when they will have modernized their military.
Thanks for the link! The one mention of starting war was a quote from this 2006 white paper:
“by the middle of the twenty-first century, the strategic goal of building an informatized army and winning informatized wars will be basically achieved”
Is this what you’re referring to or did I miss something?
The general belief in Washington is that Xi Jinping has ordered his military to be ready to invade Taiwan by then. (See, e.g., https://www.reuters.com/world/china/logistics-war-how-washington-is-preparing-chinese-invasion-taiwan-2024-01-31/ )
Sufficient AI superiority will mean overwhelming military superiority. If we remain ahead in AI it won’t matter what other countries do. I expect this effect will dominate the strategic landscape by 2027.
Say more ?
No, the belief is that China isn’t going to start a war before it has a modernized military, and they plan to have a modernized military by 2027. Therefore they won’t start a war before 2027.
China has also been drooling over Taiwan for the past 100 years. Thus, if you don’t think diplomatic or economic ties mean much to them, and they’ll contend with the US’s military might before 2027, and neither party will use nukes in such a conflict, then you expect a war after 2027.
Ah, I misread your comment. Thanks for clarifying!
I don’t think they have stated they’ll go to war after 2027. 2027 is the year of their “military modernization” target.
A random observation from a think tank event last night in DC—the average person in those rooms is convinced there’s a problem, but that it’s the near-term harms, the AI ethics stuff, etc. The highest-status and highest-rank people in those rooms seem to be much more concerned about catastrophic harms.
This is a very weird set of selection effects. I’m not sure what to make of it, honestly.
Random psychologizing explanation that resonates most with me: Claiming to address big problems requires high status. A low-rank person is allowed to bring up minor issues, but they are not in a position to bring up big issues that might reflect on the status of many high-status people.
This is a pretty common phenomenon that I’ve observed. Many people react with strong social slap-down motions if you (for example) call into question whether the net effect of a whole social community or economic sector is negative, where the underlying cognitive reality seems similar to “you are not high status enough to bring forward this grievance”.
I think this is plausibly describing some folks!
But I also think there’s a separate piece—I observe (and give pretty high odds that it isn’t just an act) that at least some people are trying to associate themselves with the near-term harms and AI ethics stuff because they think that is the higher-status stuff, despite direct, obvious evidence that the highest-status people in the room disagree.
There are (at least) two models which could partially explain this:
1) The high-status/high-rank people have that status because they’re better at abstract and long-term thinking, and their role is more toward preventing catastrophe rather than nudging toward improvements. They leave the lesser concerns to the underlings, with the (sometimes correct) belief that it’ll come out OK without their involvement.
2) The high-status/high-rank people are rich and powerful enough to be somewhat insulated from most of the prosaic AI risks, while the average member can legitimately be hurt by such things. So everyone is just focusing on the things most likely to impact themselves.
edit: to clarify, these are two models that do NOT imply the obvious “smarter/more powerful people are correctly worried about the REAL threats, and the average person’s concerns are probably unimportant/uninformed”. It’s quite possible that this division doesn’t tell us much about the relative importance of those different risks.
Yup! I think those are potentially very plausible, and similar things were on my short list of possible explanations. I would be very not shocked if those are the true reasons. I just don’t think I have anywhere near enough evidence yet to actually conclude that, so I’m just reporting the random observation for now :)
I really dislike the term “warning shot,” and I’m trying to get it out of my vocabulary. I understand how it came to be a term people use. But if we think it might actually be something that happens, and that when it happens it plausibly and tragically results in the deaths of many folks, isn’t the right term “mass casualty event” ?
I think many mass casualty events would be warning shots, but not all warning shots would be mass casualty events. I think an agentic AI system getting most of the way towards escaping containment or a major fraud being perpetrated by an AI system would both be meaningful warning shots, but wouldn’t involve mass casualties.
I do agree with what I think you are pointing at, which is that there is something Orwellian about the “warning shot” language. Like, in many of these scenarios we are talking about large negative consequences, and it seems good to have a word that owns that (in particular inasmuch as people are thinking about making warning shots more likely before an irrecoverable catastrophe occurs).
I totally think it’s true that there are warning shots that would be non-mass-casualty events, to be clear, and I agree that the scenarios you note could maybe be those.
(I was trying to use “plausibly” to gesture at a wide range of scenarios, but I totally agree the comment as written isn’t clearly meaning that).
I don’t think folks intended anything Orwellian, just sort of something we stumbled into, and heck, if we can both be less Orwellian and be more compelling policy advocates at the same time, why not, I figure.
I think a lot of people losing their jobs would probably do the trick, politics-wise. For most people the crux is “will AIs be more capable than humans”, not “might AIs more capable than humans be dangerous”.
You know, you’re not the first person to make that argument to me recently. I admit that I find it more persuasive than I used to.
Put another way: “will AI take all the jobs” is another way of saying* “will I suddenly lose the ability to feed and protect those I love.” It’s an apocalypse in microcosm, and it’s one that doesn’t require a lot of theory to grasp.
*Yes, yes, you could imagine universal basic income or whatever. Do you think the average person is Actually Expecting to Get That ?
I’m pretty sure that I think “infohazard” is a conceptual dead end that embeds some really false understandings of how secrets are used by humans. It is an orphan of a concept—it doesn’t go anywhere. Ok, the information’s harmful. You need humans to touch that info anyways to do responsible risk-mitigation. So now what ?
That “so now what” doesn’t sound like a dead end to me. The question of how to mitigate risk when normal risk-mitigation procedures are themselves risky seems like an important one.
I agree that it’s not terribly useful beyond identifying someone’s fears. Using almost any taxonomy to specify what the speaker is actually worried about lets you stop saying “infohazard” and start talking about “bad actor misuse of information” or “naive user tricked by partial (but true) information”. These ARE often useful, even though the aggregate term “infohazard” is limited.
See e.g. Table 1 of https://nickbostrom.com/information-hazards.pdf
Yeah, that’s a useful taxonomy to be reminded of. I think it’s interesting how the “development hazard”, item 8, with maybe a smidge of “adversary hazard”, is the driver of people’s thinking on AI. I’m pretty unconvinced that good infohazard doctrine, even for AI, can be written based on thinking mainly about that!
I suggest there is a concept distinct enough to warrant the special term, but if it’s expansive enough to include secrets (beneficial information that some people prefer others not know), that renders it worthless.
“Infohazard” ought to be reserved for information that harms the mind that contains it, with spoilers as the most mild examples, SCP-style horrors as the extreme fictional examples.
I think within a Bayesian framework where in general you assume information has positive value, it’s useful to have an explicit term for when that is not the case. It’s a relatively rare occurrence, and as such your usual ways of dealing with information will probably backfire.
The obvious things to do are to not learn the information in the first place (i.e. avoid dangerous research), to understand and address the causes of why this information is dangerous (e.g. because you can’t coordinate on not building dangerous technology), or, as a last resort, to silo the information and limit its spread.
I do think that it would be useful to have different words that distinguish between “infohazard to the average individual” and “societal infohazard”. The first one is really exceedingly rare. The second one is still rare but more common, because society has a huge distribution of beliefs and enough crazy people that if information can be used dangerously, there is a non-trivial chance it will be.
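(To spell out the “information has positive value” default being leaned on here: for a single Bayesian expected-utility maximizer, and ignoring acquisition costs and effects on other agents, observing a signal before acting can never lower expected utility. A minimal sketch of the standard value-of-information inequality, with $U$ a utility over actions $a$ and states $\theta$ and $X$ the signal:)

```latex
% Expected value of (perfect) information for a lone Bayesian expected-utility
% maximizer, ignoring acquisition costs and effects on other agents:
% choosing after seeing X can never do worse in expectation than choosing blind.
\[
\mathrm{EVPI}
  \;=\;
  \mathbb{E}_{X}\!\left[\,\max_{a}\,\mathbb{E}\!\left[U(a,\theta)\mid X\right]\right]
  \;-\;
  \max_{a}\,\mathbb{E}\!\left[U(a,\theta)\right]
  \;\ge\; 0,
\]
% because the expectation of a maximum is at least the maximum of the expectations.
```

The “infohazard” label marks the cases where those idealizations fail, e.g. when the information changes what other (possibly malicious) agents can do, which is exactly the individual-vs-societal distinction being drawn above.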
I still like the term “recipe for destruction” when limiting it to stuff similar to dangerous technology.
I think a lot of my underlying instinctive opposition to this concept boils down to thinking that we can and do coordinate on this stuff quite a lot. Arguably, AI is the weird counterexample of a thought that wants to be thunk—I think modern Western society is very nearly tailor-made to seek a thing that is abstract, maximizing, systematizing of knowledge, and useful, especially if it fills a hole left by the collapse of organized religion.
I think for most other infohazards, the proper approach requires setting up a team (often in government) that handles them, which requires those employees to expose themselves to the infohazard in order to manage it. And, yeah, sometimes they suffer real damage from it. There’s no way to analyze ISIS beheading videos to stop their perpetrators without seeing some beheading videos; I think that’s the more common variety of infohazard I’m thinking of.
“Ok, the information’s harmful. You need humans to touch that info anyways to do responsible risk-mitigation. So now what ?”
I think one of the points is that you should now focus on selective rather than corrective or structural means to figure out who is nonetheless allowed to work on the basis of this information.
Calling something an infohazard, at least in my thinking, generally implies both that:
1) any attempts to devise galaxy-brained incentive structures that try to get large groups of people to nonetheless react in socially beneficial ways when they access this information are totally doomed and should be scrapped from the beginning, and
2) you absolutely should not give this information to anyone that you have doubts would handle it well; musings along the lines of “but maybe I can teach/convince them later on what the best way to go about this is” are generally wrong and should also be dismissed.
So what do you do if you nonetheless require that at least some people are keeping track of things? Well, as I said above, you use selective methods instead. More precisely, you carefully curate a very short list of people who are responsible and likely also share your meta views on how dangerous truths ought to be handled, and you do your absolute best to make sure the group never expands beyond those you have already vetted as capable of handling the situation properly.
At the meta level, I very much doubt that I am responsible enough to create and curate a list of human beings for the most dangerous hazards. For example, I am very confident that I could not 100% successfully detect a foreign government spy inside my friend group, because even the US intelligence community can’t do that… you need other mitigating controls instead.
I feel like one of the trivially most obvious signs that AI safety comms hasn’t gone actually mainstream yet is that we don’t say, “yeah, superintelligent AI is very risky. No, I don’t mean Terminator. I’m thinking more Person of Interest, you know, that show with the guy from the Sound of Freedom and the other guy who was on Lost and Evil?”
I agree (minor spoilers below).
In this context, it’s actually kind of funny that (at least the latter half of) Person of Interest is explicitly about a misaligned superintelligent AI, which is misaligned because its creator did not take all the necessary safety precautions in building it (as opposed to one of the main characters, who did). Well, technically it’s mostly intent-aligned; it’s just not value-aligned. But still… And although the show mostly depicts misuse risks, there is still a strong component of just how difficult it is to defend the world from such AGI-caused threats.
Root in Season 2 is also kind of just a more cynical and misandrist version of Larry Page, talking about AIs as the “successor species” to humanity and saying that we “bad apples” should give way to something more intelligent and pure.
(This is not an endorsement of Jim Caviezel’s beliefs, in case anyone somehow missed my point here.)
Why are we so much more worried about LLMs having CBRN risk than super-radicalization risk, precisely ?
(or is this just an expected-harm metric rather than a probability metric ?)
I am (speaking personally) pleasantly surprised by Anthropic’s letter. https://cdn.sanity.io/files/4zrzovbb/website/6a3b14a98a781a6b69b9a3c5b65da26a44ecddc6.pdf
I’ll be at LessOnline this upcoming weekend—would love to talk to folks about what things they wish someone would write about to explain how DC policy stuff and LessWrong-y themes could be better connected.
Hypothesis, super weakly held and based on anecdote:
One big difference between US national security policy people and AI safety people is that the “grieving the possibility that we might all die” moment happened, on average, more years ago for the national security policy person than the AI safety person.
This is (even more weakly held) because the national security field has existed for longer, so many participants literally had the “oh, what happens if we get nuked by Russia” moment in their careers in the Literal 1980s...