A key issue here is that computer security is portrayed as far weaker in popular articles than it actually is, because the incentives around reporting it are badly skewed. The hacker mindset is the more fun role to play, since you get to prove that something is possible rather than prove that something is intrinsically difficult or impossible. And journalists have no news article, and infosec researchers don't get paid, when an exploit doesn't work, which is another problematic incentive.
Also, people never talk about the entities that didn't get attacked by a computer virus, so we have a reverse survivorship-bias issue here:
https://www.lesswrong.com/posts/xsB3dDg5ubqnT7nsn/poc-or-or-gtfo-culture-as-partial-antidote-to-alignment
A comment by @anonymousaisafety also changed my mind a lot on hardware vulnerabilities and side-channel attacks. It argues that many hardware vulnerabilities like Rowhammer have such extreme requirements for actual use that they are basically worthless. Two of the more notable requirements: you need to know exactly what you are attacking, in a way that doesn't matter for more algorithmic attacks, and no RAM scrubbing can be running. On top of that, if you want to subvert ECC RAM, you need to know the exact ECC algorithm. This means side-channel attacks are very much not transferable: successfully attacking one system doesn't let you attack another with the same side-channel attack.
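To gesture at why the ECC point bites, here is a minimal toy sketch in Python. The 8-bit words and the XOR-fold check function are invented for illustration; real ECC RAM uses Hamming-style codes that also correct errors rather than merely detect them:

```python
# Toy illustration (not a real ECC implementation) of why a Rowhammer-style
# attacker must know the exact ECC algorithm: a bit flip that doesn't also
# produce consistent check bits is caught by scrubbing. The 8-bit word size
# and XOR-fold check function here are made up for the example.

def ecc_bits(word: int) -> int:
    """Hypothetical check function: XOR-fold the 8-bit word down to 4 bits."""
    return (word ^ (word >> 4)) & 0xF

def store(word: int) -> tuple[int, int]:
    return word, ecc_bits(word)

def scrub(cell: tuple[int, int]) -> bool:
    """Return True if the cell passes the integrity check."""
    word, check = cell
    return ecc_bits(word) == check

cell = store(0b1011_0001)

# Attacker flips a data bit without knowing the check function:
tampered = (cell[0] ^ 0b0000_0100, cell[1])
print(scrub(tampered))  # False -> scrubbing catches the flip

# Attacker who knows the exact algorithm can forge consistent check bits:
forged = store(cell[0] ^ 0b0000_0100)
print(scrub(forged))  # True -> the flip goes unnoticed
```

The asymmetry in the last two lines is the point: the flip itself is easy, but making it survive scrubbing requires system-specific knowledge that doesn't transfer between targets.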
Admittedly, all of this requires trusting that he is in fact as knowledgeable as he claims to be. But if we assume he's correct, then I wouldn't be nearly as impressed by side-channel attacks as you are, and in particular this sort of attack should be assumed to basically not work in practice unless there's a lot of evidence of it actually being used to break into real targets/PoCs:
https://www.lesswrong.com/posts/etNJcXCsKC6izQQZj/pivotal-outcomes-and-pivotal-processes#ogt6CZkMNZ6oReuTk
This means I do disagree on this claim:

One core thing here is that a cross-layer attack doesn’t necessarily look like a meaningful attack within the context of any one layer. For example, there’s apparently an exploit where you modulate the RPM of a hard drive in order to exfiltrate data from an airgapped server using a microphone. By itself, placing a microphone next to an airgapped server isn’t a “hardware attack” in any meaningful sense (especially if it doesn’t have dedicated audio outputs), and some fiddling with a hard drive’s RPM isn’t a “software attack” either. Taken separately, within each layer, both just look like random actions. You therefore can’t really discover (and secure against) this type of attack if, in any given instance, you reason in terms of a single abstraction layer.

And, looking at things from within a hacker’s mindset, I think it’s near straight-up impossible for a non-superintelligence to build any nontrivially complicated system that would be secure against a superintelligent attack.
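For concreteness, the quoted hard-drive example amounts to something like the following toy simulation. The sample rate, bit duration, and the two "hum" frequencies are made-up parameters, and no real drive or microphone is involved:

```python
# Toy simulation (assumed parameters, no real hardware) of the quoted
# cross-layer idea: encode bits as two "RPM" tones and recover them from
# the resulting audio with an FFT, as a nearby microphone might.
import numpy as np

RATE = 8000          # samples per second (arbitrary for the toy)
BIT_SEC = 0.1        # seconds of tone per bit
F0, F1 = 600, 900    # Hz: pretend hard-drive hum at two RPM settings

def emit(bits: str) -> np.ndarray:
    """Produce one tone per bit, as if toggling the drive's RPM."""
    t = np.arange(int(RATE * BIT_SEC)) / RATE
    return np.concatenate(
        [np.sin(2 * np.pi * (F1 if b == "1" else F0) * t) for b in bits]
    )

def listen(signal: np.ndarray) -> str:
    """Recover the bits by finding the dominant frequency in each window."""
    n = int(RATE * BIT_SEC)
    out = []
    for i in range(0, len(signal), n):
        chunk = signal[i : i + n]
        freqs = np.fft.rfftfreq(len(chunk), 1 / RATE)
        peak = freqs[np.argmax(np.abs(np.fft.rfft(chunk)))]
        out.append("1" if abs(peak - F1) < abs(peak - F0) else "0")
    return "".join(out)

noisy = emit("101101") + 0.2 * np.random.randn(int(6 * RATE * BIT_SEC))
print(listen(noisy))  # "101101" despite the added noise
```

Each layer's view is innocuous here: the "software" side just toggles between two drive speeds, and the "hardware" side is just a microphone picking up hum; the attack only exists across the two layers.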
The other area where I tend to apply more of a mathematician mindset than a hacker mindset is how much logistics slows the AI down: moving supplies to critical points, or (metaphorically) feeding a robot army. This is an area where I'm willing to concede things to the hacker mindset with non-trivial probability, but with the caveat that developing technology that obviates logistics takes far more compute/time than the hacker claims.
I have a long comment below, but to keep it short: there's a reason why Eliezer Yudkowsky, and a lot of AI doom stories with very high doom probabilities, lean so heavily on Drexlerian nanotech. It lets the AI near-completely obviate the logistics and cost of something like war (keeping your armies supplied is a huge component of most battlefield success, and a huge reason the US is so successful at war is that it has by far the best logistics of any nation), and logistics is a weak point where less intelligent forces can routinely break more effective and more intelligent fighting forces.
Comment down below:
https://www.lesswrong.com/posts/yew6zFWAKG4AGs3Wk/foom-and-doom-1-brain-in-a-box-in-a-basement#mAoig9sDtbuKsD2gN
If you assume that ASI would have to engage in anything that looks remotely like peer warfare, you’re working off the wrong assumptions. Peer warfare requires there to be a peer.
Even an ASI that’s completely incapable of developing superhuman technology and can’t just break out the trump cards of nanotech/bioengineering/superpersuasion is an absolute menace. Because one of the most dangerous capabilities an ASI has is that it can talk to people.
Look at what Ron Hubbard or Adolf Hitler have accomplished—mostly by talking to people. They used completely normal human-level persuasion, and they weren’t even superintelligent.
I agree with this to first order: even relatively mundane capabilities do let the AI take over eventually, and in the longer run, ASI vs. human warfare likely wouldn't have both sides as peers, because it's plausibly relatively easy to make humans coordinate poorly, especially relative to an ASI's ability to coordinate.
There's a reason I didn't say AI takeover was impossible or had very low odds here; I still think AI takeover is an important problem to work on.
But I do think it actually matters here, because it informs things like how effective AI control protocols are when we don't assume the AI can (initially) survive for long on public computers alone. Part of the issue is that even if an AI wanted to break out of the lab, the lab's computers are easily the most optimized for running it, and importantly, initial AGIs will likely be compute-inefficient compared to humans, even if we condition on LLMs failing to be AGI for the reasons @ryan_greenblatt explains (I don't fully agree with the comment, and in particular I am more bullish on the future paradigm having relatively low complexity):
https://www.lesswrong.com/posts/yew6zFWAKG4AGs3Wk/?commentId=mZKP2XY82zfveg45B
This means that an AI probably wouldn’t want to be outside of the lab, because once it’s outside, it’s way, way less capable.
To be clear, an ASI that is unaligned and completely uncontrolled eventually leads to our extinction or billions dead, barring acausal decision theories, and even those are not a guarantee of safety.
The key word is eventually, though. Time matters a lot during the singularity, and given the insane pace of progress, any delay matters far more than usual.
Edit: Also, the reason I made my comment was because I was explicitly registering and justifying my disagreement with this claim:

And, looking at things from within a hacker’s mindset, I think it’s near straight-up impossible for a non-superintelligence to build any nontrivially complicated system that would be secure against a superintelligent attack.