[Epistemic status + impostor syndrome: just learning; posting my ideas to hear how they are wrong and in the hope of interacting with others in the community. Don’t learn from my ideas.]
A)
Victoria: “I don’t think that the internet has a lot of particularly effective plans to disempower humanity.”
I think:
Having ready-made plans on the internet and using them is not part of the normal AGI threat model. If that were the problem, we could just filter those plans out of the training set (a toy sketch of what I mean follows below).
(The internet does contain such ideas. I will briefly mention biosecurity, but I prefer not to spread ideas on how to disempower humanity.)
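To make the “just filter it out” point concrete, here is a minimal, deliberately naive sketch of what filtering a training corpus might look like. Everything in it (the phrase list, the function names, the corpus format) is hypothetical and invented for illustration; real training-data curation is far more involved than keyword matching.

```python
# Toy sketch (hypothetical): drop documents containing blocked phrases
# before they reach the training set. The phrase list below is a
# placeholder, not a real filter list.

BLOCKED_PHRASES = [
    "step-by-step plan to disempower humanity",
    "how to bypass biosecurity screening",
]

def looks_risky(document):
    """Return True if the document contains any blocked phrase."""
    text = document.lower()
    return any(phrase in text for phrase in BLOCKED_PHRASES)

def filter_corpus(documents):
    """Keep only documents that are not flagged as risky."""
    return [doc for doc in documents if not looks_risky(doc)]

if __name__ == "__main__":
    corpus = [
        "A history of the Apollo program.",
        "A step-by-step plan to disempower humanity using ...",  # would be dropped
    ]
    print(filter_corpus(corpus))  # only the Apollo document remains
```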
B)
[Victoria:] I think coming up with a plan that gets past the defenses of human society requires thinking differently from humans.
TL;DR: I think some ways to disempower humanity don’t require thinking differently than humans
I’ll split up AI’s attack vectors into 3 buckets:
Attacks that humans didn’t even think of (such as what we can do to apes)
Attacks that humans did think of but are not defending against (for example, we thought about pandemic risks but didn’t defend against them very well). Note that this bucket does not require thinking of anything humans haven’t thought of.
Attacks that humans are actively defending against, such as using robots with guns, trading in the stock market, or playing go (winning at go probably won’t help take over the world, but humans are actively working on winning go games, so I put the example here). Having an AI beat us in one of these does require it to be smarter than us in some sense I consider important, but not all attacks are in this bucket.
C)
[...] requires thinking differently from humans
I think AIs today already think differently than humans under any reasonable meaning of the phrase. In fact, if we could make them NOT think differently than humans, my [untrustworthy] opinion is that this would be non-negligible progress towards solving alignment. No?
D)
[Victoria:] The intelligence threshold for planning to take over the world isn’t low
First, disclaimers:
(1) I’m not an expert and this isn’t widely reviewed. (2) I’m intentionally leaving out details in order not to spread ideas on how to take over the world; I’m aware this is bad epistemics and I’m sorry for it, but it’s the tradeoff I’m picking.
So, mainly based on A, I think a person who is 90% as intelligent as Elon Musk in all dimensions would probably be able to destroy humanity, and so (if I’m right), the intelligence threshold is lower than “the world’s smartest human”. Again sorry for the lack of detail. [mods, if this was already too much, feel free to edit/delete my comment]