I think I figured out something about why people worried about AI safety go into capabilities. Or rather, this is something I’ve been trying to say for a while, but I finally found a good formulation for it.
Suppose you are Frodo and I am Gandalf. I say, “well, Frodo, the ring is super dangerous. It lies to you and promises you great power, but it will just destroy you and resurrect the dark lord Sauron, ushering Middle Earth into an age of misery and certain doom. You must swear to never put it on, and go on a perilous quest to destroy it.”
And then I add, “Well, this is true for like, 99.9999% of cases. But if you have some true insight, you can actually control the ring. And then it will actually make all your dreams come true. But, like, the vast majority of people will fail, and return Sauron to power. So what do you say, Frodo? Better not put on that ring!”
Humans are such suckers for those kinds of odds.
[Explanation: The problem is that there are things that most people can agree will benefit nobody, like nuclear war. But then there are things that will totally hurt lots of people, but but but there’s this tiny chance that you can in fact do it (and become super powerful/do a huge amount of good) - that’s the possibility of extreme upside that drew Eliezer to start SIAI, which became MIRI. The original purpose of SIAI was to create artificial superintelligence!
Unfortunately he then decided to warn people about AI risk, and basically gave the terrible speech that I gave above (complete with caveats like “we’d be able to solve technical AI safety if we just had one textbook’s worth of knowledge from the future”). He effectively one-shot everyone of a particular nerdish variety into becoming obsessed with the power of the Ring/AI.]
[Final note: I don’t really subscribe to this kind of malicious tool/evil demon conception of AI any more. I think of it more as giving birth to a novel kind of intelligence, with all the possibilities of good and evil that entails. Fwiw I also think that features of human “irrationality” like love, empathy, and care stand a decent chance of being passed on to AIs we create, both through their training data and also because it might not be so irrational after all. I also actually really admire Eliezer’s moral clarity, kindness, and strength of conviction, as well as all the usual things people say about his cleverness and ingenuity. I believe that he honestly conveyed his beliefs to the best of his knowledge and did so with courage and style.]
Love your ring analogy; I find it quite pertinent (and I catch myself thinking, half-consciously, just a bit along the lines you suggest, indeed).
But re:
[..] gave the terrible speech [..] He effectively one-shot everyone of a particular nerdish variety into becoming obsessed with the power of the Ring/AI,
to express it using your analogy itself: I’m pretty sure that, instead, one way or another, the Ring would by now have found its way to sway us even in the absence of Eliezer.
I can’t take credit for it; I think I first saw it on this thread, where someone points out that Yudkowsky is one of the few to have passed the ring-temptation test. Then I thought, “how would this metaphor actually play out...” and ended up at my shortform post.
I agree that people would find the idea of AGI alluring and seductive even without Yudkowsky. As I said, I admire him very much for stating his beliefs honestly, with conviction and effectiveness. I find it sad that, despite that, the ring-temptation is such that even earnest warnings can be flipped into “well, I could be the special one” narratives. But, as I said, I’m also moving away from the “AGI as worst-possible-godly-alien-superintelligence” framing as a whole.