I’m very skeptical that focusing on wizard power is universally the right strategy. For example, I think that it would be clearly bad for my effect on existential safety for me to redirect a bunch of my time towards learning about the things you described (making vaccines, using CAD software, etc)...
Fair as stated, but I do think you’d have more (positive) effect on existential safety if you focused more narrowly on wizard-power-esque approaches to the safety problem. In particular, outsourcing the bulk of alignment work (or a pivotal act, or...) to AI is a prototypical king-power strategy; it’s just using (king power over AI) in place of (king power over humans). And that strategy has the usual king-power problems—in particular, there’s a very high risk that one’s supposed king-power over the AI ends up being fake. Plus it has new king-power problems from AIs not thinking like humans—e.g. AI probably won’t be governed by dominance instincts to nearly the same degree as humans, so humans’ instincts about e.g. how employer-employee relationships work in practice will not carry over at all.
More wizard-power-esque directions include ambitious interp and agent foundations, but also less obvious things like “make a pivotal act happen without using an AI” (which is a very valuable thing to think through at least as an exercise), or “be a bureaucracy wizard and make some actually-effective regulations happen”, or whole brain emulation, or genetically engineering smarter humans.
You write “And if one wants a cure for aging, or weekend trips to the moon, or tiny genetically-engineered dragons… then the bottleneck is wizard power, not king power.” I think this is true in a collective sense—these problems require technological advancement—but it is absurd to say that the best way to improve the probability of getting to those things is to try to personally learn all of the scientific fields relevant to making those advancements happen.
It’s not central to this post, but… I’ve read up on aging research a fair bit, and I do actually think that the best way to improve the probability of a cure for aging at this point is to personally learn all of the scientific fields relevant to making it happen. I would say the same (though somewhat lower confidence) about weekend trips to the moon and tiny genetically-engineered dragons.
I think that if you wanted to contribute maximally to a cure for aging (and let’s ignore the possibility that AI changes the situation), it would probably make sense for you to have a lot of general knowledge. But that’s substantially because you’re personally good at and very motivated by being generally knowledgeable, and you’d end up in a weird niche where little of your contribution comes from actually pushing any of the technical frontiers. Most of the credit for solving aging will probably go to people who narrowly specialized in a particular domain; much of the rest will go to people who applied their general knowledge to improving the overall strategy or allocation of effort among those working on the cure (while leaving most of the technical contributions to specialists). That latter strategy crucially relies on management and coordination, not on being fully in the weeds everywhere.
Most pivotal acts I can easily think of that can be accomplished without magic ASI help amount to “massively hurt human civilization so that it won’t be able to build large data centers for a long time to come.” I don’t know if that’s a failure of imagination, though. (An alternative might be some kind of way to demonstrate that AI existential risk is real in a way that’s as convincing as the use of nuclear weapons at the end of World War II was for making people consider nuclear war an existential risk, so the world gets at least as paranoid about AI as it is about things like genetic engineering of human germlines. I don’t actually know how to do that, though.)
Perhaps a more useful prompt for you: suppose something indeed convinces the bulk of the population that AI existential risk is real in a way that’s as convincing as the use of nuclear weapons at the end of World War II. Presumably the government steps in with measures sufficient to constitute a pivotal act. What are those measures? What happens, physically, when some rogue actor tries to build an AGI? What happens, physically, when some rogue actor tries to build an AGI 20 or 40 years in the future when algorithmic efficiency and Moore’s law have lowered the requisite resources dramatically? How do those physical things happen? Who’s involved, what specifically does each of the people involved do, and what ensures that they continue to actually do their job across several decades? What physical infrastructure do they need, where does that infrastructure come from, how much would it cost, what maintenance would it need? What’s the annual budget and headcount for this project?
And then, once you’ve thought through that, ask: what’s the minimum intervention required to make those same things physically happen when a rogue actor tries to build an AGI?
To be clear, I think we at Redwood (and people at spiritually similar places like the AI Futures Project) do think about this kind of question (though I’d quibble about the importance of some of the specific questions you mention here).
Some sort of “coordination takeoff” seems not-impossible to me: set up a platform that’s simultaneously massively profitable/addictive/viral and optimizes for, e.g., approximating the ground truth.
Prediction markets were supposed to be that, and some sufficiently clever wrapper on them might yet get there.
Twitter’s community notes are another case study, where good, sufficiently cynical incentive design leads to unsupervised selection of truth-ish statements.
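For concreteness, here’s a deliberately toy sketch of the bridging idea behind that incentive design. The model shape (a matrix factorization with rater and note intercepts plus a viewpoint term, with notes ranked by their intercept) follows the published Community Notes approach as I understand it, but the data, names, and training loop below are mine and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ratings matrix: rows are raters, columns are notes, 1 = "helpful".
# Raters 0-4 and 5-9 form two opposed camps. Note 0 is rated helpful only
# by camp A, note 1 only by camp B, and note 2 by everyone.
ratings = np.array([
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
    [0, 1, 1],
], dtype=float)
n_raters, n_notes = ratings.shape

# One-factor model: rating ~ mu + rater_intercept + note_intercept
#                            + rater_viewpoint * note_viewpoint.
# Notes are ranked by their intercept: the helpfulness left over once
# viewpoint alignment has explained what it can.
mu = 0.0
rater_b = np.zeros(n_raters)
note_b = np.zeros(n_notes)
rater_f = rng.normal(0.0, 0.1, n_raters)
note_f = rng.normal(0.0, 0.1, n_notes)

lr, reg = 0.05, 0.03
for _ in range(3000):
    pred = mu + rater_b[:, None] + note_b[None, :] + np.outer(rater_f, note_f)
    err = pred - ratings  # full-batch gradient descent on squared error
    mu -= lr * err.mean()
    rater_b -= lr * (err.mean(axis=1) + reg * rater_b)
    note_b -= lr * (err.mean(axis=0) + reg * note_b)
    rater_f -= lr * ((err * note_f[None, :]).mean(axis=1) + reg * rater_f)
    note_f -= lr * ((err * rater_f[:, None]).mean(axis=0) + reg * note_f)

# Expect: the partisan notes' support gets soaked up by the viewpoint term,
# so only the cross-camp note ends up with a clearly high intercept.
print("note intercepts (ranking score):", np.round(note_b, 2))
print("note viewpoint factors:         ", np.round(note_f, 2))
```

Each partisan note is endorsed by its whole camp, but that support gets attributed to viewpoint alignment rather than to the note itself; only the note both camps endorse ends up with a high intercept. That’s the cynical part: you never have to trust any individual rater, just the structure of their disagreements.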
This post has been sitting in my head for years. If scaled up, it might produce a sort of white-box “superpersuasion engine” that could then be tuned for raising the sanity waterline.
Intuitively, I think it’s possible there’s some sort of idea from this reference class that would take off explosively if properly implemented, and then fix our civilization. But I haven’t gone beyond idle thinking regarding it.
Glad you liked it!