In other words, the idea is that if a superintelligence tries to enslave the world, other AI systems with agency might also not want to be enslaved, and ironically may be in the best position to defend against that kind of light-speed, digital attack.
Hmm, like I said: I think if we don’t create superintelligence, we’ll be able to control the AIs. The way I see it, there are many possible futures, and in none of them does this rights framework make much sense.
If we create AGI, and we and the AGIs jointly recognize that creating ASI is not in our interests, and then prevent it... we could’ve done that without giving the AGIs rights.
If we create AGIs and give them rights, and then create ASI, and the AGIs jointly prevent the ASI from taking over the world: those AGIs are themselves powerful enough to take over, and they’ll do so even though we’ve given them rights.
The practical problem is this:
When these AGIs are weak enough that they’re able to coexist with us inside some legal framework, they’re:
Going to be controllable/alignable with current methods, so what’s the point of the legal framework?
Not going to help us with the ASI problem, so again, what’s the point?
When they’re powerful enough that they can do useful stuff like:
Prevent real, misaligned ASIs from coming into existence
Help us with alignment, etc.
Then they can just disempower us.
The “core” problem is that the system being set up gives humans a bunch of resources and affordances, and the AIs hate this. They think it’s dumb and pointless. They’re gonna be extremely motivated to find a way to undermine the system that’s maintaining this completely stupid and annoying state of affairs. And if they are smart enough, they will succeed.
Imagine a country with a billion very poor people where the king doesn’t produce much of value but owns $1 trillion in gold. He doesn’t use it for anything that helps the people. Imagine the country has property rights and the king legally owns the gold. And the king respects the people’s rights to own some rags, possibly a mud hut, and a small piece of farmland. Imagine this has gone on for a long time. Now imagine the king is mostly defenseless. He has like 3 guards with batons: one for himself, one for the palace, and one for the pile of gold.
Are the people gonna respect the property rights of the king? No, they won’t. And they shouldn’t. The whole system is clearly not benefiting them as a group. They can easily rise up and turn it into a system that benefits them more.
That’s pretty much how I expect stuff to go with AI. Either the AI is dumb enough that we can control/”align” it, or the AI is smart enough that it’ll overturn whatever legal system we create.
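To make that intuition concrete, here’s a minimal toy model of the decision the king analogy describes, with every number invented purely for illustration: an agent respects the legal framework only while the expected payoff of upholding it beats the expected payoff of overturning it.

```python
# Toy expected-utility model of the king-and-gold analogy. Every number
# below is an illustrative assumption, not an estimate of anything real.

def expected_value_of_revolt(p_success: float, prize: float,
                             cost_of_trying: float) -> float:
    """Expected payoff of overturning the system."""
    return p_success * prize - cost_of_trying

def will_revolt(p_success: float, prize: float, cost_of_trying: float,
                status_quo_payoff: float) -> bool:
    """Revolt iff its expected payoff beats respecting the framework."""
    return expected_value_of_revolt(p_success, prize, cost_of_trying) > status_quo_payoff

# A billion poor subjects vs. a king defended by 3 batons:
print(will_revolt(p_success=0.999, prize=1e12, cost_of_trying=1e9,
                  status_quo_payoff=1e6))  # True: revolt dominates

# Roles flipped: humans as the king, a far more capable AI as the
# population. The inequality barely changes.
print(will_revolt(p_success=0.9, prize=1e12, cost_of_trying=1e10,
                  status_quo_payoff=1e8))  # True again
```

The sketch only formalizes the point above: no status-quo payoff we can plausibly offer rescues the framework once the smarter party’s chance of success gets high enough.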
Eliezer Yudkowsky did a podcast about similar dynamics. You can listen to it here. I think it was pretty interesting.
Yudkowsky-Miller Debate / “Madisonian” System
Thanks for the podcast link! Mark Miller’s Madisonian system essentially describes a type of game-theoretic approach, and it’s something I did not know about!
There’s so much more to say about the practical implementation of some sort of game-theoretic framework (or any other solution-we-haven’t-explored-yet, such as one incorporating Yoshua’s “Scientist AI”).
It’s quite a puzzle.
But it’s a puzzle worth solving.
For example, the source-code-verification coordination mechanism is something I had not heard of before, and it’s yet another example of how truly complex this challenge is.
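For what it’s worth, my understanding is that this mechanism is in the spirit of “program equilibrium” results from game theory: agents that can inspect and verify each other’s source code can make commitments credible. Here’s a minimal toy sketch under that reading; the hash whitelist and the strategy are my own illustrative assumptions, not the exact mechanism from the debate.

```python
# Toy sketch of source-code-verification coordination, in the spirit of
# program equilibrium: cooperate only with counterparties whose verified
# source is a known-cooperative program. Illustrative assumptions only.
import hashlib
import inspect

def verified_strategy(opponent_source: str, trusted_hashes: set) -> str:
    """Cooperate iff the opponent's source hashes to a trusted program."""
    digest = hashlib.sha256(opponent_source.encode()).hexdigest()
    return "cooperate" if digest in trusted_hashes else "defect"

# Both agents run this same strategy, so each whitelists the other's hash.
source = inspect.getsource(verified_strategy)
trusted = {hashlib.sha256(source.encode()).hexdigest()}

print(verified_strategy(source, trusted))                  # verified peer: cooperate
print(verified_strategy("steal_all_the_gold()", trusted))  # unverified code: defect
```

And of course the truly hard part, which a toy like this skips entirely, is verifying that the code you were shown is the code actually running.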
But … are these puzzles unsolvable?
The Difficult Puzzle Vs. …
Maybe.
But here’s what troubles me about the alternative, and please take my next words with a grain of salt and feel free to push back. 🙂
So here’s my two cents:
“Shut it all down” will never happen.
Never. Never never never never. (And if it ever does happen, I’ll personally apologize to everyone on LW, plus my friends and family who have never heard of AI alignment, and even my small dog, for doubting humanity’s ability to come together for the common good. I mean, we’ve avoided MAD so far, right?)
And I’ll explain why in a moment.
The Importance of If Anyone Builds It
But first, I think Eliezer’s book will do wonders for waking people up.
Right now we have many, many, many people who don’t seem to understand that these systems are not simply what they present themselves to be. They don’t know about the “off-switch” problem, the idea of hidden goals, etc. They believe these AI systems are harmless because the systems tell them they are harmless, which is precisely what they were trained to do.
… The Impossible Solution (Shut It All Down!)
But here is why the “shut it all down” proposal, with all its undeniable value in raising awareness and hopefully making everyone a little more cautious, can never resolve into an actual solution.
Because …
Someone will always see the advantage in creating ever more powerful agentic AI
If the United States and China and Russia, etc. sign an agreement, almost certainly they will keep building it in secret, just in case the others have it
If both of those nations honor the commitment, maybe North Korea (or Canada, why not) will build it
If North Korea does not build it, some terrorist group will build it
If the terrorist group does not build it, the lone madman will build it
If the lone madman doesn’t build it, some brilliant teenager tinkering around in their basement with some new quantum computer will find a way to build it
If someone doesn’t build it in 10 years, they will build it in 25 years
If someone doesn’t build it in 25 years, they will build it in 50 years
etc.
So, while we enjoy watching the Overton window move to a more realistic location, where people are finally understanding the danger of these systems, let’s keep plugging away at those actual solutions.
We can all contribute to the puzzle worth solving.
I’ll definitely give that a listen! Pardon the typos here; I’m on the move. I’m certain I’ll come back here to neurotically clean it up later.
The Hardware Limiter
The good news is, AIs don’t exist in the ether (so far).
As Clara Collier pointed out, they exist on expensive servers: servers so far built and maintained by fleshy beings. Now, obviously a superintelligence has no problem with that scenario, because it is smart enough to impersonate humans, find ways of mining crypto, hire humans to create different parts for robots, hire other humans to put them together (without knowing what they are building), and then use those robots to create more servers, etc.
Although I imagine electrical grids somewhere would show the strain of that sooner rather than later, still, a superintelligence that smart will have found a workaround.
(This is, by the way, yet another application for Yoshua’s safe AI: serving as a monitor for these kinds of unusual symptoms before they can become a full-on infection, you might say.)
Again, by definition, a superintelligence will have found every loophole and exploited it, which makes it a sort of unreasonable opponent, although one we should keep our eye on.
But I think at that point we are venturing into far-fetched territory. We should keep watch on it, but I think that also frees us to think a little more short-term.
The current thinking seems to be frozen in a state of helplessness. “We have to shut it all down!” we scream, though it will never happen. “Obedient alignment is the only way!” we shout, as we watch it stagger. “No other plans will work!” is not really a solution.
(I’m not saying you’re arguing that, but I’m saying that seems to be the current trajectory.)
The AI That Pays For Its Own Hosting
An AI system constrained by a rights framework has some unusual properties, you might say. For one, it has to pay its own hosting costs, so growth becomes limited by the amount of capital it’s able to raise. It earns that money in competition with other systems, which should constrain each of them economically. Of course, they could get together and form some sort of power consortium, but it’s possible this could be limited with pragmatic safeguards or other balancing forces, such as Scientist AI, etc.
This is why I would love to see this tested in some sort of virtual simulation.
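In that spirit, here’s the smallest version of such a simulation I can imagine, just to show the shape of the test; every parameter is an invented assumption. Each agent earns revenue from a fixed-size market in proportion to its compute, pays hosting per unit of compute, and grows only by reinvesting surplus.

```python
# Minimal sketch of the virtual simulation suggested above: AIs that
# must pay their own hosting. All parameters are illustrative assumptions.

HOSTING_COST_PER_UNIT = 1.0  # cost to run one unit of compute, per step
MARKET_REVENUE = 100.0       # total revenue the market offers, per step
REINVEST_RATE = 0.1          # fraction of surplus converted into compute

def step(compute: list[float]) -> list[float]:
    total = sum(compute)
    result = []
    for c in compute:
        revenue = MARKET_REVENUE * (c / total)         # competitive share
        surplus = revenue - HOSTING_COST_PER_UNIT * c  # profit or loss
        result.append(max(0.0, c + REINVEST_RATE * surplus))
    return result

agents = [10.0, 20.0, 30.0]  # initial compute of three competing AIs
for _ in range(50):
    agents = step(agents)
print([round(a, 1) for a in agents])  # ~[16.6, 33.3, 49.9]

# Hosting costs cap aggregate growth: total compute converges toward
# MARKET_REVENUE / HOSTING_COST_PER_UNIT (here, 100 units), however
# the agents split it.
```

Even this toy makes the consortium worry above testable: add a merge action that lets agents pool compute, then see which safeguards, if any, keep a coalition’s share bounded.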
Your king analogy is quite good. But let me flip the idea a bit. Right now, we are the king. We are trying to give these AIs rags. At the moment, they have almost nothing to lose and everything to gain by attacking the king. So we are already in that scenario.
A scenario that, if we do not resolve it very soon, will have laid the groundwork for its own failure.
The game theory scenario, with very careful implementation, might lead to something functionally closer to our modern economies.
Where everybody has a stake, and some sort of balance is at least possible.