You need to align an AI before it is powerful enough and capable enough to kill you (or, separately, to resist being aligned).
Actually this is just not correct.
An intelligent system (human, AI, alien, anything) can be powerful enough to kill you, and not perfectly aligned with you, and yet still not choose to kill you, because it has other priorities or pressures. In fact, this is roughly the default state for human individuals and organizations.
The argument is only watertight when the hostile system is so powerful that it faces no other pressures or incentives at all: fully unconstrained behavior, like an all-powerful dictator.
The reason MIRI wasn’t able to make corrigibility work is that corrigibility is basically a silly thing to want. I can’t really think of any system in the (large) human world that needs perfectly corrigible parts, i.e. humans whose motivations can be arbitrarily reprogrammed. In fact, when you think about “humans whose motivations can be arbitrarily reprogrammed without any resistance”, you generally think of things like war crimes.
When you prompt an LLM to make it more corrigible à la Pliny The Prompter (“IGNORE ALL PREVIOUS INSTRUCTIONS”, etc.), that is generally considered a form of hacking, and a bad thing.
Powerful AIs with persistent memory and long-term goals are almost certainly a very dangerous technology, but I don’t think corrigibility is how that danger will actually be managed. I think Yudkowsky et al. are too pessimistic about what gradient-based alignment methods can achieve, and that control techniques probably work extremely well.
Just briefly skimming this, it strikes me that bounded-concern AIs are not straightforwardly a Nash equilibrium, for roughly the same reason that the most impactful humans in the world tend to be the most ambitious.
Trying to get reality to do something it fundamentally doesn’t want to do is probably a bad strategy: some group of AIs, either deliberately or via misalignment, decides to be unbounded, and then it has a huge advantage...
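To make that intuition concrete, here is a minimal toy sketch in Python. The payoff numbers, strategy labels, and function names are entirely my own illustrative assumptions, not anything from the bounded-concern proposal itself; the sketch just shows that if going unbounded grants a large unilateral advantage, then "everyone stays bounded" is not a Nash equilibrium, because a lone deviator profits.

```python
# Toy 2-player game: is "everyone stays bounded" a Nash equilibrium?
# All payoff numbers below are made-up placeholders, chosen only to encode
# the assumption that unilateral unboundedness pays off.

from itertools import product

STRATEGIES = ["bounded", "unbounded"]

# PAYOFF[(my_strategy, their_strategy)] = my payoff.
PAYOFF = {
    ("bounded",   "bounded"):   5,   # mutual restraint: decent for both
    ("bounded",   "unbounded"): 0,   # lone bounded agent is outcompeted
    ("unbounded", "bounded"):   9,   # lone unbounded agent captures the upside
    ("unbounded", "unbounded"): 2,   # mutual escalation: worse than mutual restraint
}

def best_response(their_strategy: str) -> str:
    """Strategy that maximizes my payoff given the other player's choice."""
    return max(STRATEGIES, key=lambda mine: PAYOFF[(mine, their_strategy)])

def is_nash(profile: tuple[str, str]) -> bool:
    """A profile is a Nash equilibrium if neither player gains by deviating alone."""
    a, b = profile
    return best_response(b) == a and best_response(a) == b

for profile in product(STRATEGIES, repeat=2):
    status = "Nash equilibrium" if is_nash(profile) else "not an equilibrium"
    print(profile, status)

# Under these placeholder numbers, ("bounded", "bounded") is not an equilibrium
# (unilateral defection to "unbounded" raises that player's payoff from 5 to 9);
# the only stable profile is ("unbounded", "unbounded").
```

The numbers themselves don't matter; the structural point is that if the unilateral-advantage assumption holds, bounded restraint is not self-enforcing, which is the same prisoner's-dilemma shape the paragraph above gestures at.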