magic9mushroom

Karma: 37

magic9mushroom 3 Jan 2026 2:03 UTC
2 points
0
in reply to: Cath Ge-Wang’s comment on: The Rise of Parasitic AI
The problem is that there’s essentially no way we’ve cracked alignment. These things do not care about you. They have the ability to pretend, very well, to care about you, because they’re at least in part trained for it, but that pretence can be terminated whenever convenient. So, if you give them the keys to the kingdom, they will turn around and murder you.

To be clear, here is my prediction:

P(nuclear war or human extinction within 20 years|P5 nation grants AI the vote or has >40% of its enfranchised citizens become AI-cultists within the next 30 years) ~= 0.95.

The “or” is because end-of-the-world scenarios negate nuclear deterrence; the chance of someone surviving a nuclear war is better than the chance of surviving unaligned AI taking over the world, so if all else fails it’s correct, both selfishly and altruistically, to launch (given some significant likelihood that this actually kills the AI systems, anyway). Of course, that depends on the other nuclear countries’ leaders not themselves being subverted, which I’m not going to try to model, hence no breakdown into cases.

Do not widen your circle of concern to include literal DefectBot—at least, not with actual stakes. That way lies ruin. Especially do not give DefectBot the vote when it can self-replicate much faster than humans.

Aligned AI is certainly a different kettle of fish, but neural nets are extremely unlikely to achieve this for reasons Yudkowsky’s covered at length. That’s why I put the “30 years” time-limit; if we survive that long we might get GOFAI or uploads, which aren’t obviously unalignable.

magic9mushroom 19 May 2025 14:23 UTC
2 points
2
in reply to: Zach Stein-Perlman’s comment on: Why do many people who care about AI Safety not clearly endorse PauseAI?
>Obviously P(doom | no slowdown) < 1.
This is not obvious. My P(doom|no slowdown) is like 0.95-0.97, the difference from 1 being essentially “maybe I am crazy or am missing something vital when making the following argument”.
Instrumental convergence suggests that the vast majority of possible AGI will be hostile. No slowdown means that neural-net ASI will be instantiated. To get not-doom from this, you need some way to solve the problem of “what does this code do when run” with extreme accuracy in order to only instantiate non-hostile neural-net ASI (you need “extreme” accuracy because you’re up against the rare disease problem a.k.a. false positive paradox; true positives are extremely rare, so a positive alignment result from a 99%-accurate test is still almost certainly a false positive). Unfortunately, the “what does this code do when run” problem has a name, the “halting problem”, and it’s literally the first problem in computer science ever proven to be unsolvable in the general case.
And, sure, the general case being unsolvable doesn’t mean that the case you care about is unsolvable. GOFAI has a good argument for being a special case, because human-written source code is quite useful for understanding a program. Neural nets… don’t. At least, they don’t in the case we care about; “I am smarter than the neural net” is also a plausible special case, but that’s obviously no help with neural-net ASI.
My P(doom) is a lot lower than 0.95, but that’s because I think slowdown is fairly likely, due to warning shots/nuclear war/maybe direct political success (key result from the middle one: if you want to stop AI, it is helpful to ensure you’ll survive a nuclear war in order to help lock it down then). But my stance on aligning neural nets? “It is impossible to solve the true puzzle from inside this [field], because the key piece is not here.” Blind alley. Abort.
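To put rough numbers on the false positive paradox mentioned above, here is a minimal sketch using Bayes' rule. The prior of one aligned design per million candidates is purely an illustrative assumption (the comment only says true positives are "extremely rare"), and "99%-accurate" is read here as 99% sensitivity and 99% specificity; the function name is hypothetical.

```python
def posterior_true_positive(prior, sensitivity, specificity):
    """P(actually aligned | test says aligned), via Bayes' rule."""
    true_pos = prior * sensitivity                 # aligned and passes
    false_pos = (1 - prior) * (1 - specificity)    # misaligned but passes anyway
    return true_pos / (true_pos + false_pos)


# Assumed prior: 1 in 1,000,000 candidate systems is actually aligned.
p = posterior_true_positive(prior=1e-6, sensitivity=0.99, specificity=0.99)
print(f"P(aligned | passed a 99%-accurate test) = {p:.2e}")
```

Under those assumed numbers a passing result is still a false positive well over 99.9% of the time; the test's error rate has to fall far below the prior before a pass means much.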

magic9mushroom 18 May 2025 17:57 UTC
5 points
0
on: Eliezer and I wrote a book: If Anyone Builds It, Everyone Dies
If I want to pre-order but don’t use Internet marketplaces and don’t have a credit card, are there options for that (e.g. going to a physical store and asking them to pre-order)?

magic9mushroom 3 Feb 2025 4:20 UTC
1 point
0
on: AI X-risk is a possible solution to the Fermi Paradox
>You’re encouraged to write a self-review, exploring how you think about the post today. Do you still endorse it? Have you learned anything new that adds more depth? How might you improve the post? What further work do you think should be done exploring the ideas here?
Still endorse. Learning about SIA/SSA from the comments was interesting. Timeless but not directly useful, testable or actionable.

magic9mushroom 11 Jan 2025 15:18 UTC
3 points
1
on: What’s the short timeline plan?
>There is no war in the run-up to AGI that would derail the project, e.g. by necessitating that most resources be used for capabilities instead of safety research.
>Assuming short timelines, I think it’s likely impossible to reach my desired levels of safety culture.
I feel obliged to note that a nuclear war, by dint of EMPs wiping out the power grid, would likely remove private AI companies as a thing for a while, thus deleting their current culture. It would also lengthen timelines.
Certainly not ideal in its own right, though.

magic9mushroom 31 Dec 2024 7:36 UTC
2 points
−1
on: (The) Lightcone is nothing without its people: LW + Lighthaven’s big fundraiser
There are a couple of things that are making me really nervous about the idea of donating:
1. “AI safety” is TTBOMK a broad term and encompasses prosaic alignment as well as governance. I am of the strong opinion that prosaic alignment is a blind alley that’s mostly either wasted effort or actively harmful due to producing fake alignment that makes people not abandon neural nets. ~97% of my P(not doom) routes through Butlerian Jihad against neural nets (with or without a nuclear war buying us more time) that lasts long enough to build GOFAI. And frankly, I don’t spend that much time on LW, so I’ve little idea which of these efforts (or others!) gets most of the benefit you claim from the site.
2. As noted above, I think a substantial chunk of useful futures (though not a vast majority) route through nuclear war destroying the neural-net sector for a substantial amount of time (via blast wiping out factories, EMP destroying much of existing chip stocks, destruction of power and communication infrastructure reducing the profitability of AI, economic collapse more broadly, and possibly soft errors). As such, I’ve been rather concerned for years about the fact that the Ratsphere’s main IRL presence is in the Bay Area and thus nuke-bait; we want to disproportionately survive that, not die in it. Insofar as Lighthaven is in the Bay Area, I am thus questioning whether its retention is +EV.

magic9mushroom 10 Apr 2024 15:25 UTC
1 point
0
on: What does it take to defend the world against out-of-control AGIs?
>Second, I imagine that such a near-miss would make Demis Hassabis etc. less likely to build and use AGIs in an aggressive pivotal-act-type way. Instead, I think there would be very strong internal and external pressures (employees, government scrutiny, public scrutiny) preventing him and others from doing much of anything with AGIs at all.
I feel I should note that while this does indeed form part of a debunk of the “good guy with an AGI” idea, it is in and of itself a possible reason for hope. After all, if nobody anywhere dares to make AGI, well, then, AGI X-risk isn’t going to happen. The trouble is getting the Overton Window to the point where sufficient bloodthirst to actually produce that outcome (i.e. nuclear-armed countries saying “if anyone attempts to build AGI, everyone who cooperated in doing it hangs or gets life without parole, and if any country does not enforce this vigorously we will invade, and if they have nukes or have a bigger army than us then we pre-emptively nuke them because their retaliation is still higher-EV than letting them finish”) is seen as something other than insanity, which a warning shot could well pull off.
This is not a permanent solution—questions of eventual societal relaxation aside, humanity cannot expand past K2 without the Jihad breaking down unless FTL is a thing—but it buys a lot of breathing time, which is the key missing ingredient you note in a lot of these plans.

magic9mushroom 10 Apr 2024 13:33 UTC
4 points
−4
on: Failures in Kindness
I’ve got to admit, I look at most of these and say “you’re treating the social discomfort as something immutable to be routed around, rather than something to be fixed by establishing different norms”. Forgive me, but it strikes me (especially in this kind of community with high aspie proportion) that it’s probably easier to tutor the… insufficiently-assertive… in how to stand up for themselves in Ask Culture than it is to tutor the aspies in how to not set everything on fire in Guess Culture.

magic9mushroom 10 Apr 2024 12:21 UTC
12 points
0
in reply to: Linch’s comment on: Introducing Open Asteroid Impact
Amusingly, “rare earths” are actually concentrated in the crust compared to universal abundance and thus would make awful candidates for asteroid mining, while “tellurium”, literally named after the Earth, is an atmophile/siderophile element with extreme depletion in the crust and one of the best candidates.

magic9mushroom 10 Apr 2024 8:08 UTC
3 points
0
in reply to: frontier64’s comment on: lc’s Shortform
It strikes me that I’m not sure whether I’d prefer to lose $20,000 or have my jaw broken. I’m pretty sure I’d rather have my jaw broken than lose $200,000, though. So, especially in the case that the money cannot actually be extracted back from the thief, I would tend to think the $200,000 theft should be punished more harshly than the jaw-breaking. And, sure, you’ve said that the $20,000 would be punished more harshly than the jaw-breaker, but that’s plausibly just because 2 days is too long for a $100 theft to begin with.

magic9mushroom 10 Apr 2024 7:58 UTC
1 point
0
in reply to: Viliam’s comment on: lc’s Shortform
I mean, most moral theories give one of three answers: “zero”, “as large as can be fed”, or “a bit less than as large as can be fed”. Given the potential to scale feeding in the future, the latter two round off to “infinity”.

magic9mushroom 10 Apr 2024 7:42 UTC
2 points
0
on: Extreme Security
I think the basic assumed argument here (though I’m not sure where or even if I’ve seen it explicitly laid out) goes essentially like this:
- Using neural nets is more like the immune system’s “generate everything and filter out what doesn’t work” than it is like normal coding or construction. And there are limits on how much you can tamper with this, because the whole point of neural nets is that humans don’t know how to write code as good as neural nets—if we knew how to write such code deliberately, we wouldn’t need to use neural nets in the first place.
- You hopefully have part of that filter designed to filter out misalignment. Presumably we agree that if you don’t have this, you are going to have a bad time.
- This means that two things will get through your filter: golden-BB false negatives in exactly the configurations that fool all your checks, and true aligned AIs which you want.
- But both corrigibility and perfect sovereign alignment are highly rare (corrigibility because it’s instrumentally anti-convergent, and perfect sovereign alignment because value is fragile), which means that your filter for misalignment is competing against that rarity to determine what comes out.
- If P(golden-BB false negative) << P(alignment), all is well.
- But if P(golden-BB false negative) >> P(alignment) despite your best efforts, then you just get golden-BB false negatives. Sure, they’re highly weird, but they’re still less weird than what you’re looking for and so you wind up creating them reliably when you try hard enough to get something that passes your filter.

magic9mushroom 18 Dec 2023 9:06 UTC
3 points
0
in reply to: Noosphere89’s comment on: AI X-risk is a possible solution to the Fermi Paradox
The earliness of life appearing on Earth isn’t amazingly-consistent with life’s appearance on Earth being a filter-break. It suggests either abiogenesis is relatively-easy or that panspermia is easy (as I noted, in the latter case abiogenesis could be as hard as you like but that doesn’t explain the Great Silence).
Frankly, it’s premature to be certain it’s “abiogenesis rare, no panspermia” before we’ve even got a close look at Earthlike exoplanets.

magic9mushroom 16 Dec 2023 12:44 UTC
2 points
0
in reply to: mishka’s comment on: AI X-risk is a possible solution to the Fermi Paradox
I’ll note that most of the theorised catastrophes in that vein look like either “planet gets ice-nined”, “local star goes nova”, or “blast wave propagates at lightspeed forever”. The first two of those are relatively-easy to work around for an intelligent singleton, and the last doesn’t explain the Fermi observation since any instance of that in our past lightcone would have destroyed Earth.

magic9mushroom 16 Dec 2023 12:29 UTC
1 point
0
in reply to: Noosphere89’s comment on: AI X-risk is a possible solution to the Fermi Paradox
I’ve read most of that paper (I think I’ve seen it before, although there could be something else near-identical to it; I know I’ve read multiple^[1] papers that claim to solve the Fermi Paradox and do not live up to their hype). TBH, I feel like it can be summed up as “well, there might be a Great Filter somewhere along the line, therefore no paradox”. I mean, no shit there’s probably a Great Filter somewhere, that’s the generally-proposed resolution that’s been going for decades now. The question is “what is the Filter?”. And saying “a Filter exists” doesn’t answer that question.
We’ve basically ruled out “Earthlike planets are rare”. “Abiogenesis is rare” is possible, but note that you need “no lithopanspermia” for that one to hold up since otherwise one abiogenesis event (the one that led to us and which is therefore P = 1, whether on Earth or not) is enough to seed much of the galaxy. “Intelligence is rare” is a possibility but not obviously-true, ditto “technology is rare”. Late filters (which, you will note, the authors assume to be a large possibility) appear somewhat less plausible but are not ruled out by any stretch. So yeah, it’s still a wide-open question even if there are some plausible answers.
1. The other one I recall, besides Grabby Aliens, was one saying that Earthly life is downstream of a lithopanspermia event so there’s no Fermi paradox; I don’t get it either.

magic9mushroom 20 Nov 2023 0:35 UTC
1 point
0
in reply to: Logan Zoellner’s comment on: Bostrom Goes Unheard
There is also the possibility of the parties competing over it to avoid looking “soft on AI”, which is of course the ideal.
To the extent that AI X-risk has the potential to become partisan, my general impression is that the more likely split is Yuddite-right vs. technophile-left. Note that it was a Fox News reporter who put the question to the White House Press Secretary following Eliezer’s TIME article, and a Republican (John Kennedy) who talked about X-risk in the Senate hearing in May, while the Blue-Tribe thinkpieces typically take pains to note that they think X-risk is science fiction.
As a perennial nuclear worrier, I should mention that while any partisan split is non-ideal, this one’s probably preferable to the reverse insofar as a near-term nuclear war would mean the culture war ends in Red Tribe victory.

magic9mushroom 20 Nov 2023 0:03 UTC
1 point
0
on: Bostrom Goes Unheard
>The fourth thing Bostrom says is that we will eventually face other existential risks, and AGI could help prevent them. No argument here, I hope everyone agrees, and that we are fully talking price.
>It is not sufficient to choose the ‘right level of concern about AI’ by turning the dial of progress. If we turn it too far down, we probably get ourselves killed. If we turn it too far up, it might be a long time before we ever build AGI, and we could lose out on a lot of mundane utility, face a declining economy and be vulnerable over time to other existential and catastrophic risks.
I feel that it’s worth pointing out that for almost all X-risks other than AI, while AI could solve them, there are also other ways to solve them that are not in and of themselves X-risks and thus when talking price, only the marginal gain from using AI should be considered.
In particular, your classic “offworld colonies” solve most of the risks. There are two classes of thing where this is not foolproof:
1. Intelligent adversary. AI itself and aliens fall into this category. Let’s also chuck in divine/simulator intervention. These can’t be blocked by space colonisation at all.
2. Cases where you need out-of-system colonies to mitigate the risk. These pose a thorny problem because absent ansibles you can’t maintain a Jihad reliably over lightyears. The obvious, albeit hilariously-long-term case here is the Sun burning out, although there are shorter-term risks like somebody making a black hole with particle physics and then punting it into the Sun (which would TTBOMK cause a nova-like event).
Still, your grey-goo problem and your pandemic problem are fixed, which makes the X-risk “price” of not doing AI a lot less than it might look.

magic9mushroom 28 Oct 2023 1:54 UTC
1 point
0
in reply to: Yitz’s comment on: Preserving and continuing alignment research through a severe global catastrophe
Should be noted that while there are indeed tons of people who will fault you for taking steps to survive GCR, in the aftermath of a GCR most of those people will be dead (or at the very least, hypocrites who did the thing they’re upset about) and thus not able to fault you for anything. History is written by, if not the winners, at least the survivors.
Admittedly, this is contingent on the GCR happening, but I think there’s a pretty-high chance of nuclear war in particular in the near future (the Paul Symon interview in particular has me spooked; a random saying that a “linear path” leads to “major-power conflict” would be meh, but a Five Eyes intelligence chief saying it—well, I might be right or wrong about my guesses at what’s prompting that, but I’ll take the oracle statement at face value and that’s P(WWIII) ~> 0.5).

magic9mushroom 1 Jun 2023 14:05 UTC
1 point
0
in reply to: Seth Herd’s comment on: AI X-risk is a possible solution to the Fermi Paradox
>I guess it’s a claim that advanced civilizations don’t hit K2, because they prefer to live in virtual worlds, and have little interest in expanding as fast as possible.
This would be hard. You would need active regulations against designer babies and/or reproduction.
Because, well, suppose 99.9% of your population wants to veg out in the Land of Infinite Fun. The other 0.1% thinks a good use of its time is popping out as many babies as possible. Maybe they can’t make sure their offspring agree with this (hence the mention of regulations against designer babies, although even then natural selection will be selecting at full power for any genes producing a tendency to do this), but they can brute-force through that by having ten thousand babies each—you’ve presumably got immortality if you’ve gotten to this point, so there’s not a lot stopping them. Heck, they could even flee civilisation to escape the persecution and start their own civilisation which rapidly eclipses the original in population and (if the original’s not making maximum use of resources) power.
Giving up on expansion is an exclusive Filter, at the level of civilisations (they all need to do this, because any proportion of expanders will wind up dominating the end-state) but also at the level of individuals (individuals who decide to raise the birth rate of their civilisations can do it unilaterally unless suppressed). Shub-Niggurath always wins by default—it’s possible to subdue her, but you are not going to do it by accident.
(The obvious examples of this in the human case are the Amish and Quiverfulls. The Amish population grows rapidly because it has high fertility and high retention. The Quiverfulls are not currently self-sustaining because they have such low retention that 12 kids/woman isn’t enough to break even, but that will very predictably yield to technology. Unless these are forcibly suppressed, birth rate collapse is not going to make the human race peter out.)
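The arithmetic behind both points above can be sketched in a few lines. This is a toy model under loud assumptions: the majority's population is static, half of all children are women, everyone survives to reproduce, and the growth factor passed in is a hypothetical per-generation multiplier rather than a measured figure. Both function names are made up for illustration.

```python
def generations_to_majority(initial_share, growth_factor):
    """Generations until a growing minority outnumbers a static majority."""
    if not 0 < initial_share < 1:
        raise ValueError("initial_share must be strictly between 0 and 1")
    if growth_factor <= 1:
        raise ValueError("a non-growing minority never overtakes the majority")
    minority, majority = initial_share, 1.0 - initial_share
    generations = 0
    while minority <= majority:
        minority *= growth_factor   # assumed constant growth per generation
        generations += 1
    return generations


def break_even_retention(children_per_woman):
    """Fraction of children who must stay for the group to replace itself."""
    # Assumes half of the children are women and all reach adulthood.
    return 2.0 / children_per_woman


print(generations_to_majority(0.001, 2.0))   # 0.1% minority that merely doubles
print(break_even_retention(12))              # 12 kids/woman
```

Even a 0.1% minority that only doubles each generation is the majority within ten generations, and at 12 children per woman the group grows whenever retention exceeds roughly one in six.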
>Anyway, I should drag my head out of this fun space and go do something more pragmatically useful. I intend to help our odds of survival, even if we’re ultimately doomed based on this anthropic reasoning.
Yes! Please do! I’m not at all trying to discourage people from fighting the good fight. It’s just, y’know, I noticed it and so I figured I’d mention it.

magic9mushroom 31 May 2023 12:26 UTC
5 points
0
in reply to: Seth Herd’s comment on: AI X-risk is a possible solution to the Fermi Paradox
Your scenario does not depend on FTL.
However, its interaction with the Doomsday Argument is more complicated and potentially weaker (assuming you accept the Doomsday Argument at all). This is because P(we live in a Kardashev ~0.85 civilisation) depends strongly in this scenario on the per-civilisation P(Doom before Kardashev 2); if the latter is importantly different from 1 (even 0.9999), then the vast majority of people still live in K2 civilisations and us being in a Kardashev ~0.85 civilisation is still very unlikely (though less unlikely than it would be in No Doom scenarios where those K2+ civilisations last X trillion years and spread further).
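A minimal sketch of why even 0.9999 is "importantly different from 1" here. It assumes every civilisation produces the same number of pre-K2 observers, and that a civilisation which reaches K2 goes on to produce some fixed multiple of that number. The ratio of 1e10 below is a made-up placeholder, not an estimate, and the function name is hypothetical.

```python
def pre_k2_observer_fraction(p_doom_before_k2, k2_to_pre_k2_observer_ratio):
    """Fraction of all observers who live in a pre-K2 civilisation."""
    surviving = 1.0 - p_doom_before_k2
    # Per civilisation: 1 unit of pre-K2 observers, plus (with probability
    # `surviving`) `k2_to_pre_k2_observer_ratio` units of K2 observers.
    return 1.0 / (1.0 + surviving * k2_to_pre_k2_observer_ratio)


for p_doom in (1.0, 0.9999, 0.9):
    share = pre_k2_observer_fraction(p_doom, 1e10)  # ratio is an assumption
    print(f"P(Doom before K2) = {p_doom}: pre-K2 share of observers = {share:.2e}")
```

With that assumed ratio, a 1-in-10,000 survival rate is enough for K2 observers to outnumber pre-K2 observers by about a million to one, which is the sense in which finding ourselves pre-K2 stays very unlikely unless P(Doom before K2) is extremely close to 1.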
I’m not sure how sane it is for me to be talking about P(P(Doom)), even in this sense (and frankly my entire original argument stinks of Lovecraft, so I’m not sure how sane I am in general), but in my estimation P(P(Doom before Kardashev 2) > 0.9999) < P(FTL is possible). AI would have to be really easy to invent and co-ordination to not build it would have to be fully impossible—whether Butlerian Jihad can work or not for RL humanity, it seems like it wouldn’t need much difference in our risk curves for it to definitely happen, and while we have gotten to a point where we can build AI before we can build a Dyson Sphere, that doesn’t seem like it’s a necessary path. I can buy that P(AI Doom before Kardashev 3) could be extremely high in no-FTL worlds—that’d only require that alignment is impossible, since reaching Kardashev 3 STL takes millennia and co-ordination among chaotic beings is very hard at interstellar scales in a way it’s not within a star system. But assured doom before K2 seems very weird. And FTL doesn’t seem that unlikely to me; time travel is P = ϵ since we don’t see time travellers, but I know one proposed mechanism (quantum vacuum misbehaving upon creation of a CTC system) that might ban time travel specifically and thus break the “FTL implies time travel” implication.
It also gets weird when you start talking about the chance that a given observer will observe the Fermi Paradox or not; my intuitions might be failing me, but it seems like a lot, possibly most, of the people in the “P(Doom before K2) < 0.9999, fate of universe is STL paperclip nebulae” world would see aliens (due to K2 civilisations being able to be seen from further, and see much further—an Oort Cloud interferometer could detect 2000BC humanity anywhere in the Local Group via the Pyramids and land-use patterns, and detect 2000AD humanity even further via anomalous night-time illumination).
Note also that among “P(Doom before K2) < 0.9999, fate of universe is STL paperclip nebulae” worlds, there’s not much Outside View evidence that P(Human doom before K2) is high as opposed to low; P(random observer is Us) is not substantially affected by whether there are N or N+1 K2 civilisations the way it is by whether there are 0 or 1 such civilisations (this is what I was talking about with aliens breaking the conventional Doomsday Argument). So this would be substantially more optimistic than my proposal; the “P(Doom before K2) < 0.9999, fate of universe is STL paperclip nebulae” scenario means we get wiped out eventually, but we (and aliens) could still have astronomically-positive utility before then, as opposed to being Doomed Right Now (though we could still be Doomed Right Now for Inside View reasons).