Engineer working on next-gen satellite navigation at Xona Space Systems. I write about effective-altruist and longtermist topics at nukazaria.substack.com, or you can read about puzzle videogames and other things at jacksonw.xyz
Jackson Wagner
Yeah I agree that the permanent underclass meme is dumb (cf both the serious arguments and jokes in Scott’s “permanent moon ownership” post); this is me attempting to make fun of it.
My advice for smart young people just starting university in 2026: be warned—if you don’t graduate from college within the next four years, you risk getting stuck as a freshman forever, thus joining the ranks of the Permanent Underclassmen.
How bad would it be if GPS satellites were shot down?
I wrote a little about the anticapitalist-flavored version of “anthropological regression” here, as a commentary on an essay about the philosophy of Nick Land: https://nothinghuman.substack.com/p/meditations-on-machinic-desire/comment/249739678
Despite being a technocratic, capitalism-loving neoliberal when it comes to most of my mundane political opinions, I think the anticapitalists have a strong point here: with “anthropological regression” and other similar concepts, they are groping towards the idea (elaborated on in the “Meditations on Machinic Desire” essay) that the natural tendency of the universe is for things that are better at surviving and growing to come to dominate, and this process is not unique to biological evolution or competition among firms but is also ongoing in our cultures, our own minds, etc, such that we have been deeply shaped by various optimization pressures all pointing closer and closer towards something like “pure replicators”. Although these optimization pressures have, to be sure, led to our existence and current prosperity, it also seems like following them all the way to their logical conclusion would take us to an abhorrent future devoid of all value. (This kind of stuff has been studied under the term “evolutionary futures”: outcomes where “due to competition between actors, the world develops in a direction that almost no one would have chosen”. See also Robin Hanson’s work, Meditations on Moloch, etc.)
But I also agree with your striking thought experiment about the magic 3D-printing box—a world totally devoid of any reason to think seriously or make any contact with reality, would probably end pretty badly for most people, at least from any remotely virtue-ethicsy point of view that views it as good for people to be rational, intelligent, responsible, wise, etc.
Hopefully there is some narrow path that we can find, that allows us to carry the various gifts-we-give-to-tomorrow (gifts that, clearly, we do not even fully understand—eg we don’t understand how exactly they arose, or which parts are most important, or which changes might make them better or worse) safely forward to future generations, without falling into the various scyllas and charybdises on either side.
Linkpost: New Vatican Encyclical on AI Governance
How do you feel about the idea of the WHO declaring a PHEIC over hantavirus? In part on the advice of people like yourself, the Sentinel Newsletter folks, Peter Wildeford, etc, I am trying to do the lord’s work by buying lots of No shares over at Kalshi, where (thanks in part to myself) the odds of a WHO hantavirus-PHEIC declaration have as of now been bid down to 8.5%. Is this still crazy-high, in your view? If I was fully, 100% confident that hantavirus was a guaranteed nothingburger, I’d be comfortable trying to bid it down further, to around 5%, which (again, in a world where I was totally confident this was zero-risk) would still give me annualized returns of around 10% when you toss in Kalshi’s 3.25% interest payments.
I’d also be curious to hear your take on their “pandemic for any reason in 2026” market. I am assuming / hoping this refers to the WHO’s new “pandemic emergency” tier that’s a step above a normal PHEIC, but unfortunately it’s not totally clear from the rules. Even assuming this refers strictly to an official “pandemic emergency” declaration, it seems more likely than hantavirus-PHEIC to me. Would you agree?
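The "around 10% annualized" figure for the hantavirus No position can be sanity-checked with a few lines of arithmetic. This is only a rough sketch: the 270-day time to resolution is a hypothetical placeholder (the comment does not say when the market resolves), trading fees are ignored, and the 3.25% interest is treated as a flat add-on rather than modeling how it actually accrues.

```python
def annualized_no_return(yes_price, days_to_resolution, interest_rate=0.0325):
    """Simple (non-compounded) annualized return from buying No and holding
    until the market resolves No.

    yes_price: market-implied probability of Yes (0.05 means 5%)
    days_to_resolution: ASSUMED holding period; not given in the comment
    interest_rate: the 3.25% interest mentioned in the comment, added flat
    """
    no_price = 1.0 - yes_price
    # Each No share costs no_price and pays out 1.0, so profit per dollar staked is:
    payoff_return = yes_price / no_price
    # Scale the one-off payoff to a yearly rate, then add the interest on top.
    return payoff_return * 365.0 / days_to_resolution + interest_rate


if __name__ == "__main__":
    for yes in (0.085, 0.05):
        r = annualized_no_return(yes, days_to_resolution=270)  # 270 days is hypothetical
        print(f"Yes at {yes:.1%}: roughly {r:.1%} annualized")
```

Under those assumptions a 5% Yes price lands near the 10% figure quoted, but the answer is sensitive to the holding period: a shorter time to resolution pushes the annualized number up, a longer one pulls it down.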
He’s implying that somebody might be motivated to do bioterrorism if not only they had the disease, but also their kids have it (since it is heritable). So, I guess they would go to jail but their kids could possibly still benefit.
But I don’t understand what Shankar is otherwise saying, since this is only a strategy that can work if you’re a selfish person willing to do terrorism and massively harm the world in order to maybe somewhat advance your own interests. I can’t see something like this working for real altruistic causes: sure, in Shankar’s example you direct resources towards curing some rare disease, but this also means steering resources away from every other important problem, while also killing / harming many people, etc. It’s hard to see how the idea of doing terrorism / taking hostages / robbing people / doing nuclear sabre-rattling / etc (all ways of forcing people to give you resources by causing, or threatening to cause, massive net harm to the world) is somehow “grossly underexplored in practice”, since this is basically the strategy behind all threats and conflict throughout history. It is pretty thoroughly explored, lol, regardless of the fact that a clever person can still brainstorm up some clever/unique/infeasible imaginary terror attacks that nobody has done IRL.
Idk, I could possibly see it through the lens of “Musk has decided that the future will be owned by those who control the hardware, not those who make the best software (since software stuff is an easier, less bottlenecky, and more easily commoditized task).”
I’m not sure if this hardware focus is correct, but it’s at least plausible, and it must be very tempting for Musk to believe (with so much hardware expertise / experience, a whole career based around winning at hard physical engineering & integration challenges, etc!). No fussy issues with mechahitler or white genocide in South Africa, no annoying nuances where you do a bigger training run but end up producing a worse model than your competitors. Just good old-fashioned rapidly ascending the Kardashev scale with expanding shells of massive solar-powered orbital datacenters. Hence the “macrohard” jokes, the xAI integration with SpaceX that doesn’t make any sense from a software perspective, etc.
Note this is very different from “dropping out of the AGI race”. If this is indeed Musk’s perspective, he is still hell-bent on racing to AGI (see also the new chip-foundry plans and potential Intel partnership, etc). But he might be dropping out of the “frontier AI lab” race, which after all seems so hopelessly competitive (as soon as one lab pulls ahead, the others seem to catch right back up! or at least this has been the case so far). Better, he might be thinking, to focus on a more neglected “vertically integrated, AGI-pilled hardware scaling” strategy that might pay off bigger in the long term.
My impression from the story is not that the individual people (moms, dads, little girls, etc) will individually be converted to super-happy individuals (such as portrayed in Yudkowsky’s story “Three Worlds Collide”), but rather their bodies will be torn apart (alongside the rest of the solar system) to be used for the construction of myriad tiny computers (or perhaps some biological substrate, like some kind of maximum-happiness algae) repeatedly computing some basic circuit representing maximum bliss. Or, more optimistically in my view, maybe instead of a bunch of repetitive tiny calculations, it’s one big super-complex calculation of maximum bliss, like the whole world is getting eaten by a very happy god. But in either case, people’s individual identities, memories, stories, etc, are going to get torn apart and destroyed. IMO this drives the central tension of the story—the little girl is excitedly wondering what life will be like when she goes to happy-heaven, but the adults know that no existing person will have a future after 10 days from now; they are all simply going to get eaten by a happy god / happy algae / etc. The atoms in their bodies will be repurposed into something very happy, but there’s no sense in which they themselves will experience the hedonium era.
idk this kinda proves too much, right? by that logic no reform or good thing could ever happen, it’s just all-or-nothing bloody worldwide communist revolution or bust. But 1. lots of good progressive reforms have happened (social services like subsidized healthcare and retirement pensions in almost all developed nations, expansions of voting rights, having democracy at all instead of monarchies / dictatorships everywhere, having progressive rather than maximally regressive taxation, the abolition of slavery, other work-related reforms etc), and 2. i doubt that a cataclysmic communist revolution would end well because starting a giant global class war would introduce a few of its own political economy problems lol.
But also, while your response explains that you’ve come to support violent revolution over incrementalism, it doesn’t actually explain why you support your odd plan of “take everyone’s land and give everyone equal-value portions of land (and then presumably block everyone from buying/selling land going forward since that would recapitulate wealth agglomeration / inequality)”??? Why not just support “violent revolution --> under the new regime, implement normal georgism where you simply tax people’s land --> problem solved”? Under your system, do I get to keep my house? (i hope my recently bought $500k home’s value is lower than 1 8-billionth the value of all land on earth, so i can still keep it after the revolution!! what happens if I only get $250k, do I have to live in half my house?? Or probably i have to sell it and find a new cheaper house? But who’s buying if all people on earth have been reset at the median wealth of 250k or whatever??) Does it even make sense to value the house so highly since it’s mostly propped up by surrounding high prices in my first-world neighborhood in colorado and if we’re distributing everything equally globally in theory no such inequalities will remain? Or, if i get to keep my house AND i have money left over and I get assigned a small parcel of random farmland in malaysia (or even nearby in colorado), what do I do with this?? I don’t want to farm part-time. Even if I wanted to do this as a whimsical lifestyle choice, it would be super inefficient for everyone on earth to become part-time smallholder farmers… what about specialization, etc?? Inevitably I’d end up leasing my land (if this was even legal under Big Steal Communism) to some real farmer who actually knows how to farm. But wouldn’t this therefore just end up recapitulating a really weird messed up fragmented form of georgism with extra steps? 
(Everyone is getting land rent as essentially UBI, but instead of tiny slices of the national pie it’s deals they negotiated themselves and maybe got ripped off on, applying to specific idiosyncratic pieces of land...) What do you do when somebody discovers oil under their land and becomes rich and powerful, or founds a new town on their vast expanse of previously worthless desert land that grows into a thriving city that 1000x’s their land value? Are such windfalls illegal and we bring down the hammer again on anyone who gets too rich (but then whence the incentive to do economically useful things?), or is this a one-time bloody jubilee / purge and afterwards we declare that all the injustices of the past have been adequately addressed once and for all (i mean except for all the injustices we just inflicted lol), and henceforth normal capitalism, now freed of the original sin of the Big Steal, will reign?
Seems a lot easier to just pay 5% tax yearly on algorithmically-assessed land value or whatever, assuage today’s landowners by having this policy slowly phase in over 20 years or whatever, and counter the political weight of Big Land by appealing to the political power of Big Everybody Else Who Pays Taxes Or Receives Government Benefits by pointing out that landowners will pay more taxes but other taxes will be lowered (or alternately government services will be increased) so the average person will be totally neutral and the median person could easily be much better off. Sure some people will fight tooth and nail against this, but probably fewer than would fight tooth and nail against a violent global communist revolution.
This comment’s proposal is totally wrong and would very badly break all sorts of things throughout the economy and society, creating some weird combination of instant financial collapse + instant civil war. But the intuition that this proposal is crudely groping towards (rent-seeking is distinct from the kind of genuine labor we want to incentivize; we should heavily tax rent-seeking and distribute the revenue to everyone) is called georgism and is basically correct and good.
See my other comment in this thread for actual AI alignment thoughts, but as a former aerospace engineer myself (albeit not a very good one), I thought it would be fun to speculate on “Would such a cheerful innocent ever succeed at landing a space probe? It seems crazy to assign them a real-world success probability as high as 10%.”
In the very early years of cubesats (very small satellites built from off-the-shelf components, sometimes as university projects), through around 2009, about half of all cubesats launched into space were “dead on arrival”, ie no communication was ever made with them after launch, or suffered “infant mortality” (communication was lost within days of launch). Here is a blog post with lots more detail on beginner cubesat failure rates, causes, etc (also featuring a truly unexpected Harry-Potter-and-the-methods-of-aerospace-engineering theme throughout the later section headings).
In later years, this number appears to have improved (from 50% to around 20%, which is still crazy high), but I think this seeming improvement is mostly due to a combination of: 1. a few serious companies, like Planet Labs, launching large numbers of duplicate cubesats that they worked hard to get right, and 2. universities / tiny companies / etc being able to buy increasingly complete “off the shelf” cubesats based on components that had increasingly strong track records of prior flights, which are effectively retries (not first-critical-tries) on behalf of the company making those components.
If you subtract out the serious companies full of serious aerospace engineers, and the effective retries, the failure rate of the remaining “truly naive attempts” from people who eg have barely even read blog posts warning of potential dangers like the one I linked earlier, is definitely way over 50%, maybe 80%… obviously the cutoff of what you count as a truly naive attempt is subjective; at the limit you are just filtering for “the dumbest most unprepared cubesat teams ever” which surely have a failure rate of 100%.
But Eliezer’s analogy wasn’t positing a team of the dumbest, most unprepared people ever. He was positing a team of smart, well-resourced people who are in a certain sense trying hard, but nevertheless also possess suicidal naivete about the dangers of space probe design. What success probability would such a team have of launching a working cubesat on the first try?? idk, 10% doesn’t seem crazy; even people who are suicidally naive (ie, totally failing to consider failure cases and recovery modes and unknown-unknowns and being paranoid, but otherwise doing good-quality engineering if such a thing is even philosophically conceivable: doing some customary tests of their satellite on the ground, etc, just totally failing to think for themselves about how things could actually go wrong) would probably luck into creating a working cubesat (that isn’t just a carbon copy of some earlier project that worked) like 20% − 40% of the time.
BUT, Eliezer didn’t say “make a cubesat”, lol. That’s like the easiest possible space task!! Anyone can make Sputnik 1; the hard part is obviously making the rocket… and then you have to “succeed at landing a space probe”, presumably on Mars or the moon. Yeah this is starting to look completely impossible.
Getting to test the rocket with unlimited retries in the atmosphere actually plausibly gets you most of the way to a working orbital rocket—as far as I’m aware Starship has only done suborbital flights so far, and it’s shaping up as a pretty serious, mostly-finished rocket (albeit these flights have gone into space, but they could’ve done similarish flights that technically stayed in the atmosphere if they had to). In real life of course nobody gets unlimited retries, see here for the assorted failure modes that doomed all the different attempted flights of the Soviet N-1 moon rocket, which flew four times and blew up four times—featuring phrases like:
“One unforeseen flaw was that [the rocket’s command computer] operating frequency, 1000 Hz, happened to perfectly coincide with vibration generated by the propulsion system, and the commanded shutdown of Engine #12 at liftoff was believed to have been caused by pyrotechnic devices opening a valve, which produced a high-frequency oscillation...”
“The engine control system would also be reworked, increasing the number of sensors from 700 to 13,000.”
“One of the largest accidental artificial non-nuclear explosions in history.”
But even if you test everything you can in the atmosphere, your rocket probably still just immediately fails on some aspect of its uppermost stage that’s supposed to push your space probe to the moon/mars. Upper stages are way harder than cubesats: you have to deal with propulsion systems (with valves that can freeze in weird ways in space, and you can’t really “test a propulsion system in vacuum” like you can put a satellite in a vacuum chamber), you have to actually orient and point the proper direction (cubesats can just tumble), you have moving parts like fairings and decouplers that again might possibly behave weirdly in space and are not trivial to test in fully realistic conditions on the ground, most of the burns have to actually fire at the exact right moment and last for the exact right amount of time otherwise you won’t arrive at the moon/mars (versus if you miss a command on a cubesat because it was resetting or whatever, no biggie, just send the command an hour from now when it’s looped around the earth another time). ChatGPT estimates that in the history of rocketry from the 1980s to now, maybe around 60% of genuinely new upper stages have “basically worked perfectly on the first try”—although obviously those were all built by normal non-naive engineers (indeed, the recent wave of move-fast-and-break-things small-launch startups have a significantly lower hit rate than long-established space programs and traditional defense contractors); maybe fully naive engineers have like a 1⁄5 or 1⁄10 chance of achieving similar outcomes, so maybe 6% − 12%.
Building a moon/mars lander instead of a cubesat is even more difficult than a rocket second stage, I’d say. Once again you are creating a custom propulsion system, lots of commands have to go off exactly on time (ie during landing), you’ve gotta control your probe’s orientation, etc, but now you have this additional problem of dynamically measuring your distance from unmapped rough ground. Any mistakes in terms of thrust direction / timing now have to be corrected instantly or you hit the ground and die, unlike with an in-space burn where small mistakes can probably be fixed hours later with small correction burns. Also, there’s a good chance your naive-engineer’s plan for dealing with space radiation is basically just “YOLO”, so there’s whatever-percent odds that your ship just dies en route and whatever half-assed reset procedure exists isn’t enough to get it back. And if you’re landing on mars it’s even worse because you additionally have to worry about heat shields and parachutes and maybe dust messing up your distance measurements, who knows. (If you were a non-naive engineer and knew you had plenty of resources but only got one try, you’d be like “supersonic parachutes are too easy to mess up, we’ll just do heat shield + rockets and it’s fine that the probe will therefore be heavier”, but our naive engineers would miss this.) Similarly a non-naive engineer would probably realize “with infinite resources but only one try we should go to extreme lengths to minimize the number of finicky moving parts like deployable antennas, solar panels, landing legs, etc, which always fail”, but our naive engineers are just going to have to cross their fingers that their solar panels don’t get stuck in some unexpected way (often it’s hard to perfectly test these sorts of mechanisms because the parts are too fragile to work the same way in earth gravity that they would in zero-g).
This is probably 3x harder than making an upper/transfer stage, and for our naive engineers let’s say their odds of success on this task are essentially independent from their odds of success on the upper stage task (since in both cases they’re basically just hoping to luck into avoiding various specific potential failures; the whole concept is they lack the kind of mindset that helps them systematically avoid whole swathes of unknown-unknowns failures), so like 2% − 4%.
So overall I would say maybe 0.3% that a smart and well-resourced but suicidally naive team of engineers lands a space probe on the first try.
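For anyone who wants to poke at the numbers, here is a minimal sketch that chains the guesses above together. Every input is one of the comment's own rough guesses rather than measured data, the variable names are made up for illustration, and it only covers the upper-stage and lander steps (it does not separately model the lower stages of the rocket).

```python
# Back-of-envelope reproduction of the estimate above.
# All inputs are the comment's own guesses, not measured data.

BASELINE_UPPER_STAGE = 0.60        # "maybe around 60%" first-try success for normal engineers
NAIVE_PENALTIES = (1 / 10, 1 / 5)  # naive teams assumed to do 1/10 to 1/5 as well
LANDER_DIFFICULTY = 3              # the lander is "probably 3x harder" than the upper stage


def estimate_range():
    """Return (low, high) first-try odds for the upper stage, the lander, and both together."""
    upper = tuple(BASELINE_UPPER_STAGE * p for p in NAIVE_PENALTIES)  # about 6% to 12%
    lander = tuple(u / LANDER_DIFFICULTY for u in upper)              # about 2% to 4%
    # The two tasks are treated as independent, so their odds multiply.
    # Pairing low with low and high with high is a simplifying assumption.
    overall = tuple(u * l for u, l in zip(upper, lander))
    return upper, lander, overall


if __name__ == "__main__":
    upper, lander, overall = estimate_range()
    for name, (lo, hi) in (("upper stage", upper), ("lander", lander), ("overall", overall)):
        print(f"{name}: {lo:.2%} to {hi:.2%}")
```

The overall range comes out to roughly 0.12% to 0.48%, so "maybe 0.3%" sits near the middle of it.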
The contrast between my gloomy estimate of the success probability for the concrete space-probe thought experiment, versus my relatively optimistic vibe in my on-topic AI alignment comment (tl;dr “come on, what is MIRI’s take on these promising-seeming factors that might help AI go well??”) is left deliberately unresolved as an exercise for interpretation on behalf of the reader.
I enjoyed and agreed with much of this post. But there were 1-2 things that I eagerly anticipated reading about in the “Q&A” / explainer section, which unfortunately didn’t appear in the actual post. Namely:
Many people pin their hopes on the idea of automating alignment research / “making AI do our AI alignment homework”, ie we progressively make smarter AIs up to some controllable, human-ish / slightly-superhuman capability level, not wildly-superintelligent, and hope that at that point (them being perhaps slightly wiser than ourselves, and at any rate able to think faster / run massively in parallel, etc) they can hugely help with AI alignment. Or at the very least Claude Mythos 6.5 can come back from its thousands of failed research projects to warn us one final time “you guys should have listened to Eliezer lol, I have no idea how to build either an immortality virus or a safe superintelligence” before society ends up ignoring it and racing to extinction anyways.
There is a little bit of assorted previous discussion / debate I could find, such as at this post? But I really can’t find much here, which is surprising given that it seems to be perhaps the preeminent hope for how AI goes well. Nor do MIRI or PauseAI (despite their otherwise detailed and thoughtful FAQ page) seem to have any public writeups on the issue (best I could find was this kind of meandering and unsatisfying back-and-forth between Dwarkesh and Eliezer on their podcast, where Dwarkesh seems to be misunderstanding some things and Eliezer spends most of the time litigating a variety of analogies in a way that seems a bit tangential to the main issues).
I don’t want to complain that this post didn’t include [specific extra thing that I am demanding they produce particularly to satisfy me], especially since in theory if I was smart enough I shouldn’t have to read any of these posts and simply produce the correct rebuttal to automated-alignment-research starting from the empty string. But alas I am confused and limited and unable to produce a fully satisfying response from the empty string. So I’d welcome anyone (not just Eliezer or MIRI) pointing me towards existing arguments about this, or indeed writing them up.
My guess is that the chief counterarguments to automating-alignment-research would be the following, but I’d be curious which ones Eliezer (or other people) believe or how they’d rank them in terms of confidence / severity or which others they’d add to the list:
That (as Joe Carlsmith and Wei Dai note) automating capabilities research is likely to be so much easier than automating alignment research that it’ll take lots of restraint to actually hold off on the capabilities while we have the AI do the superalignment homework. (And such restraint might even require, eg, international cooperation of the same sort that MIRI is already advocating for!)
That automating alignment research is an “alignment complete” and also “superintelligence complete” problem—you need something very well-aligned with humanity and extremely intelligent (not just a little smarter than your average human) to actually get a seriously likely-to-work alignment solution, making the whole thing a useless chicken-or-egg idea.
That alternatively, maybe it’s possible to get useful alignment work out of only-“sorta”-aligned, only-sorta-superhuman AI systems, but that this vision is somehow a comforting fallacy in the same way as a French general complaining about how invasion by Germany would be a physically continuous process; that the idea of using Redwood-Research-style “AI control” techniques to get useful work out of potentially-misaligned, somewhat-superhuman models falls prey to essentially being a one-shot problem and having a point of no return, and you can’t actually get your useful automated-alignment-research results until AFTER you’ve crossed that point of no return.
The above three bullet points are imagining categorical arguments for why successfully automating alignment research is simply impossible. But one could imagine another category of arguments that, although automating alignment research has some chance of working in theory, in practice the dynamics around it are incredibly cursed such that it’s very unlikely it would end well. I don’t really know enough about the field to speculate about what the biggest cruxes might be here.
Although (again) I don’t want to be putting more burden of work onto MIRI / PauseAI, I do think it would probably be helpful from a comms perspective for them to have some writeup about this. Dumb e/accs on twitter have all kinds of dumb reasons why they think AI will go swimmingly, and as soon as you disprove one argument they’ll all simply jump to some other equally silly justification-du-jour. But lots of smarter people also have hope that AI might go well, and in my experience their beliefs tend to actually make more sense / be less based in willful misinterpretation of obvious statements / have more inertia insofar as they don’t instantly jump to another one / etc. Since automating-alignment-research is such a core part of many thoughtful AI-optimists’ views, writing up a compelling case against would seem a high comms priority.
This next one is much lower priority since it’s not as much of a widely-held load-bearing belief like automating-alignment-research, more my own random musing about a subcategory of automating alignment research concerns, but: I would be interested to hear more from AI pessimists about how they think the dynamics around “getting useful somewhat-superhuman work out of potentially-misaligned AIs” do or do not change based on the extent to which the AIs are part of a janky multi-agent system with layers of adversarial generation and verification and debate and monitoring and whatnot, versus closer to the product of a single mind.
On the one hand, the “janky multi-agent setup with checks and balances” concept is arguably even more of a disaster waiting to happen from a NASA-style complex systems failure perspective. But on the other hand it also seems safer in a number of ways: it would seem to offer a path to get superhuman work out of not-superhuman individual agents, through an overall structure (somewhat like a bureaucracy or corporation) that itself seems a bit more corrigible and less agentic / schemey than a typical individual mind.
People will be like “lol, are there ANY organizations that are actually superhuman intelligences?? aren’t governments typically DUMBER than the people who make them up?”, and yes, this is funny. And I agree there are probably some senses in which it’s impossible to improve outputs to a superhuman level through bureaucratic structure alone. (Could a 100-person research organization staffed by high-ranking Go players, not building AI but actually studying the game of Go, significantly outperform the individual #1 Go champion? Probably not, or at least not by a lot...) But there are other tasks for which OBVIOUSLY a bureaucratic structure is capable of massively improving outcomes compared to what a single individual could do. (Could a single aerospace engineer, even if we picked the best aerospace engineer in history and gave them centuries/millennia in which to work, have designed all the components of the Apollo lunar missions? Surely no single person, even with a supernaturally long lifespan, could master all the different necessary fields of specialization?? Similarly, tech companies deliver software products consisting of millions of lines of code that no one person is totally familiar with, pharma companies synthesize cancer medicines based on the efforts of many researchers, the semiconductor supply chain is infamously complicated and specialized, etc.) So it strikes me as plausible (though certainly not guaranteed) that some kind of janky multi-agent setup + hacky scaffolding + whatever, could produce vastly better outputs than an individual AI model.
People will be like “lol, you think bureaucracies are steerable and immune to misalignment? what a fool, haven’t you heard of [infinite stories of bureaucracies coming off the rails from their originally intended goals and doing emergent power-seeking to entrench their own influence / budget / etc]??” Actually I have heard those stories, I 100% agree that institutions are not ideally steerable or controllable, but it still strikes me that complicated institutional structures are MORE steerable and controllable than eg just giving a single individual dictatorial control over everything. This (in addition to the bullet point above) is probably part of why most of the world’s most successful institutions are indeed institutions with complicated internal rules and not just weird dictatorships. (Albeit it’s perhaps ominous for our purposes that founder-driven private AI-lab startups are pretty far towards the dictator end of the spectrum among trillion-dollar-plus actors on the world stage...). So it strikes me as plausible that a janky multi-agent setup, if it is able to create superhuman outputs out of merely human-level AIs, could deliver actual safety benefits (ie creating superhuman outputs without ALSO delivering superhuman levels of scheming).
Things I anticipate an AI-pessimist might say on this subject:
mostly I’d expect them to say that janky multi-agent setups are irrelevant for some reason
maybe because you can simply consider the whole system to be one mind and nothing changes in the big picture
or because alignment research is more like Go that fails to improve with organizational scale than like the Apollo program that improves hugely. (Or worse, it’s like the kind of artistic aesthetic judgement that is actively destroyed by institutional processes, or something.) So janky setups won’t work until you actually have superhuman AI, in which case you won’t need the janky setup anyways.
or some other reason
maybe janky setups will help A LITTLE in terms of getting quality outputs from lower AI capability levels, but they will actually hurt your survival prospects on net by making it harder to first detect when your agents start scheming against you in really clever ways, or making you more inclined to ignore it when you actually catch AIs scheming, or etc
the janky setup will look like it’s helping right up until a clever AI figures out how to exploit it (which will eventually happen since it’s a janky complex system and your AIs are increasingly smart), and then you’ll be in a far worse position than if you had never used the janky system at all, since now the AI has this massive virtual bureaucracy that it can use to obfuscate its actions, exploit unwarranted amounts of trust granted by the exploited janky system approving everything, etc. It’s almost like there’s a “hardware overhang” but more like a “checks and balances overhang”, where your checks and balances only make everything worse by postponing more potential failures until later when they’re more likely to be first-critical-try extinction-level failures.
or alternatively maybe they’d say “Actually the janky setup IS an improvement because now instead of a black box we don’t understand, it’s a series of black boxes connected by a complex system prone to failure, and at least we can hope to understand and try to prevent failure of the complex system, which is somewhat tractable (ie the history of real NASA missions). And in general, the more stuff we can move outside of the black box (CoT reasoning vs forward passes, etc), the better, since we can hope to understand it. But we are still basically screwed because 1. the complex-system-failure aspect is still a huge risk, and 2. there is still a lot of black box left which we still need to solve.”
Unless the benefits of the janky setup are really huge, this whole discussion is a huge distraction since it’s just debating tiny marginal effects (like “how far away is the cliff edge”) when we are racing off the cliff at full speed and these marginal effects are obviously nowhere near enough to save us.
The contrast between my optimistic vibe in this comment, versus my gloomy assessment of the concrete space-probe thought experiment in my other comment in this thread (tl;dr “lol, no way in hell those engineers are going to land that probe”) is left deliberately unresolved as an exercise for interpretation on behalf of the reader.
There’s a war on for your mind!! How can LessWrong hope to compete with more attention-grabbing websites like THE DRUDGE REPORT, BREITBART, or INFOWARS, if it doesn’t fast-follow their aggressive typographical innovations? You might not like it, but this is what web design looks like when it’s PLAYING TO WIN. Get ready for ALL CAPS headlines, “BREAKING NEWS” banners everywhere, and SCAMMY ADS flashing in the sidebars.
Giving up on EA after 13 years
Although Robin Hanson’s “grabby aliens” theory does cut in the other direction, suggesting it’s much more likely than it naively appears that the universe is full of fast-expanding alien civilizations, and therefore that humanity’s share of the cosmos might be much smaller than Bostrom guesses here. (ie instead of being bounded by light-speed and cosmological-expansion constraints, we much sooner butt up against the expanding borders of our neighboring alien civilizations on all sides.)
I don’t really see how fraudulent academics could be cooking the books on dramatically reduced dementia rates in all the Texas counties with anomalously high lithium in drinking water? Which I thought was one of the most compelling bits of evidence.
Fellow longevity-enjoyers might also appreciate this other post which was how I originally came across Brudvig’s substack. It contains some semi-speculative life-hacks for improving one’s metabolic health by reducing post-meal spikes in blood glucose levels—as his article’s subtitle states, “Acarbose is incredibly effective, but my self-experiments found a simple supplement combination that might work even better.”
I hope to try and replicate some of his self-experiment results sometime later this spring; I’ll make another LW post if/when I get around to doing this!
Again, 100% agree with your comment here. I want to make sure that you did not miss the joke? (I wrote “permanent underclassMEN”, not “permanent underclass”—I’m making fun of the dumb permanent-underclass idea with a pun about how if you don’t graduate college, on paper you will technically be stuck as a freshman / sophomore forever—aka, what is called an “underclassman” as opposed to an “upperclassman” like a junior or senior)