I feel like this timeline runs into something that bothers me about a lot of space colonization scifi: why are there millions of people living in space colonies when in our timeline, there aren’t millions of people living in Antarctic colonies? Antarctica is a lot more habitable and resource rich than the moon or Mars, building cities there would require a pretty small fraction of the cost of doing so in space, and the standard of living for residents would be a lot higher. The same seems to be true of cities on the ocean floor and underground. Shouldn’t we expect to see a lot of colonization like that first, before moving to space becomes economically sensible?
Granted, there is the old idea of space colonies as a sort of “backup civilization” in case something happens to civilization on earth, which your setting touches on- but I think that probably has the same issue. A city underground would be a lot less vulnerable to nuclear war than a moon colony, and an Antarctic or ocean-floor city probably no more so. These settings also often involve things like moving asteroids, which would be new existential risks in their own right.
I guess it’s true that the drama and mythos of space exploration might drive initial investment in a way that an Antarctic or underground city couldn’t- but if that’s the only advantage it has, I feel like it runs into pretty extreme diminishing returns- a Mars colony of six million is only a bit more dramatic than a colony of 100.
Maybe you could have the space colonization be driven by the discovery of some incredibly valuable resource that doesn’t exist on Earth, like ancient remnants of alien nanotech?
Or maybe the alternate timeline actually could involve people building lots of cities in less habitable areas on Earth before going into space. Maybe there’s a limited nuclear exchange in the 50s that kills 20% of the global population and leads to a widespread belief that living aboveground is unsafe. Then there’s another limited exchange in the 80s, which doesn’t do as much damage because so much of the population and infrastructure has already been moved underground, but which leads to a strong demand for even safer cities deep under the crust of Mars and the moon- which is a smaller step for people in this timeline, since lots of people are already surviving on hydroponics and in cramped, enclosed living spaces.
artifex0
Actually, I think Yudkowsky coined it a few months prior to that article- see http://extropians.weidai.com/extropians/0303/4140.html. That’s dated March 11, 2003, whereas the Bostrom article was apparently published at a conference later in July of that year- see https://openlibrary.org/books/OL53814584M/Cognitive_emotive_and_ethical_aspects_of_decision_making_in_humans_and_in_artificial_intelligence_vo
It looks like Bostrom was on that Extropians mailing list when the EY post was written, and ChatGPT in deep research mode was unable to find earlier mentions of the thought experiment- so, it seems plausible that Bostrom was actually referencing EY in that article.
See Mick West (https://www.youtube.com/@MickWest) for detailed debunkings of videos claiming to be evidence of alien UFOs.
In the spirit of posting more on-the-ground impressions of capability: in my fairly simple front-end coding job, I’ve gone in the past year from writing maybe 50% of my code with AI to maybe 90%.
My job the past couple of months has been this: attending meetings to work out project requirements, breaking those requirements into a more specific sequence of tasks for the AI- often just three or four prompts with a couple of paragraphs of explanation each- then running through those in Cursor, reviewing the changes and making usually pretty minor edits, then testing- which almost never reveals errors introduced by the AI itself in recent weeks- and finally pushing out the code to repos.
Most of the edits I make have to do with the models’ reluctance to delete code- so, for example, if a block of code in function A needs to be moved into its own function so that functions B and C can call it, the AI will often just repeat the code block in B and C so that it doesn’t have to delete anything in A. It also sometimes comes up with strange excuses to avoid deleting code that’s become superfluous.
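A minimal sketch of that pattern, in Python for brevity rather than front-end code. Every name and number here is hypothetical and not taken from any real codebase; the first pair of functions shows the duplicated version, the second shows the block moved into its own function.

```python
# Hypothetical illustration; none of these names come from a real codebase.

# Duplicated version: the shared block is copied into B (and C), while the
# original copy inside A is left untouched.
def function_a_duplicated(unit_cents: int, quantity: int) -> str:
    subtotal = unit_cents * quantity
    total = subtotal + subtotal * 8 // 100  # shared block, copy 1
    return f"A: {total}"


def function_b_duplicated(unit_cents: int, quantity: int) -> str:
    subtotal = unit_cents * quantity
    total = subtotal + subtotal * 8 // 100  # shared block, copy 2
    return f"B: {total}"


# Refactored version: the block lives in one helper, and the copy that used
# to sit inside A has been deleted.
def total_with_tax(unit_cents: int, quantity: int) -> int:
    subtotal = unit_cents * quantity
    return subtotal + subtotal * 8 // 100


def function_a(unit_cents: int, quantity: int) -> str:
    return f"A: {total_with_tax(unit_cents, quantity)}"


def function_b(unit_cents: int, quantity: int) -> str:
    return f"B: {total_with_tax(unit_cents, quantity)}"
```

Both versions return the same values, which is why the duplication is more of a maintenance issue than a source of bugs.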
The models also occasionally have an issue where they’ll add fallbacks to prevent functions from returning an error even when they really should return an error, such as when a critical API call returns bad data.
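As a rough illustration of the difference (again in Python, with hypothetical function and field names standing in for a critical API response):

```python
# Hypothetical illustration; the payload shape and field names are assumptions.

def parse_user_lenient(payload: dict) -> dict:
    # The fallback pattern: bad data is silently replaced with placeholders,
    # so the caller never learns that the critical call failed.
    return {
        "id": payload.get("id", -1),
        "name": payload.get("name", "unknown"),
    }


def parse_user_strict(payload: dict) -> dict:
    # Failing loudly: a response missing required fields raises an error
    # instead of flowing onward as if it were valid.
    missing = [key for key in ("id", "name") if key not in payload]
    if missing:
        raise ValueError(f"API response missing required fields: {missing}")
    return {"id": payload["id"], "name": payload["name"]}
```

With an empty response, the lenient version returns a plausible-looking record, while the strict version raises a `ValueError` that the caller has to handle.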
So, in a way, the main bottleneck to the AI doing everything one-shot at this point seems to be alignment rather than capability- the models were trained to avoid errors and avoid deleting code, and they care more about those than producing good codebases. Though, that said, these issues almost never actually produce bugs, and dealing with them is arguably more stylistic than functional.
In my department, I think all of the other developers are using AI in the same way- judging by how the style of the code they’ve been deploying has changed recently- but nobody talks about it. It’s treated almost like an embarrassing open secret, like people watching YouTube videos while on the clock, and I think everyone’s afraid that if the project managers ever get a clear picture of how much the developers are acting like PMs for AI, the business will start cutting jobs.
I agree, though if we’re defining rationality as a preference for better methods, I think we ought to further disambiguate between “a decision theory that will dissolve apparent conflicts between what we currently want our future selves to do and what those future selves actually want to do” and “practical strategies for aligning our future incentives with our current ones”.
Suppose someone tells you that they’ll offer you $100 tomorrow and $10,000 today if you make a good-faith effort to prevent yourself from accepting the $100 tomorrow. The best outcome would be to make a genuine attempt to disincentivize yourself from accepting the money tomorrow, but fail and accept the money anyway- however, you can’t actually try and make that happen without violating the terms of the deal.
If your effort to constrain your future self on day one does fail, I don’t think there’s a reasonable decision theory that would argue you should reject the money anyway. On day one, you’re being paid to temporarily adopt preferences misaligned with your preferences on day two. You can try to make that change in preferences permanent, or to build an incentive structure to enforce that preference, or maybe even strike an acausal bargain with your day two self, but if all of that fails, you ought to go ahead and accept the $100.
I think coordination problems are a lot like that. They reward you for adopting preferences genuinely at odds with those you may have later on. And what’s rational according to one set of preferences will be irrational according to another.
I wonder if some of the conflation between belief-as-prediction and belief-as-investment is actually a functional social technology for solving coordination problems. To avoid multi-polar traps, people need to trust each other to act against individual incentives- to rationally pre-commit to acting irrationally in the future. Just telling people “I’m planning to act against my incentives, even though I know that doing so will be irrational at the time” might not be very convincing, but instead claiming to have irrationally certain beliefs that would change your incentives were that certainty warranted can be more convincing. Even if people strongly suspect that you’re exaggerating, they know that the social pressure to avoid a loss of status by admitting that you were wrong will make you less likely to defect.
For example, say you’re planning to start a band with some friends. You all think the effort and investment will be worth it so long as there’s a 50% chance of the band succeeding, and you all privately think there’s about a 70% chance of the band succeeding if everyone stays committed, and a near 0% chance if anybody drops out. Say there’s enough random epistemic noise that you think it’s pretty likely someone in the band will eventually drop their odds below that 50% threshold, even when you personally still give success conditional on commitment much better odds. So, unless you can trust everyone to stay committed even if they come to believe it’s not worth the effort, you might as well give up on the band before starting it. Classic multi-polar trap. If, however, everyone at the start is willing to say “I’m certain we’ll succeed”, putting more of their reputation on the line, that might build enough trust to overcome the coordination problem.
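To put rough numbers on the band example, here is a small sketch. It assumes each member drops out independently with the same probability, treats the "near 0%" case as exactly 0, and picks a band of four; those three choices are illustrative assumptions, while the 70% and 50% figures come from the example.

```python
# Sketch under stated assumptions: independent, identical dropout odds per
# member, and a single dropout sinks the band entirely.

P_SUCCESS_IF_COMMITTED = 0.70  # from the example
WORTH_IT_THRESHOLD = 0.50      # from the example


def overall_success(p_dropout: float, members: int) -> float:
    """Chance of success once the risk of anyone dropping out is included."""
    p_everyone_stays = (1 - p_dropout) ** members
    return P_SUCCESS_IF_COMMITTED * p_everyone_stays


def max_tolerable_dropout(members: int) -> float:
    """Largest per-member dropout chance that keeps the band worth starting."""
    ratio = WORTH_IT_THRESHOLD / P_SUCCESS_IF_COMMITTED
    return 1 - ratio ** (1 / members)


if __name__ == "__main__":
    members = 4  # hypothetical band size
    print(f"Tolerable dropout chance per member: {max_tolerable_dropout(members):.1%}")
    print(f"Success odds at 20% dropout each: {overall_success(0.20, members):.1%}")
```

Under these assumptions, a four-person band stops being worth starting once each member has more than roughly an 8% chance of dropping out, which is why even modest doubts about commitment can sink the project.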
Of course, this can create all sorts of epistemic problems. Maybe everyone in the band comes to believe that it’s not worth the effort, but incorrectly think that saying so will be a defection. Maybe their exaggerated certainty misleads other people in ways that cause them to make bad investments or to dangerously misunderstand the music industry.
Maybe there’s a sense in which this solution to individual coordination problems is part of a larger coordination problem- everyone incentivized to reap the value of greater trust, but causing a greater loss of value to people more broadly by damaging the epistemic commons.
There might be some motivated reasoning on that last point, however, since I definitely find it emotionally uncomfortable when people say inaccurate things for social reasons.
I’m certain your model of what purpose is is a lot more detailed than mine. My take, however, is that animal brains don’t exactly have a utility function, but probably do have something functionally similar to a reward function in machine learning. A well-defined set of instrumental goals terminating in terminal goals would be a very effective way of maximizing that reward, so the behaviors reinforced will often converge on an approximation of that structure. However, the biological learning algorithm is very bad at consistently finding the structure, and so the approximations will tend to shift around and conflict a lot- behaviors that approximate a terminal goal one year might approximate an instrumental goal later on, or cease to approximate any goal at all. Imagine a primitive image diffusion model with a training set of face photos- you run it on a set of random pixels, and it starts producing eyes and mouths and so on in random places, then gradually shifts those around into a slightly more coherent image as the remaining noise decreases.
So, instrumental and terminal goals in my model aren’t so much things agents actually have as a sort of logical structure that influences how our behaviors develop. It’s sort of like the structure of “if A implies B which implies C, then A implies C”- that’s something that exists prior to us, but we tend to adopt behaviors approximating it because doing so produces a lot of reward. Note, though, that comparing the structure of goals to logic can be confusing, since logic can help promote terminal goals- so when we’re approximating having goals, we want to be logical, but we have no reason to want to have terminal goals. That’s just something our biological reward function tends to reinforce.
Regarding my use of the term “category error”, I used that term rather than saying “terminal goals don’t require justification” because, while technically accurate, the use of the word “require” there sounds very strange to me. To “require” something means that it’s necessary to promote some terminal goal. So, the phrase reads to me a bit like “a king is the rank which doesn’t follow the king’s orders”- accurate, technically, but odd. More sensible to say instead that following the king’s orders is something having to do with subjects, and a category error when applied to a king.
Definitions and justifications have to be circular at some point, or else terminate in some unexplained things, or else create an infinite chain.
If I’m understanding your point correctly, I think I disagree completely. A chain of instrumental goals terminates in a terminal goal, which is a very different kind of thing from an instrumental goal in that assigning properties like “unjustified” or “useless” to it is a category error. Instrumental goals either promote higher goals or are unjustified, but that’s not true of all goals- it’s just something particular to that one type of goal.
I’d also argue that a chain of definitions terminates in qualia- things like sense data and instincts determine the structure of our most basic concepts, which define higher concepts, but calling qualia “undefined” would be a category error.
There is no fundamental physical structure which constitutes agency
I also don’t think I agree with this. A given slice of objective reality will only have so much structure- only so many ways of compressing it down with symbols and concepts. It’s true that we’re only interested in a narrow subset of that structure that’s useful to us, but the structure nevertheless exists prior to us. When we come up with a useful concept that objectively predicts part of reality, we’ve, in a very biased way, discovered an objective part of the structure of reality- and I think that’s true of the concept of agency.
Granted, maybe there’s a strange loop in the way that cognitive reduction can be further reduced to physical reduction, while physical reduction can be further reduced to cognitive reduction- objective structure defines qualia, which defines objective structure. If that’s what you’re getting at, you may be on to something.
There seems to be a strong coalition around consciousness
One further objection, however: given that we don’t really understand consciousness, I think the cultural push to base our morality around it is a really bad idea.
If it were up to me, we’d split morality up into stuff meant to solve coordination problems by getting people to pre-commit to not defecting, stuff meant to promote compassionate ends for their own sake, and stuff that’s just traditional. Doing that instead of conflating everything into a single universal imperative would get rid of the deontology/consequentialism confusion, since deontology would explain the first thing and consequentialism the second, and by not founding our morality on poorly understood philosophy concepts, we wouldn’t risk damaging useful social technologies or justifying horrifying atrocities if Dennettian illusionism turns out to be true or something.
An important bit of context that often gets missed when discussing this question is that actual trans athletes competing in women’s sports are very rare. Of the millions competing in organized sports in the US, the total number who are trans might be under 20 (see this statement from the NCAA president estimating “fewer than ten” in college sports, this article reporting that an anti-trans activist group was able to identify only five in K-12 sports, and this Wikipedia article, which identifies only a handful of trans athletes in professional US sports).
Because this phenomenon is so rare relative to how often it’s discussed, I’m a lot more interested in the sociology of the question than the question itself. There was a recent post from Hanson arguing that the Left and Right in the US have become like children on a road trip annoying each other in deniable ways to provoke responses that they hope their parents will punish. I think the discrepancy between the scale of the issue and how often it comes up is mostly due to it being used in this way.
A high school coach who has to choose whether to allow a trans student to compete in female sports is faced with a difficult social dilemma. If they deny the request, then the student- who wants badly to be seen as female- will be disappointed and might face additional bullying; if they allow it, that will be unfair to the other female players. In some cases, other players may be willing to accept a bit of unfairness as an act of probably supererogatory kindness, but in cases where they aren’t, explaining to the student that they shouldn’t compete without hurting their feelings will take a lot of tact on the part of the coach.
Elevating this to a national conversation isn’t very tactful. People on the right can plausibly claim to only be concerned with fairness in sports, but presented so publicly, this looks to liberals like an attempt to bully trans people. They’re annoyed, and may be provoked into responding in hard to defend ways like demanding unconditional trans participation in women’s sports- which I think is often the point. It’s a child in a car poking the air next to his sister and saying “I’m not touching you”, hoping that she’ll slap him and be punished.
I’m certain the OP didn’t intend anything like that- LessWrong is, of course, a very high-decoupling place. But I’d argue that this is an issue best resolved by letting the very few people directly affected sort out the messy emotions involved among themselves, rather than through public analysis of the question on the object level.
So, in practice, what might that look like?
Of course, AI labs use quite a bit of AI in their capabilities research already- writing code, helping with hardware design, doing evaluations and RLAIF; even distillation and training itself could sort of be thought of as a kind of self-improvement. So, would the red line need to target just fully autonomous self-improvement? But just having a human in the loop to rubber-stamp AI decisions might not actually slow down an intelligence explosion by all that much, especially at very aggressive labs. So, would we need some kind of measure for how autonomous the capabilities research at a lab is, and then draw the line at “only somewhat autonomous”? And if we were able to define a robust threshold, could we really be confident that it would prevent ASI development altogether, rather than just slowing it down?
Suppose instead we had a benchmark that measured something like the capabilities of AI agents in long-term real-world tasks like running small businesses and managing software development projects. Do you think it might make sense to draw a red line somewhere on that graph- targeting a dangerous level of capabilities directly, rather than trying to prevent that level of capabilities from being developed by targeting research methods?
The most important red line would have to be strong superintelligence, don’t you think? I mean, if we have systems that are agentic in the way humans are, but surpass us in capabilities in the way we surpass animals, it seems like specific bans on the use of weapons, self-replication, and so on might not be very effective at keeping them in check.
Was it necessary to avoid mentioning ASI in the “concrete examples” section of the website to get these signatories on board? Are you concerned that avoiding that subject might contribute to the sense that discussion of ASI is non-serious or outside of the Overton window?
I think this is related to what Chalmers calls the “meta problem of consciousness”- the problem of why it seems subjectively undeniable that a hard problem of consciousness exists, even though it only seems possible to objectively describe “easy problems” like the question of whether a system has an internal representation of itself. Illusionism- the idea that the hard problem is illusory- is an answer to that problem, but I don’t think it fully explains things.
Consider the question “why am I me, rather than someone else”. Objectively, the question is meaningless- it’s a tautology like “why is Paris Paris”. Subjectively, however, it makes sense, because your identity in objective reality and your consciousness are different things- you can imagine “yourself” seeing the world through different eyes, with different memories and so on, even though that “yourself” doesn’t map to anything in objective reality. The statement “I am me” also seems to add predictive power to a subjective model of reality- you can reason inductively that since “you” were you in the past, you will continue to be in the future. But if someone else tells you “I am me”, that doesn’t improve your model’s predictive power at all.
I think there’s a real epistemological paradox there, possibly related somehow to the whole liar’s/Gödel’s/Russell’s paradox thing. I don’t think it’s as simple as consciousness being equivalent to a system with a representation of itself.
I used to do graphic design professionally, and I definitely agree the cover needs some work.
I put together a few quick concepts, just to explore some possible alternate directions they could take it:
https://i.imgur.com/zhnVELh.png
https://i.imgur.com/OqouN9V.png
https://i.imgur.com/Shyezh1.png
These aren’t really finished quality either, but the authors should feel free to borrow and expand on any ideas they like if they decide to do a redesign.
This suggests that in order to ensure a sincere author-concept remains in control, the training data should carefully exclude any text written directly by a malicious agent (e.g. propaganda).
I don’t think that would help much, unfortunately. Any accurate model of the world will also model malicious agents, even if the modeller only ever learns about them second-hand. So the concepts would still be there for the agent to use if it was motivated to do so.
Censoring anything written by malicious people would probably make it harder to learn about some specific techniques of manipulation that aren’t discussed much by non-malicious people and don’t appear much in fiction- but I doubt that would be much more than a brief speed bump for a real misaligned ASI, and probably at the expense of reducing useful capabilities in earlier models like the ability to identify maliciousness, which would give an advantage to competitors.
A counterpoint: when I skip showers, my cat appears strongly in favor of the smell of my armpits- occasionally going so far as to burrow into my shirt sleeves and bite my armpit hair (which, to both my and my cat’s distress, is extremely ticklish). Since studies suggest that cats have a much more sensitive olfactory sense than humans (see https://www.mdpi.com/2076-2615/14/24/3590), it stands to reason that their judgement regarding whether smelling nice is good or bad should hold more weight than our own. And while my own cat’s preference for me smelling bad is only anecdotal evidence, it does seem to suggest at least that more studies are required to fully resolve the question.
I think it’s a very bad idea to dismiss the entirety of news as a “propaganda machine”. Certainly some sources are almost entirely propaganda. More reputable sources like the AP and Reuters will combine some predictable bias with largely trustworthy independent journalism. Identifying those more reliable sources and compensating for their bias takes effort and media literacy, but I think that effort is quite valuable- both individually and collectively for society.
Accurate information about large, important events informs our world model and improves our predictions. Sure, a war in the Middle East might not noticeably affect your life directly, but it’s rare that a person lives an entire life completely unaffected by any war, and having a solid understanding of how wars start and progress based on many detailed examples will help us prepare and react sensibly when that happens. Accurate models of important things will also end up informing our understanding of tons of things that might have originally seemed unrelated. That’s all true, of course, of more neglected sources of information- but it seems like the best strategy for maximizing the usefulness of your models is to focus on information which seems important or surprising, regardless of neglectedness.
Independent journalism also checks the power of leaders. Even in very authoritarian states, the public can collectively exert some pressure against corruption and incompetence by threatening instability- but only if they’re able to broadly coordinate on a common understanding of those things. The reason so many authoritarians deny the existence of reliable independent journalism- often putting little to no effort into hiding the propagandistic nature of their state media- is that by promoting that maximally cynical view of journalism, they immunize their populations against information not under their control. Neglected information can allow for a lot of personal impact, but it’s not something societies can coordinate around- so focusing on it to the exclusion of everything else may represent a kind of defection in the coordination problem of civic duty.
Of course, we have to be very careful with our news consumption- even the most sober, reliable sources will drive engagement by cherry-picking stories, which can skew our understanding of the frequency of all kinds of problems. But availability bias is a problem we have to learn to compensate for in all sorts of different domains- it would be amazing if we were able to build a rich model of important global events by consuming only purely unbiased information, but that isn’t the world we live in. The news is the best we’ve got, and we ought to use it.
So, the current death rate for an American in their 30s is about 0.2%. That probably increases by another 0.5 percentage points or so when you consider black swan events like nuclear war and bioterrorism. Let’s call “unsafe” a ~3x increase in that expected death rate, to about 2%.
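The arithmetic behind that threshold, spelled out as a quick sketch. Reading the 0.5% as an absolute addition to the baseline (rather than a relative increase) is an assumption, though it is the reading that makes the ~3x figure land near 2%.

```python
# Figures from the comment above; treating the 0.5% as an absolute
# (percentage-point) addition is an assumption.
BASELINE_DEATH_RATE = 0.002  # ~0.2% for an American in their 30s
BLACK_SWAN_ADDITION = 0.005  # nuclear war, bioterrorism, and similar
UNSAFE_MULTIPLIER = 3        # the "~3x" threshold


def expected_death_rate() -> float:
    """Baseline rate plus the black swan adjustment."""
    return BASELINE_DEATH_RATE + BLACK_SWAN_ADDITION


def unsafe_threshold() -> float:
    """Expected death rate at which the situation counts as 'unsafe'."""
    return UNSAFE_MULTIPLIER * expected_death_rate()


if __name__ == "__main__":
    print(f"Expected death rate: {expected_death_rate():.1%}")
    print(f"Unsafe threshold:    {unsafe_threshold():.1%}")
```

That works out to an expected rate of about 0.7% and an "unsafe" threshold of about 2.1%.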
An increase that large would take something a lot more dramatic than the kind of politics we’re used to in the US, but while political changes that dramatic are rare historically, I think we’re at a moment where the risk is elevated enough that we ought to think about the odds.
I might, for example, give odds for a collapse of democracy in the US over the next couple of years at ~2-5%- if the US were to elect 20 presidents similar to the current one over a century, I’d expect better than even odds of one of them making themselves into a Putinesque dictator. A collapse like that would substantially increase the risk of war, I’d argue, including raising a real possibility of nuclear civil war. That might increase the expected death rate for young and middle-aged adults in that scenario by a point or two on its own. It might also introduce a small risk of extremely large atrocities against minorities or political opponents, which could increase the expected death rate by a few tenths of a percent.
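As a sanity check on how the per-president odds relate to the "20 presidents" framing, here is a small sketch. It assumes each presidency is an independent trial with the same odds, which is a simplification.

```python
# Sketch assuming 20 independent presidencies with identical per-term odds.

def p_at_least_one(p_each: float, trials: int) -> float:
    """Chance that at least one of `trials` independent attempts succeeds."""
    return 1 - (1 - p_each) ** trials


def break_even_p(trials: int) -> float:
    """Per-trial odds at which 'at least one success' becomes a coin flip."""
    return 1 - 0.5 ** (1 / trials)


if __name__ == "__main__":
    for p in (0.02, 0.05):
        print(f"{p:.0%} per president -> {p_at_least_one(p, 20):.0%} over 20")
    print(f"Break-even per-president odds: {break_even_p(20):.1%}")
```

Under those assumptions, the break-even point is about 3.4% per president: at 2% the cumulative odds come out to roughly one in three, and at 5% to roughly 64%, so "better than even odds" corresponds to the upper part of the 2-5% range.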
There’s also a small risk of economic collapse. Something like a political takeover of the Fed combined with expensive, poorly considered populist policies might trigger hyperinflation of the dollar. When that sort of thing happens overseas, you’ll often see reduced health outcomes and breakdown in civil order increasing the death rate by up to a percent- and, of course, it would introduce new tail risks, increasing the expected death rate further.
I should note that I don’t think the odds of any of this are high enough to worry about my safety now- but needing to emigrate is a much more likely outcome than actually being threatened, and that’s a headache I am mildly worried about.
That’s a crazy low probability.
Honestly, my odds of this have been swinging anywhere from 2% to 15% recently. Note that this would be the odds of our democratic institutions deteriorating enough that fleeing the country would seem like the only reasonable option- p(fascism) more in the sense of a government that most future historians would assign that or a similar label to, rather than just a disturbingly cruel and authoritarian administration still held somewhat in check by democracy.
I wonder: what odds would people here put on the US becoming a somewhat unsafe place to live even for citizens in the next couple of years due to politics? That is, what combined odds should we put on things like significant erosion of rights and legal protections for outspoken liberal or LGBT people, violent instability escalating to an unprecedented degree, the government launching the kind of war that endangers the homeland, etc.?
My gut says it’s now at least 5%, which seems easily high enough to start putting together an emigration plan. Is that alarmist?
More generally, what would be an appropriate smoke alarm for this sort of thing?
That’s true, but we also don’t see a rush to build cities in similarly extreme environments on earth where treaties aren’t a barrier, such as the interior of Greenland or the coastal shelves. I’d argue that the treaties remaining in place are probably a result of very low demand to colonize these areas rather than the opposite.