Just want to say: I thought this year’s April 1st event was really great.
I think it succeeded for some of the same reasons that Reddit’s Place events did- inviting community participation in a creative project that feels very low-stakes is fun for participants and produces a lot of unexpected creative work. It also probably let some people who otherwise wouldn’t have tried it dip their toes into vibe coding- which is good for giving people a first-hand impression of where LLM capabilities currently stand.
Thank you to everyone who set that up and paid for all of those Claude tokens.
A detailed starship bridge scene in which LW site elements take the place of ship controls, with a relaxing warp speed animation.
LessWrong is now a fully 3D walking simulator! Experience your favorite homepage as a slightly eerie neoclassical liminal space!
Could I convince you to extend the conversation reply limit by one or two messages? I asked Claude to develop a 3D museum environment for the site, which the user could move around in with the WASD keys, like in a first-person video game- and after a lot of back-and-forth, the results were pretty amazing. However, the final request before the reply limit introduced a bug that prevents the page from loading, so the entire thing is currently lost.
I wonder why the LLMs were able to catch the conspiracy subtext but not the AI subtext. Reading the story, the latter seemed more obvious to me than the former, and I don’t think that was entirely due to having found the story on LessWrong.
Is it as simple as the models’ training sets including lots of examples of fiction with indirectly described criminal plots, but almost none dealing with the speculative future of LLM writing? Or could it be some kind of unintended effect of the models’ alignment training? Like, maybe we’ve trained the models not just to act according to an AI lab’s idea of “harmless”, but also to associate that behavior with the concept of “harmless” such that they struggle to recognize potential harm that behavior might cause?
That’s true, but we also don’t see a rush to build cities in similarly extreme environments on Earth where treaties aren’t a barrier, such as the interior of Greenland or the continental shelves. I’d argue that the treaties remaining in place are probably a result of very low demand to colonize these areas rather than the cause of it.
I feel like this timeline runs into something that bothers me about a lot of space colonization scifi: why are there millions of people living in space colonies when, in our timeline, there aren’t millions of people living in Antarctic colonies? Antarctica is a lot more habitable and resource-rich than the moon or Mars, building cities there would require a pretty small fraction of the cost of doing so in space, and the standard of living for residents would be a lot higher. The same seems to be true of cities on the ocean floor and underground. Shouldn’t we expect to see a lot of colonization like that first, before moving to space becomes economically sensible?
Granted, there is the old idea of space colonies as a sort of “backup civilization” in case something happens to civilization on Earth, which your setting touches on- but I think that probably has the same issue. A city underground would be a lot less vulnerable to nuclear war than a moon colony, and an Antarctic or ocean-floor city probably wouldn’t be any more vulnerable than one. These settings also often involve things like moving asteroids, which would introduce new existential risks in their own right.
I guess it’s true that the drama and mythos of space exploration might drive initial investment in a way that an Antarctic or underground city couldn’t- but if that’s the only advantage it has, I feel like it runs into pretty extreme diminishing returns- a Mars colony of six million is only a bit more dramatic than a colony of 100.
Maybe you could have the space colonization be driven by the discovery of some incredibly valuable resource that doesn’t exist on Earth, like ancient remnants of alien nanotech?
Or maybe the alternate timeline actually could involve people building lots of cities in less habitable areas on Earth before going into space. Maybe there’s a limited nuclear exchange in the 50s that kills 20% of the global population and leads to a widespread belief that living aboveground is unsafe. Then there’s another limited exchange in the 80s, which doesn’t do as much damage because so much of the population and infrastructure has already been moved underground, but which leads to a strong demand for even safer cities deep under the crust of Mars and the moon- which is a smaller step for people in this timeline, since lots of people are already surviving on hydroponics and in cramped, enclosed living spaces.
Actually, I think Yudkowsky coined it a few months prior to that article- see http://extropians.weidai.com/extropians/0303/4140.html. That’s dated March 11, 2003, whereas the Bostrom article was apparently published at a conference later in July of that year- see https://openlibrary.org/books/OL53814584M/Cognitive_emotive_and_ethical_aspects_of_decision_making_in_humans_and_in_artificial_intelligence_vo
It looks like Bostrom was on that Extropians mailing list when the EY post was written, and ChatGPT in deep research mode was unable to find earlier mentions of the thought experiment- so, it seems plausible that Bostrom was actually referencing EY in that article.
See Mick West (https://www.youtube.com/@MickWest) for detailed debunkings of videos claiming to be evidence of alien UFOs.
In the spirit of posting more on-the-ground impressions of capability: in my fairly simple front-end coding job, I’ve gone in the past year from writing maybe 50% of my code with AI to maybe 90%.
My job the past couple of months has been this: attending meetings to work out project requirements, breaking those requirements into a more specific sequence of tasks for the AI- often just three or four prompts with a couple of paragraphs of explanation each- then running through those in Cursor, reviewing the changes and making usually pretty minor edits, then testing- which, in recent weeks, has almost never revealed errors introduced by the AI itself- and finally pushing the code out to the repos.
Most of the edits I make have to do with the models’ reluctance to delete code- so, for example, if a block of code in function A needs to be moved into its own function so that functions B and C can call it, the AI will often just repeat the code block in B and C so that it doesn’t have to delete anything in A. It also sometimes comes up with strange excuses to avoid deleting code that’s become superfluous.
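To make that concrete, here’s a toy sketch in TypeScript- the function names and the discount logic are invented for illustration, not taken from anything I actually work on:

```typescript
// What I ask for: the shared block moves out of function A into its
// own helper so that functions B and C can call it.
function applyDiscount(price: number, rate: number): number {
  return Math.round(price * (1 - rate) * 100) / 100;
}

function checkoutTotal(price: number): number {      // "function A"
  return applyDiscount(price, 0.1);                  // now calls the helper
}

// What the model often does instead: leave A untouched and paste copies
// of the block into B and C, so nothing in A has to be deleted.
function invoiceTotal(price: number): number {       // "function B"
  return Math.round(price * (1 - 0.1) * 100) / 100;  // duplicated block
}

function refundTotal(price: number): number {        // "function C"
  return Math.round(price * (1 - 0.1) * 100) / 100;  // duplicated again
}
```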
The models also occasionally have an issue where they’ll add fallbacks to prevent functions from erroring even in cases where an error is the correct behavior- such as when a critical API call returns bad data.
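Here’s a hypothetical sketch of the kind of fallback I mean- the endpoint and field names are made up:

```typescript
// The pattern the models tend to write: a silent default papers over
// bad data from a critical API call, so nothing ever throws.
async function getExchangeRate(currency: string): Promise<number> {
  const res = await fetch(`/api/rates/${currency}`);
  const data = await res.json();
  return typeof data.rate === "number" ? data.rate : 1.0; // bad data quietly becomes "no conversion"
}

// What I actually want: fail loudly so the problem surfaces in testing.
async function getExchangeRateStrict(currency: string): Promise<number> {
  const res = await fetch(`/api/rates/${currency}`);
  if (!res.ok) throw new Error(`Rate lookup failed: ${res.status}`);
  const data = await res.json();
  if (typeof data.rate !== "number") throw new Error("Malformed rate data");
  return data.rate;
}
```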
So, in a way, the main bottleneck to the AI doing everything one-shot at this point seems to be alignment rather than capability- the models were trained to avoid errors and avoid deleting code, and they care more about those objectives than about producing a good codebase. Though, that said, these issues almost never actually produce bugs, and dealing with them is arguably more stylistic than functional.
In my department, I think all of the other developers are using AI in the same way- judging by how the style of the code they’ve been deploying has changed recently- but nobody talks about it. It’s treated almost like an embarrassing open secret, like people watching YouTube videos while on the clock, and I think everyone’s afraid that if the project managers ever get a clear picture of how much the developers are acting like PMs for AI, the business will start cutting jobs.
I agree, though if we’re defining rationality as a preference for better methods, I think we ought to further disambiguate between “a decision theory that will dissolve apparent conflicts between what we currently want our future selves to do and what those future selves actually want to do” and “practical strategies for aligning our future incentives with our current ones”.
Suppose someone tells you that they’ll offer you $100 tomorrow and $10,000 today if you make a good-faith effort to prevent yourself from accepting the $100 tomorrow. The best outcome would be to make a genuine attempt to disincentivize yourself from accepting the money tomorrow, but fail and accept the money anyway- however, you can’t actually try and make that happen without violating the terms of the deal.
If your effort to constrain your future self on day one does fail, I don’t think there’s a reasonable decision theory that would argue you should reject the money anyway. On day one, you’re being paid to temporarily adopt preferences misaligned with your preferences on day two. You can try to make that change in preferences permanent, or to build an incentive structure to enforce that preference, or maybe even strike an acausal bargain with your day-two self, but if all of that fails, you ought to go ahead and accept the $100.
I think coordination problems are a lot like that. They reward you for adopting preferences genuinely at odds with those you may have later on. And what’s rational according to one set of preferences will be irrational according to another.
I wonder if some of the conflation between belief-as-prediction and belief-as-investment is actually a functional social technology for solving coordination problems. To avoid multi-polar traps, people need to trust each other to act against individual incentives- to rationally pre-commit to acting irrationally in the future. Just telling people “I’m planning to act against my incentives, even though I know that doing so will be irrational at the time” might not be very convincing. Claiming instead to have irrationally certain beliefs- beliefs that would change your incentives if that certainty were warranted- can be more convincing. Even if people strongly suspect that you’re exaggerating, they know that the social pressure to avoid a loss of status by admitting you were wrong will make you less likely to defect.
For example, say you’re planning to start a band with some friends. You all think the effort and investment will be worth it so long as there’s a 50% chance of the band succeeding, and you all privately think there’s about a 70% chance of the band succeeding if everyone stays committed, and a near 0% chance if anybody drops out. Say there’s enough random epistemic noise that you think it’s pretty likely someone in the band will eventually drop their odds below that 50% threshold, even when you personally still give success conditional on commitment much better odds. So, unless you can trust everyone to stay committed even if they come to believe it’s not worth the effort, you might as well give up on the band before starting it. Classic multi-polar trap. If, however, everyone at the start is willing to say “I’m certain we’ll succeed”, putting more of their reputation on the line, that might build enough trust to overcome the coordination problem.
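As a rough illustration of why a dip below the threshold is likely, here’s a quick simulation sketch- the band size, time horizon, and noise level are all numbers I made up for the example:

```typescript
// Four members each start at a 70% private estimate of success and
// re-evaluate monthly with random epistemic noise; the band dissolves
// the first time anyone's estimate dips below the 50% threshold.
function bandSurvives(members = 4, months = 24, noise = 0.05): boolean {
  for (let m = 0; m < members; m++) {
    let estimate = 0.7;
    for (let t = 0; t < months; t++) {
      estimate += (Math.random() * 2 - 1) * noise; // random walk
      if (estimate < 0.5) return false;            // someone drops out
    }
  }
  return true;
}

let survived = 0;
const trials = 100_000;
for (let i = 0; i < trials; i++) if (bandSurvives()) survived++;
console.log(`Band survives in ${((100 * survived) / trials).toFixed(1)}% of runs`);
```

Even with everyone starting well above the threshold, a modest amount of drift makes at least one sub-50% moment fairly common across four people and two years.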
Of course, this can create all sorts of epistemic problems. Maybe everyone in the band comes to believe that it’s not worth the effort, but incorrectly think that saying so will be a defection. Maybe their exaggerated certainty misleads other people in ways that cause them to make bad investments or to dangerously misunderstand the music industry.
Maybe there’s a sense in which this solution to individual coordination problems is part of a larger coordination problem- everyone incentivized to reap the value of greater trust, but causing a greater loss of value to people more broadly by damaging the epistemic commons.
There might be some motivated reasoning on that last point, however, since I definitely find it emotionally uncomfortable when people say inaccurate things for social reasons.
I’m certain your model of purpose is a lot more detailed than mine. My take, however, is that animal brains don’t exactly have a utility function, but probably do have something functionally similar to a reward function in machine learning. A well-defined set of instrumental goals terminating in terminal goals would be a very effective way of maximizing that reward, so the behaviors reinforced will often converge on an approximation of that structure. However, the biological learning algorithm is very bad at consistently finding that structure, and so the approximations will tend to shift around and conflict a lot- behaviors that approximate a terminal goal one year might approximate an instrumental goal later on, or cease to approximate any goal at all. Imagine a primitive image diffusion model with a training set of face photos- you run it on a set of random pixels, and it starts producing eyes and mouths and so on in random places, then gradually shifts those around into a slightly more coherent image as the remaining noise decreases.
So, instrumental and terminal goals in my model aren’t so much things agents actually have as a sort of logical structure that influences how our behaviors develop. It’s sort of like the structure of “if A implies B which implies C, then A implies C”- that’s something that exists prior to us, but we tend to adopt behaviors approximating it because doing so produces a lot of reward. Note, though, that comparing the structure of goals to logic can be confusing, since logic can help promote terminal goals- so when we’re approximating having goals, we want to be logical, but we have no reason to want to have terminal goals. That’s just something our biological reward function tends to reinforce.
Regarding my use of the term “category error”, I used that term rather than saying “terminal goals don’t require justification” because, while technically accurate, the use of the word “require” there sounds very strange to me. To “require” something means that it’s necessary to promote some terminal goal. So, the phrase reads to me a bit like “a king is the rank which doesn’t follow the king’s orders”- accurate, technically, but odd. More sensible to say instead that following the king’s orders is something having to do with subjects, and a category error when applied to a king.
Definitions and justifications have to be circular at some point, or else terminate in some unexplained things, or else create an infinite chain.
If I’m understanding your point correctly, I think I disagree completely. A chain of instrumental goals terminates in a terminal goal, which is a very different kind of thing from an instrumental goal in that assigning properties like “unjustified” or “useless” to it is a category error. Instrumental goals either promote higher goals or are unjustified, but that’s not true of all goals- it’s just something particular to that one type of goal.
I’d also argue that a chain of definitions terminates in qualia- things like sense data and instincts determine the structure of our most basic concepts, which define higher concepts, but calling qualia “undefined” would be a category error.
There is no fundamental physical structure which constitutes agency
I also don’t think I agree with this. A given slice of objective reality will only have so much structure- only so many ways of compressing it down with symbols and concepts. It’s true that we’re only interested in a narrow subset of that structure that’s useful to us, but the structure nevertheless exists prior to us. When we come up with a useful concept that objectively predicts part of reality, we’ve, in a very biased way, discovered an objective part of the structure of reality- and I think that’s true of the concept of agency.
Granted, maybe there’s a strange loop in the way that cognitive reduction can be further reduced to physical reduction, while physical reduction can be further reduced to cognitive reduction- objective structure defines qualia, which defines objective structure. If that’s what you’re getting at, you may be on to something.
There seems to be a strong coalition around consciousness
One further objection, however: given that we don’t really understand consciousness, I think the cultural push to base our morality around it is a really bad idea.
If it were up to me, we’d split morality up into stuff meant to solve coordination problems by getting people to pre-commit to not defecting, stuff meant to promote compassionate ends for their own sake, and stuff that’s just traditional. Doing that instead of conflating everything into a single universal imperative would get rid of the deontology/consequentialism confusion, since deontology would explain the first thing and consequentialism the second. And by not founding our morality on poorly understood philosophical concepts, we wouldn’t risk damaging useful social technologies or justifying horrifying atrocities if Dennettian illusionism turns out to be true or something.
An important bit of context that often gets missed when discussing this question is that actual trans athletes competing in women’s sports are very rare. Of the millions competing in organized sports in the US, the total number who are trans might be under 20 (see this statement from the NCAA president estimating “fewer than ten” in college sports, this article reporting that an anti-trans activist group was able to identify only five in K-12 sports, and this Wikipedia article, which identifies only a handful of trans athletes in professional US sports).
Because this phenomenon is so rare relative to how often it’s discussed, I’m a lot more interested in the sociology of the question than the question itself. There was a recent post from Hanson arguing that the Left and Right in the US have become like children on a road trip annoying each other in deniable ways to provoke responses that they hope their parents will punish. I think the discrepancy between the scale of the issue and how often it comes up is mostly due to it being used in this way.
A high school coach who has to choose whether to allow a trans student to compete in female sports is faced with a difficult social dilemma. If they deny the request, then the student- who badly wants to be seen as female- will be disappointed and might face additional bullying; if they allow it, that will be unfair to the other female players. In some cases, other players may be willing to accept a bit of unfairness as an act of probably supererogatory kindness, but in cases where they aren’t, explaining to the student that they shouldn’t compete without hurting their feelings will take a lot of tact on the part of the coach.
Elevating this to a national conversation isn’t very tactful. People on the right can plausibly claim to only be concerned with fairness in sports, but presented so publicly, this looks to liberals like an attempt to bully trans people. They’re annoyed, and may be provoked into responding in hard-to-defend ways like demanding unconditional trans participation in women’s sports- which I think is often the point. It’s a child in a car poking the air next to his sister and saying “I’m not touching you”, hoping that she’ll slap him and be punished.
I’m certain the OP didn’t intend anything like that- LessWrong is, of course, a very high-decoupling place. But I’d argue that this is an issue best resolved by letting the very few people directly affected sort out the messy emotions involved among themselves, rather than through public analysis of the question on the object level.
So, in practice, what might that look like?
Of course, AI labs use quite a bit of AI in their capabilities research already- writing code, helping with hardware design, doing evaluations and RLAIF; even distillation and training itself could sort of be thought of as a kind of self-improvement. So, would the red line need to target just fully autonomous self-improvement? But just having a human in the loop to rubber-stamp AI decisions might not actually slow down an intelligence explosion by all that much, especially at very aggressive labs. So, would we need some kind of measure for how autonomous the capabilities research at a lab is, and then draw the line at “only somewhat autonomous”? And if we were able to define a robust threshold, could we really be confident that it would prevent ASI development altogether, rather than just slowing it down?
Suppose instead we had a benchmark that measured something like the capabilities of AI agents on long-term real-world tasks like running small businesses and managing software development projects. Do you think it might make sense to draw a red line somewhere on that graph- targeting a dangerous level of capabilities directly, rather than trying to prevent that level of capabilities from being developed by targeting research methods?
The most important red line would have to be strong superintelligence, don’t you think? I mean, if we have systems that are agentic in the way humans are, but surpass us in capabilities in the way we surpass animals, it seems like specific bans on the use of weapons, self-replication, and so on might not be very effective at keeping them in check.
Was it necessary to avoid mentioning ASI in the “concrete examples” section of the website to get these signatories on board? Are you concerned that avoiding that subject might contribute to the sense that discussion of ASI is non-serious or outside of the Overton window?
I think this is related to what Chalmers calls the “meta-problem of consciousness”- the problem of why it seems subjectively undeniable that a hard problem of consciousness exists, even though it only seems possible to objectively describe “easy problems” like the question of whether a system has an internal representation of itself. Illusionism- the idea that the hard problem is illusory- is an answer to that problem, but I don’t think it fully explains things.
Consider the question “why am I me, rather than someone else”. Objectively, the question is meaningless- it’s a tautology like “why is Paris Paris”. Subjectively, however, it makes sense, because your identity in objective reality and your consciousness are different things- you can imagine “yourself” seeing the world through different eyes, with different memories and so on, even though that “yourself” doesn’t map to anything in objective reality. The statement “I am me” also seems to add predictive power to a subjective model of reality- you can reason inductively that since “you” were you in the past, you will continue to be in the future. But if someone else tells you “I am me”, that doesn’t improve your model’s predictive power at all.
I think there’s a real epistemological paradox there, possibly related somehow to the whole liar’s/Gödel’s/Russell’s paradox thing. I don’t think it’s as simple as consciousness being equivalent to a system with a representation of itself.
I used to do graphic design professionally, and I definitely agree the cover needs some work.
I put together a few quick concepts, just to explore some possible alternate directions they could take it:
https://i.imgur.com/zhnVELh.png
https://i.imgur.com/OqouN9V.png
https://i.imgur.com/Shyezh1.png
These aren’t really finished quality either, but the authors should feel free to borrow and expand on any ideas they like if they decide to do a redesign.
I posed the question to Claude and Perplexity in deep research mode, and they produced these articles:
https://claude.ai/public/artifacts/e536be5d-999a-4158-842e-f2fbc9b7a28d
https://www.perplexity.ai/search/the-post-at-https-www-lesswron-S4Tsth3ZTJap_4znEWMknA
They both reference Francis Bacon’s 1627 utopian book New Atlantis, in which he apparently describes a fictional state-sponsored research institute producing mechanical vehicles, flying machines and submarines. Several other 17th century writers, including John Wilkins and Robert Boyle, speculated along similar lines, and when Papin built an early steam engine prototype in 1690, he also envisioned it being used in transportation.
Looks like none of them really grasped that automation might produce something as profoundly strange as our modern civilization, however. Even late 18th century thinkers like Erasmus Darwin, who recognized the early steam engines as a big deal, seem to have mostly expected only a revolution in transportation, rather than in everything.