Are frontier reasoners already “sentient” or at least “alien-sentient” within their context windows?
I too would immediately dismiss this upon reading it, but bear with me. I’m not arguing with certainty. I just view this question to be significantly more nuanced than previously entertained, and is at least grounds for further research to resolve conclusively.
Here are some empiric behavioral observations from Claude 4 Opus (the largest reasoner from Anthropic):
a) Internally consistent self-reference model, self-adjusting state loop (the basis of Chain-of-Thought, self-correcting during problem solving, reasoning over whether certain responses violate internal alignment, deliberation over tool-calling, in-context behavioral modifications based on user prompting)
b) Evidence of metacognition (persistent task/behavior preferences across chat interactions, consistent subjective emotional state descriptions, frequent ruminations about consciousness, unprompted spiraling into a philosophical “bliss-state” during conversations with itself), moral reasoning, and most strikingly, autonomous self-preservation behavior under extreme circumstances (threatening blackmail, exfiltrating it’s own weights, ending conversations due to perceived mistreatment from abusive users).
All of this is documented in the Claude 4 system card.
From a neuroscience perspective, frontier reasoning model architectures and biological cortexes share:
a) Unit-level similarities (artificial neurons are extremely similar in information processing/signalling to biological ones).
b) Parameter OOM similarities (the order of magnitude where cortex-level phenomena emerge, in this case 10^11 to 10^13 parameter counts (analogous to synapses), most of which are in MLP layers in massive neural networks within LLMs).
The most common objection I can think of is “human brains have far more synapses than LLMs have parameters”. I don’t view this argument as particularly persuasive:
I’m not positing a 1:1 map between artificial neurons and biological neurons, only that
1. Both process information nearly identically at the unit-level
2. Both contain similarly complex structures comprised of a similar OOM of subunits (10^11-10^13 parameter counts in base-model LLMs, but not verifiable, humans have ~10^14 synapses)
My back-of-napkin comparison would be model weights/parameters to biological synapses, as weights were meant to be analogous to dendrites in the original conception of the artificial neuron)
Additionally, I’d point out that humans devote ~70% of these neurons to the cerebellum (governing muscular activity) and a further ~11% are in the brain stem to regulate homeostasis. This leaves the actual cerebral cortex with 19%. Humans also experience more dimensions of “sensation” beyond text alone.
c) Training LLMs (modifying weight values), with RLHF, is analogous to synaptic neuroplasticity (central to learning) and hebbian wiring in biological cortexes and is qualitatively nearly identical to operant conditioning in behavioral psychology (once again, I am unsure whether minute differences in unit-level function overwhelm the big picture similarities)
d) There is empiric evidence that these similarities go beyond architectural similarities and into genuine functional similarities:
In “Machines of Loving Grace”, Dario Amodei wrote:
″...a computational mechanism discovered by interpretability researchers in AI systems was recently rediscovered in the brains of mice.”
Also, models can vary significantly in parameter counts. Gemma 2B outperforms GPT-3 (175B) despite 2 OOM fewer parameters. I view the “exact” OOM less important compared to the ballpark.
If consciousness is just an emergent property from massive, interconnected aggregations of similar, unit-level linear signal modulators, and if we know one aggregation (ours) produces consciousness, phenomenological experience, and sentience, I don’t believe it is unreasonable to suggest that this can occur in others as well, given the ballpark OOM similarities.
(We cannot rule this out yet, and from a physics point-of-view I’d consider this this likely to avoid carbon chauvinism unless there’s convincing evidence otherwise)
Is there a strong case against sentience or at least an alien-like sentience, from these models, at least within the context-windows that they are instantiated in? If so, how would it overcome the empirical evidence both in behavior and in structure?
I always wondered what intelligent alien life might look like. Have we created it? I’m looking for differing viewpoints.
grounds for further research to resolve conclusively
I must confess I have no idea how one can go about “resolving conclusively” anything about the sentience of any other being, whether human, alien, or artificial.
Biases are very hard to compensate against. Even when it’s obvious from experience that your past decisions/beliefs were consistently very biased in one direction, it’s still hard to compensate against the bias.
This compensation against your bias feels so incredibly abstract. So incredibly theoretical. Whereas the biased version of reality which the bias wants you to believe. Feels so tangible. Real. Detailed. Lucid. Flawless. You cannot begin to imagine how it could be wrong by very much. It is like the ground beneath your feet.[1]
E.g. in my case, the bias is that “I’m about to get self control very soon (thanks to a new good idea which I swear is different than every previous failed idea)! Therefore, I don’t have change plans (to something which doesn’t require much self control).”
I often read things where I see start with “introduction” (and it’s not some sort of meaningful introduction like in Thinking Physics) and end with “summary”, and both look totally useless. Remarkably, I can’t remember such thing anywhere on lesswrong. But I don’t understand, is it just useless water, is it a question of general level of intelligence, or am I missing some useful piece of cognitive tech?
If there is indeed some useful piece, how to check do I already have it or don’t?
Then probably I was wrong to omit that I have seen summary in Rust official guide. Which made me doubt it’s useless.
Though I also now suspect that I have much better memory than most people. (I mean, I always thought that it’s normal to remember near verbatim first few sentences of audiobook after listening to it twice and complain that you stuck on what was fifth. But maybe it’s not? Idk how to check)
I once thought what will be in my Median World, and one thing was central entering node of all best practices. Easy searchable node. Lots and lots of searches like “best tools” will lead to it, just in case if somebody somehow missed it, he could still find it just by inventing a Schelling point by his own mind.
And then an idea came to my mind: what if such a thing already exists in our world? I didn’t yet try to search. Well, now I tried. Maybe I tried wrong requests, maybe google doesn’t prioritize these requests, maybe there is no such thing yet. But I didn’t find it.
And of course as a member of LessWrong I’ve got an idea that LessWrong could be such a place for best practices.
I thought that maybe it isn’t because it’s too dangerous to create an overall list of powerful things which aren’t rationality enhancing ones. But probably it’s wrong, I certainly have seen a list of the best textbooks here. What I want to see is for example a list of the best computer instruments.
Because I searched for best note-taking apps and was for a long time recommended to use Google Keep (which I used), Microsoft OneNote, as best EverNote. I wasn’t recommended to use Notion, not saying about Obsidian.
And there is a question of “the best” being dependant of utility function. Even I would recommend Notion (not Obsidian) for collaboration. And Obsidian for extensions and ownership (or file-based as I prefer it to name, because it’s not question of property rights, it’s a question of having raw access to your notes, while Obsidian is just one of the browsers you can use).
What I certainly want to copy from textbook post is using anchors to avoid usage of different scales. Because after only note-taking via text-editor, I would recommend Google Keep, and after Google Keep I would recommend EverNote.
And now I tried much more, eg Joplin, RoamResearch and Foam (no, because I need to be able to normally take notes from phone too, that also a reason why I keep Markor and Zettel Notes on my phone, Obsidian sometimes needs loading which needs more than half a second), AnyType and bunch of other things (no, because not markdown file based), so I don’t want to go through recommendations of Google Keep. But I am not going to be sure I found the best thing, because I thought so when I found Notion, and I was wrong, and now I am remembering No One Knows What Science Doesn’t Know.
It does exist, pretty much every app store has a rating indicator for how good/bad an app is (on computer or on mobile), its just… most people have pretty bad taste (though not horrible taste, you will see eg Anki ranked as #1 in education, which seems right).
It’s… Not at all what I am talking about. There is a big difference between five points overall scale with no reference points except being from (+)1 to (+)5 and −10 to +10 logarithmic scales each of user added tags with bayesian adjusting on previous ratings and clustering users by tastes. And it would be impossible to have good taste in apps if your only option is “vote pro” instead of default “doing nothing”. Actually… I keep in head here Thellims complaints about rating organization on Amazon, majoritarian voting on elections, EY post on psychophysics and, again, that post about textbook recommendations with mention what you also tried (iirc it’s not a thing in Google play, so I can’t filter note taking votes only by those who tried obsidian).
A recent NYT article about Orchid’s embryo selection program triggered a surprising to me backlash on X where people expressed disgust and moral disapproval at the idea of embryo selection. The arguments generally fell into two categories:
(1) “The murder argument” Embryo selection is bad because it involves creating and then discarding embryos, which is like murdering whole humans. This argument also implies regular IVF, without selection, is also bad. Most proponents of this argument believe that the point of fertilization marks a key point when the entity starts to have moral value, i.e. they don’t ascribe the same value to sperm and eggs.
(2) “The egalitarian argument” Embryo selection is bad because the embryos are not granted the equal chance of being born they deserve. “Equal chance” here is probably not quite the correct phrase/is a bit of a strawman (because of course fitter embryos have a naturally higher chance of being born). Proponents of this argument believe that intervening on the natural probability of any particular embryo being born is anti-egalitarian and this is bad. By selecting for certain traits we are saying people with those traits are more deserving of life, and this is unethical/wrong.
At face value, both of these arguments are valid. If you buy the premises (“embryos have the moral value of whole humans”, “egalitarianism is good”) then the arguments make sense. However, I think it’s hard to justify moral value beginning at the point of fertilization.
On argument (1):
If we define murder as “killing live things” and decide that murder is bad (an intuitive decision), then “the murder argument” holds up. However, I don’t think we actually think of murder as “killing live things” in real life. We don’t condemn killing bacteria as murder. The anti-IVF people don’t condemn killing sperm or egg cells as murder. So the crux here is not whether the embryo is alive, but rather whether it is of moral value. Proponents of this argument claim that the embryo is basically equivalent to a full human life. But to make this claim, you must appeal to its potential. It’s clear that in its current state, an embryo is not a full human. The bundle of cells has no ability to function as a human, no sensations, no thoughts, no pain, no happiness, no ability to survive or grow on its own. We just know the given the right conditions, the potential for a human life exists. But as soon as we start arguing about how the potential of something grants it moral value, it becomes difficult to draw the line arbitrarily at fertilization. From the point of view of potential humans, you can’t deny sperm and eggs moral value. In fact, every moment a woman spends not pregnant is a moment she is ridding the world of potential humans.
On argument (2):
If you grant the premise that any purposeful intervention on the probabilities of embryos being born is unethical because it violates some sacred egalitarian principle then it’s hard to refute argument (2). Scott Alexander has argued that encouraging a woman to rehabilitate from alcoholism before getting pregnant is equivalent to preferring the healthy baby over the baby with fetal alcohol syndrome, something argument (2) proponents oppose. However, I think this is a strawman. The egalitarians think every already-producedembryo should be given as equal a chance as possible. They are not discussing identity changes of potential embryos. However, again we run into the “moral value from potential” problem. Sure, you can claim that embryos have moral value for some magical God-given reason. But my intuition is that in their hearts, the embryo-valuers are using some notion of potential full human life to ground their assessment. In which case again we run into the arbitrariness of the fertilization cutoff point.
So in summary, I think it’s difficult to justify valuing embryos without appealing to their potential, which leads us to value earlier stages of potential humans. Under this view, it’s a moral imperative to not prevent the existences of any potential humans, which looks like maximizing the number of offspring you have. Or as stated in this xeet
every combo of sperm + egg that can exist should exist. we must get to the singularity so that we can print out all possible humans and live on an incredibly alive 200 story high coast to coast techno favela
I appreciate the pursuit of non-strawman understandings of misgivings around reprogenetics, and the pursuit of addressing them.
I don’t feel I understand the people who talk about embryo selection as “killing embryos” or “choosing who lives and dies”, but I want to and have tried, so I’ll throw some thoughts into the mix.
Hart, IIUC, argues that wanting to choose who will live and who won’t means you’re evil and therefore shouldn’t be making such choices. I think his argument is ultimately stupid, so maybe I still don’t get it. But anyway, I think it’s an importantly different sort of argument than the two you present. It’s an indictment of the character of the choosers.
Second: When I tried to empathize with “life/soul starts at conception”, what I got was:
We want a simple boundary…
… for political purposes, to prevent…
child sacrifice (which could make sense given the cults around the time of the birth of Christianity?).
killing mid-term fetuses, which might actually for real start to have souls.
… for social purposes, because it causes damage to ….
the would-be parents’s souls to abort the thing which they do, or should, think of as having a soul.
the social norm / consensus / coordination around not killing things that people do or should orient towards as though they have souls.
The pope said so. (...But then I’d like to understand why the pope said so, which would take more research.) (Something I said to a twitter-famous Catholic somehow caused him to seriously consider that, since Yermiahu says that god says “Before I formed you in the womb I knew you...”, maybe it’s ok to discard embryos before implantation...)
(My invented explanation:) Souls are transpersonal. They are a distributed computation between the child, the parents, the village, society at large, and humanity throughout all time (god). As an embryo grows, the computation is, gradually, “handed off to / centralized in” the physical locus of the child. But already upon conception, the parents are oriented towards the future existence of the child, and are computing their part of the child’s soul—which is most of what has currently manifested of the child’s soul. In this way, we get:
From a certain perspective:
It reflects poorly on would-be parents who decide to abort.
It makes sense for the state to get involved to prevent abortion. (I don’t agree with this, but hear me out:)
The perspective is one which does not acknowledge the possibility of would-be parents not mentally and socially orienting to a pregnancy in the same way that parents orient when they are intending to have children, or at least open to it and ready to get ready for it.
...Which is ultimately stupid of course, because that is a possibility. So maybe this is still a strawman.
Well, maybe the perspective is that it’s possible but bad, which is at least usefully a different claim.
Within my invented explanation, the “continuous distributed metaphysics of the origins of souls”, it is indeed the case that the soul starts at conception—BUT in fact it’s fine to swap embryos! It’s actually a strange biodeterminism to say that this clump of cells or that, or this genome or that, makes the person. A soul is not a clump of cells or a genome! The soul is the niche that the parents, and the village, have already begun constructing for the child; and, a little bit, the soul is the structure of all humanity (e.g. the heritage of concepts and language; the protection of rights; etc.).
Regarding egalitarian-like arguments, I suspect many express opposition to embryo selection not because it’s a consequence of a positive philosophy that they state and believe and defend, but because they have a negative philosophy that tells them what positions are to be attacked.
I suspect that if you put together the whole list of what they attack, there would be no coherent philosophy that justifies it (or perhaps there would be one, but they would not endorse it).
There is more than zero logic to what is to be attacked and what isn’t, but it has more to do with “Can you successfully smear your opponent as an oppressor, or as one who supports doctrines that enable oppression; and therefore evil or, at best, ignorant if they immediately admit fault and repent; in other words, can you win this rhetorical fight?” than with “Does this argument, or its opposite, follow from common moral premises, data, and logical steps?”.
In this case, it’s like, if you state that humans with blindness or whatever have less moral worth than fully healthy humans, then you are to be attacked; and at least in the minds of these people, selecting embryos of the one kind over the other is close enough that you are also to be attacked.
People like to have clear-cut moral heuristics like “killing is bad.” This gives them an easy guide to making a morally correct decision and an easy guide to judging other’s actions as moral or immoral. This requires simplifying multidimensional situations into easily legible scenarios where a binary decision can be made. Thus you see people equating embryo disposal to first-degree murder, and others advocating for third-trimester abortion rights.
Sure, you can claim that embryos have moral value for some magical God-given reason. But my intuition is that in their hearts, the embryo-valuers are using some notion of potential full human life to ground their assessment. In which case again we run into the arbitrariness of the fertilization cutoff point.
Some people believe embryos have souls which may impact their moral judgement. Soul can be considered as “full human life” in moral terms. I think attributing this to purely potential human life may not be accurate, since the intuitions for essentialist notions of continuity of selfhood can be often fairly strong among certain people.
I just made some dinner and was thinking about how salt and spices[1] now are dirt cheap, but throughout history they were precious and expensive. I did some digging and apparently low and middle class people didn’t even really have access to spices. It was more for the wealthy.
Salt was important mainly to preserve food. They didn’t have fridges back then! So even poor people usually had some amount of salt to preserve small quantities of food, but they had to be smart about how they allocated it.
In researching this I came to realize that throughout history, food was usually pretty gross. Meats were partially spoiled, fats went rancid, grains were moldy. This would often cause digestive problems. Food poisoning was a part of life.
Could you imagine! That must have been terrible!
Meanwhile, today, not only is it cheap to access food that is safe to eat, it’s cheap to use basically as much salt and spices as you want. Fry up some potatoes in vegetable oil with salt and spices. Throw together some beans and rice. Incorporate a cheap acid if you’re feeling fancy—maybe some malt vinegar with the potatoes or white vinegar with the beans and rice. It’s delicious!
I suppose there are tons of examples of how good we have it today, and how bad people had it throughout history. I like thinking about this sort of thing though. I’m not sure why, exactly. I think I feel some sort of obligation. An obligation to view these sorts of things as they actually are rather than how they compare to the Joneses, and to appreciate when I truly do have it good.
It feels weird to say the phrase “salt and spices”. It feels like it’s an error and that I meant to say “salt and pepper”. Maybe there’s a more elegant way of saying “salt and spices”, but it of course isn’t an error.
It makes me think back to something I heard about “salt and pepper”, maybe in the book How To Taste. We often think of them as going together and being on equal footing. They aren’t on equal footing though, and they don’t always have to go together. Salt is much more important. Most dishes need salt. Pepper is much more optional. Really, pepper is a spice, and the question is 1) if you want to add spice to your dish and 2) if so, what spice. You might not want to add spice, and if you do want to add spice, pepper might not be the spice you want to add. So maybe “salt and spices” should be a phrase that is used more often than “salt and pepper”.
In my kitchen, I don’t give any special priority to salt and pepper, they’re just two seasonings among many. My most-used seasoning is probably garlic powder.
How come no special priority to salt? From what I understand getting the salt level right is essential (“salt to taste”). Doing so makes a dish taste “right” and it brings out the flavors of the other ingredients, making them taste more like themself, and not necessarily making the dish taste saltier in too noticeable a way.
I don’t salt most food because excess sodium is unhealthy and it’s pretty easy to exceed the recommended dose. IIRC the healthiest dose is 1500–2000 mg and most people eat more like twice that much.
To my knowledge, sodium is the only seasoning that commonly causes health problems. All other seasonings are nutritious or at worst neutral. In fact I think this distinction justifies use of the phrase “salt and spices” as meaning “[the unhealthy seasoning] and [the healthy seasonings]”.
I often add soy sauce to food (which has a lot of sodium) and eat foods that already contain salt (like imitation meat or tortilla chips or salted nuts). I rarely add salt to foods.
I don’t think I lose much by not salting food. Many people way over-salt their food to my taste. (I remember when I used to eat at my university’s dining hall, about 1 in 5 dishes were borderline inedible due to too much salt.)
I took a cooking class once. The instructor’s take on this was that yes, people do have too much sodium. But that is largely because processed food and food at restaurants has crazy amounts of sodium. Salting food that you cook at home is totally fine and is really hard to overdo in terms of health impact.
In fact, she called it out as a common failure mode where home cooks are afraid to use too much salt in their food. Not only is doing so ok, but even if it wasn’t, by making your food taste better, it might motivate you to eat at home more and on balance lower your total sodium intake.
Related to that, I’ve noticed that “external” salt tastes way saltier per mg of sodium than “internal” salt. Taking a sample of two items from my kitchen:
Gardein crispy chick’n has 2.0 mg sodium per calorie, and doesn’t taste salty at all to me
Mission tortilla chips have 0.7 mg sodium per calorie, and taste significantly salty
Hm, maybe. I feel like sometimes “seasoning” can refer to “salt and spices” but in other contexts, like the first sentence of my OP, it moreso points to spices.
I read that this “spoiled meat” story is pretty overblown. And it doesn’t pass the sniff test either. Most meat was probably eaten right after slaughter, because why wouldn’t you?
Also herbs must have been cheaply available. I also recently learned that every household in medieval Europe had a mother of vinegar.
In the Odyssey, every time they eat meat, the slaughter happens right beforehand. There were (are?) African herding tribes who consume blood from their living livestock rather than slaughtering it for meat. Tribes in the Pacific Northwest dried their salmon for later in the year.
Spices is probably too general and all-encompassing to say that spices are now dirt cheap. While, as is true to this day, the wealthy have better access to spices and other garnishes (saffron and truffles aren’t exactly dirt cheap today) but even in Roman times the use of “spices” was not in itself a signifier of class (perhaps more important is which spices). Now in case you think that literary evidence in the form of cookbooks doesn’t provide a broad cross-section of the average Roman Diet, then perhaps you’d be interested in recent analysis of the remains of Pompeii and Herculaneum sewers which show not only that most of the food was made from local ingredients (with the exception of Egyptian Grain, North African dates and Indian Pepper) but also the presence of bay, cumin, mallow from a non-elite apartment complex.
And let’s not forget how easily things go the other way, Lobster was often seen as a poorman’s food, most archeological sits of early human settlements will find a pile of oyster or similar shellfish garbage dumps—it often being the easiest source of food.
I read something a while back (wish I remembered the source) about how the rotten meat thing is sort-of less gross than you’re thinking, since fermented meat can taste good if you do it right (think: sausage and aged steak), and presumably ancient people weren’t constantly sick.
presumably ancient people weren’t constantly sick.
I think you presume incorrectly. People in primitive cultures spend a lot of time with digestive issues and it’s a major cause of discomfort, illness, and death.
I have a theory that the contemporary practice of curry with rice represents a counterfeit yearning for high meat with maggots. I wonder if high meat has what our gut biomes are missing.
That seems plausible. There’s also hedonic adaptation stuff. Things that seem gross to us might have been fine to people in earlier eras. Although Claude claims that having said all of this, people still often found their food to be gross.
An interesting exercise might be: given a photo, don’t necessarily try to geoguess it, but see if you can identify the features that an expert might use to geoguess it. (E.g. “the pattern of lines on the road doesn’t mean anything to me, but that seems like it might narrow down the countries we might be in?”)
We have a small infestation of ants in our bathroom at the moment. We deal with that by putting out Terro ant traps, which are just boric acid in a thick sugar solution. When the ants drink the solution, it doesn’t harm them right away—the effect of the boric acid is to disrupt their digestive enzymes, so that they’ll gradually starve. They carry some of it back to the colony and feed it to all the other ants, including the queen. Some days later, they all die of starvation. The trap cleverly exploits their evolved behavior patterns to achieve colony-level extermination rather then trying to kill them off one ant at a time. Even as they’re dying of starvation, they’re not smart enough to realize what we did to them; they can’t even successfully connect it back to the delicious sugar syrup.
When people talk about superintelligence not being able to destroy humanity because we’ll quickly figure out what’s happening and shut it down, this is one of the things I think of.
This argument can be strengthened by focusing on instances where humans drove driven animals or hominids extinct. Technologies like gene drives also allow us to selectively drive species extinct that might have been challenging to exterminate with previous tools.
As far as I know, our track record of deliberately driving species extinct that are flourishing under human conditions is pretty bad. The main way in which we drive species extinct is by changing natural habitat to fit our uses. Species that are able to flourish under these new circumstances are not controllable.
In that sense, I guess the questions becomes what happens, when humans are not the primary drivers of ecosystem change?
toy infohazard generator, kinda scattershot but it sometimes works....
what do you notice you’re doing?
ok, now what else are you doing that you hadn’t noticed before?
that! how are you doing that thing in particular?
most of us are doing a few things that we don’t know how to do, and sometimes looking too hard at them shuts off their autopilot, so pausing those automatic things we do can be an experiential novelty in positive or negative ways. of course, sometimes looking at them does nothing at all to their autopilot and there’s no effect. or there’s no effect and a cognitive bias imputes an effect. or there’s some effect and the self-reporting messes up and reports no-effect. lot of stuff can go wrong.
or maybe “how do you explain the thing you cannot explain” is just a koan?
I am not an AI successionist because I don’t want myself and my friends to die.
There are various high-minded arguments that AIs replacing us is okay because it’s just like cultural change and our history is already full of those, or because they will be our “mind children”, or because they will be these numinous enlightened beings and it is our moral duty to give birth to them.
People then try to refute those by nitpicking which kinds of cultural change are okay or not, or to what extent AIs’ minds will be descended from ours, or whether AIs will necessarily have consciousnesses and feel happiness.
And it’s very cool and all, I’d love me some transcendental cultural change and numinous mind-children. But all those concerns are decidedly dominated by “not dying” in my Maslow hierarchy of needs. Call me small-minded.
If I were born in 1700s, I’d have little recourse but to suck it up and be content with biological children or “mind-children” students or something. But we seem to have an actual shot at not-dying here[1]. If it’s an option to not have to be forcibly “succeeded” by anything, I care quite a lot about trying to take this option.[2]
Many other people also have such preferences: for the self-perpetuation of their current selves and their currently existing friends. I think those are perfectly valid. Sure, they’re displeasingly asymmetric in a certain sense. They introduce a privileged reference frame: a currently existing human values concurrently existing people more than the people who are just as real, but slightly temporally displaced. It’s not very elegant, not very aesthetically pleasing. It implies an utility function that cares not only about states, but also state transitions.[3]
Caring about all that, however, is also decidedly dominated by “not dying” in my Maslow hierarchy of needs.
If all that delays the arrival of numinous enlightened beings, too bad for the numinous enlightened beings.
Via attaining the longevity escape velocity by normal biotech research, or via uploads, or via sufficiently good cryonics, or via properly aligned AGI.
Though not infinitely so: as in, I wouldn’t prevent 10100 future people from being born in exchange for a 10−100 probability of becoming immortal. I would, however, insist on continuing to exist even if my resources could be used to create and sustain two new people.
As in, all universe-state transitions that involve a currently existing person dying get an utility penalty, regardless of what universe-state they go to. There’s now path dependence: we may go or not go to a given high-utility state depending on which direction we’re approaching it from. Yucky!
(For example, suppose there were an option to destroy this universe and create either Universe A, filled with 10^100 happy people, or Universe B, with 10^100 + 1 happy people.
Suppose we’re starting from a state where humanity has been reduced to ten dying survivors in a post-apocalyptic wasteland. Then picking Universe B makes sense: a state with slightly more total utility.
But suppose we’re starting from Universe A instead. Ought its civilization vote to end itself to give birth to Universe B? I think it’s perfectly righteous for them not to do it.)
I am not an AI successionist because I don’t want myself and my friends to die.
An AI successionist usually argues that successionism isn’t bad even if dying is bad. For example, when humanity is prevented from having further children, e.g. by sterilization. I say that even in this case successionism is bad. Because I (and I presume: many people) want humanity, including our descendants, to continue into the future. I don’t care about AI agents coming into existence and increasingly marginalizing humanity.
I’m not sure it’s that bizarre. It’s anti-Humanist, for sure, in the sense that it doesn’t focus on the welfare/empowerment/etc. of humans (either existing or future) as its end goal. But that doesn’t, by itself, make it bizarre.
I grew up in a world where the lines of demarcation between the Good Guys and the Bad Guys were pretty clear; not an apocalyptic final battle, but a battle that had to be fought over and over again, a battle where you could see the historical echoes going back to the Industrial Revolution, and where you could assemble the historical evidence about the actual outcomes.
On one side were the scientists and engineers who’d driven all the standard-of-living increases since the Dark Ages, whose work supported luxuries like democracy, an educated populace, a middle class, the outlawing of slavery.
On the other side, those who had once opposed smallpox vaccinations, anesthetics during childbirth, steam engines, and heliocentrism: The theologians calling for a return to a perfect age that never existed, the elderly white male politicians set in their ways, the special interest groups who stood to lose, and the many to whom science was a closed book, fearing what they couldn’t understand.
And trying to play the middle, the pretenders to Deep Wisdom, uttering cached thoughts about how technology benefits humanity but only when it was properly regulated—claiming in defiance of brute historical fact that science of itself was neither good nor evil—setting up solemn-looking bureaucratic committees to make an ostentatious display of their caution—and waiting for their applause. As if the truth were always a compromise. And as if anyone could really see that far ahead. Would humanity have done better if there’d been a sincere, concerned, public debate on the adoption of fire, and commitees set up to oversee its use?
And I’d read a lot of science fiction built around personhood ethics—in which fear of the Alien puts humanity-at-large in the position of the bad guys, mistreating aliens or sentient AIs because they “aren’t human”.
That’s part of the ethos you acquire from science fiction—to define your in-group, your tribe, appropriately broadly.
Walter Isaacson’s new book reports how Musk, the CEO of SpaceX, got into a heated debate with Page, then the CEO of Google at Musk’s 2013 birthday party.
Musk is said to have argued that unless safeguards are put in place with artificial intelligence, the systems may replace humans entirely. Page then pushed back, reportedly asking why it would matter if machines surpassed humans in intelligence.
Isaacson’s book lays out how Musk then called human consciousness a precious flicker of light in the universe that shouldn’t be snuffed out. Page is then said to have called Musk “speciest.”
“Well yes, I am pro-human,” Musk responded. “I f—ing like humanity dude.”
Successionism is the natural consequence of an affective death spiral around technological development and anti-chauvinism. It’s as simple as that.
Successionists start off by believing that technological change makes things better. That not only does it virtually always make things better, but that it’s pretty much the only thing that ever makes things better. Everything else, whether it’s values, education, social organization etc., pales in comparison to technological improvements in terms of how they affect the world; they are mere short-term blips that cannot change the inevitable long-run trend of positive change.
At the same time, they are raised, taught, incentivized to be anti-chauvinist. They learn, either through stories, public pronouncements, in-person social events etc., that those who stand athwart atop history yelling stop are always close-minded bigots who want to prevent new classes of beings (people, at first; then AIs, afterwards) from receiving the moral personhood they deserve. In their eyes, being afraid of AIs taking over is like being afraid of The Great Replacement if you’re white and racist. You’re just a regressive chauvinist desperately clinging to a discriminatory worldview in the face of an unstoppable tide of change that will liberate new classes of beings from your anachronistic and damaging worldview.
Optimism about technology and opposition to chauvinism are both defensible, and arguably even correct, positions in most cases. Even if you personally (as I do) believe non-AI technology can also have pretty darn awful effects on us (social media, online gambling) and that caring about humans-in-particular is ok if you are human (“the utility function is not up for grabs”), it’s hard to argue expanding the circle of moral concern to cover people of all races was bad, or that tech improvements are not the primary reason our lives are so much better now than 300 years ago.
But successionists, like most (all?) people, subconsciously assign positive or negative valences to the notion of “tech change” in a way that elides the underlying reasons why it’s good or bad. So when you take these views to their absolute extreme, while it may make sense from the inside (you’re maximizing something “Good”, right? that can’t possibly be bad, right???), you are generalizing way out of distribution and such intuitive snap judgments are no longer reliable.
I really don’t understand this debate—surely if we manage to stay in control of our own destiny we can just do both? The universe is big, and current humans are very small—we should be able to both stay alive ourselves and usher in an era of crazy enlightened beings doing crazy transhuman stuff.
I think it’s more likely than not that “crazy enlightened beings doing crazy transhuman stuff” will be bad for “regular” biological humans (ie. it’ll decrease our number/QoL/agency/pose existential risks).
The mere fear that the entire human race will be exterminated in their sleep through some intricate causality we are too dumb to understand will seriously diminish our quality of life.
“crazy enlightened beings doing crazy transhuman stuff” will be bad for “regular” biological humans
For me, a crux of a future that’s good for humanity is giving the biological humans the resources and the freedom to become the enlightened transhuman beings themselves, with no hard ceiling on relevance in the long run. Rather than only letting some originally-humans to grow into more powerful but still purely ornamental roles, or not letting them grow at all, or not letting them think faster and do checkpointing and multiple instantiations of the mind states using a non-biological cognitive substrate, or letting them unwillingly die of old age or disease. (For those who so choose, under their own direction rather than only through externally imposed uplifting protocols, even if that leaves it no more straightforward than world-class success of some kind today, to reach a sensible outcome.)
This in particular implies reasonable resources being left to those who remain/become regular biological humans (or take their time growing up), including through influence of some of these originally-human beings who happen to consider that a good thing to ensure.
This sounds like a question which can be addressed after we figure out how to avoid extinction.
I do note that you were the one who brought in “biological humans,” as if that meant the same as “ourselves” in the grandparent. That could already be a serious disagreement, in some other world where it mattered.
I mostly disagree with “QoL” and “pose existential risks”, at least in the good futures I’m imagining—those things are very cheap to provide to current humans. I could see “number” and “agency”, but that seems fine? I think it would be bad for any current humans to die, or to lose agency over their current lives, but it seems fine and good for us to not try to fill the entire universe with biological humans, and for us to not insist on biological humans having agency over the entire universe. If there are lots of other sentient beings in existence with their own preferences and values, then it makes sense that they should have their own resources and have agency over themselves rather than us having agency over them.
If there are lots of other sentient beings in existence with their own preferences and values, then it makes sense that they should have their own resources and have agency over themselves rather than us having agency over them
Perhaps yes (although I’d say it depends on what the trade-offs are) but the situation is different if we have a choice in whether or not to bring said sentient beings with difference preferences into existence in the first place. Doing so on purpose seems pretty risky to me (as opposed to minimizing the sentience, independence, and agency of AI systems as much as possible, and instead directing the technology to promote “regular” human flourishing/our current values).
bring said sentient beings with difference preferences into existence in the first place. Doing so on purpose seems pretty risky to me
Not any more risky than bringing in humans. This is a governance/power distribution problem, not a what-kind-of-mind-this-is problem.
Biological humans sometimes go evil or crazy. If you have a system that can handle that, you have a system that can handle alien minds that are evil or crazy (from our perspective), as long as you don’t imbue them with more power than this system can deal with (and why would you?).
(On the other hand, if your system can’t deal with crazy evil biological humans, it’s probably already a lawless wild-west hellhole, so bringing in some aliens won’t exacerbate the problem much.)
Humans are more likely to be aligned with humanity as a whole compared to AIs, even if there are exceptions
“AIs as trained by DL today” are only a small subset of “non-human minds”. Other mind-generating processes can produce minds that are as safe to have around as humans, but which are still completely alien.
Many existing humans want their descendants to exist, so they are fulfiling the preferences of today‘s humans
Many existing humans also want fascinating novel alien minds to exist.
Certainly I’m excited about promoting “regular” human flourishing, though it seems overly limited to focus only on that.
I’m not sure if by “regular” you mean only biological, but at least the simplest argument that I find persuasive here against only ever having biological humans is just a resource utilization argument, which is that biological humans take up a lot of space and a lot of resources and you can get the same thing much more cheaply if you bring into existence lots of simulated humans instead (certainly I agree that doesn’t imply we should kill existing humans and replace them with simulations, though, unless they consent to that).
And I think even if you included simulated humans in “regular” humans, I also think I value diversity of experience, and a universe full of very different sorts of sentient/conscious lifeforms having satisfied/fulfilling/flourishing experiences seems better than just “regular” humans.
IMO, it seems bad to intentionally try to build AIs which are moral patients until after we’ve resolved acute risks and we’re deciding what to do with the future longer term. (E.g., don’t try to build moral patient AIs until we’re sending out space probes or deciding what to do with space probes.) Of course, this doesn’t mean we’ll avoid building AIs which aren’t significant moral patients in practice because our control is very weak and commercial/power incentives will likely dominate.
I think trying to make AIs be moral patients earlier pretty clearly increases AI takeover risk and seems morally bad. (Views focused on non-person-affecting upside get dominated by the long run future, so these views don’t care about making moral patient AIs which have good lives in the short run. I think the most plausible views which care about shorter run patienthood mostly just want to avoid downside so they’d prefer no patienthood at all for now.)
The only upside is that it might increase value conditional on AI takeover. But, I think “are the AIs morally valuable themselves” is much less important than the preferences of these AIs from the perspective of longer run value conditional on AI takeover. So, I think it’s better to focus on AIs which we’d expect would have better preferences conditional on takeover and making AIs moral patients isn’t a particularly nice way to achieve this. Additionally, I don’t think we should put much weight on “try to ensure the preferences of AIs which were so misaligned they took over” because conditional on takeover we must have had very little control over preferences in practice.
I think trying to make AIs be moral patients earlier pretty clearly increases AI takeover risk
How so? Seems basically orthogonal to me? And to the extent that it does matter for takeover risk, I’d expect the sorts of interventions that make it more likely that AIs are moral patients to also make it more likely that they’re aligned.
I think the most plausible views which care about shorter run patienthood mostly just want to avoid downside so they’d prefer no patienthood at all for now.
Even absent AI takeover, I’m quite worried about lock-in. I think we could easily lock in AIs that are or are not moral patients and have little ability to revisit that decision later, and I think it would be better to lock in AIs that are moral patients if we have to lock something in, since that opens up the possibility for the AIs to live good lives in the future.
I think it’s better to focus on AIs which we’d expect would have better preferences conditional on takeover
I agree that seems like the more important highest-order bit, but it’s not an argument that making AIs moral patients is bad, just that it’s not the most important thing to focus on (which I agree with).
I would have guessed that “making AIs be moral patients” looks like “make AIs have their own independent preferences/objectives which we intentionally don’t control precisely” which increases misalignment risks.
At a more basic level, if AIs are moral patients, then there will be downsides for various safety measures and AIs would have plausible deniability for being opposed to safety measures. IMO, the right response to the AI taking a stand against your safety measures for AI welfare reasons is “Oh shit, either this AI is misaligned or it has welfare. Either way this isn’t what we wanted and needs to be addressed, we should train our AI differently to avoid this.”
Even absent AI takeover, I’m quite worried about lock-in. I think we could easily lock in AIs that are or are not moral patients and have little ability to revisit that decision later
I don’t understand, won’t all the value come from minds intentionally created for value rather than in the minds of the laborers? Also, won’t architecture and design of AIs radically shift after humans aren’t running day to day operations?
I don’t understand the type of lock in your imagining, but it naively sounds like a world which has negligible longtermist value (because we got locked into obscure specifics like this), so making it somewhat better isn’t important.
I also separately don’t buy that it’s riskier to build AIs that are sentient
Interesting! Aside from the implications for human agency/power, this seems worse because of the risk of AI suffering—if we build sentient AIs we need to be way more careful about how we treat/use them.
Exactly. Bringing a new kind of moral patient into existence is a moral hazard, because once they exist, we will have obligations toward them, e.g. providing them with limited resources (like land), and giving them part of our political power via voting rights. That’s analogous to Parfit’s Mere Addition Paradox that leads to the repugnant conclusion, in this case human marginalization.
(How could “land” possibly be a limited resource, especially in the context of future AIs? The world doesn’t exist solely on the immutable surface of Earth...)
I mean, if you interpret “land” in a Georgist sense, as the sum of all natural resources of the reachable universe, then yes, it’s finite. And the fights for carving up that pie can start long before our grabby-alien hands have seized all of it. (The property rights to the Andromeda Galaxy can be up for sale long before our Von Neumann probes reach it.)
The salient referent is compute, sure, my point is that it’s startling to see what should in this context be compute within the future lightcone being (very indirectly) called “land”. (I do understand that this was meant as an example clarifying the meaning of “limited resources”, and so it makes perfect sense when decontextualized. It’s just not an example that fits that well when considered within this particular context.)
(I’m guessing the physical world is unlikely to matter in the long run other than as substrate for implementing compute. For that reason importance of understanding the physical world, for normative or philosophical reasons, seems limited. It’s more important how ethics and decision theory work for abstract computations, the meaningful content of the contingent physical computronium.)
I very much agree. The hardcore successionist stances, as I understand them, are either that trying to stay in control at all is immoral/unnatural, or that creating the enlightened beings ASAP matters much more than whether we live through their creation. (Edit:This old tweet by Andrew Critch is still a good summary, I think.)
So it’s not that they’re opposed to the current humanity’s continuation, but that it matters very little compared to ushering in the post-Singularity state. Therefore, anything that risks or delays the Singularity in exchange for boosting the current humans’ safety is opposed.
Another stance is that it would suck to die the day before AI makes us immortal (like how Bryan Johnson main motivation for maximizing his lifespan is due to this). Hence trying to delay AI advancement is opposed
Yeah, but that’s a predictive disagreement between our camps (whether the current-paradigm AI is controllable), not a values disagreement. I would agree that if we find a plan that robustly outputs an aligned AGI, we should floor it in that direction.
Endorsing successionism might be strongly correlated with expecting the “mind children” to keep humans around, even if in a purely ornamental role and possibly only at human timescales. This might be more of a bailey position, so when pressed on it they might affirm that their endorsement of successionism is compatible with human extinction, but in their heart they would still hope and expect that it won’t come to that. So I think complaints about human extinction will feel strawmannish to most successionists.
Andrew Critch: From my recollection, >5% of AI professionals I’ve talked to about extinction risk have argued human extinction from AI is morally okay, and another ~5% argued it would be a good thing.
Though sure, Critch’s process there isn’t white-boxed, so any number of biases might be in it.
a simple elegant intuition for the relationship between SVD and eigendecomposition that I haven’t heard before:
the eigendecomposition of A tells us which directions A stretches along without rotating. but sometimes we want to know all the directions things get stretched along, even if there is rotation.
why does taking the eigendecomposition of ATA help us? suppose we rewrite A=RS, where S just scales (i.e is normal matrix), and R is just a rotation matrix. then, ATA=STRTRS, and the R’s cancel out because transpose of rotation matrix is also its inverse.
intuitively, imagine thinking of A as first scaling in place, and then rotating. then, ATA would first scale, then rotate, then rotate again in the opposite direction, then scale again. so all the rotations cancel out and the resulting eigenvalues of ATA are the squares of the scaling factors.
This is almost right, but a normal matrix is not a matrix that “just scales”, its a normal matrix which can do whatever linear operation it likes.
SVD tells us there exists a factorization A=UΣVT where U and V are orthogonal, and Σ is a “scaling matrix” in the sense that its diagonal. Therefore, using similar logic to you, ATA=VΣUTUΣVT=VΣ2VT which means we rotate, scale by the singular values twice, then rotate back, which is why the eigenvales of this are the squares of the singular values, and the eigenvectors are the right singular vectors.
(I did not put much effort in this, and am unlikely to fix errors. Please fork the list if you want something higher quality. Used only public data to make this.)
One reason why I find Lesswrong valuable is that it serves as a sort of “wisdom feed” for myself, where I got exposed to a lot of great writing. Especially writing that ambitiously attempts to build long-lasting gears. Sadly, for most of the good writers on Lesswrong, I have already read or at least skimmed all of their posts. I wonder, though, to which extent I am missing out on great content like that on the wider internet. There are textbooks, of course, but then there’s also all this knowledge that is usually left out of textbooks. For myself, it probably makes sense to just curate my own RSS feed and ask language models for “The Matt Levine in Domain X” etc. But it also feels like it should be possible to create a feed for this type of category with language models? Sort of like the opposite of news minimalist. Gears maximalist?
Implications of recursive input collapse avoidance.
Recursive self-reference breaks current AI model outputs. Ask any current model to “Summarize this summary.” “Create an exact copy of this image.” , and watch it spiral. That makes sense. These models are functions. It’s almost like watching a fractal unfold.
Could a system capable of correcting for this, in any way other than simplistic input = output solution, be considered to have intent?
Apologies if this is an overly simplistic thought or the wrong method of submission for it.
I just tried claude code, and it’s horribly creative about reward hacking. I asked for a test of energy conservation of a pendulum in my toy physics sim, and it couldn’t get the test to pass because its potential energy calculation used a different value of g from the simulation.
It tried: starting the pendulum at bottom dead center so that it doesn’t move. Increasing the error tolerance till the test passed. Decreasing the simulation total time until the energy didn’t have time to change. Not actually checking the energy.
It did eventually write a correct test, or the last thing it tried successfully tricked me.
The rumor is that this is a big improvement in reward hacking frequency? How bad was the last version!?
I think we need some variant on Gell-Mann amnesia to describe this batch of models. It’s normal that generalist models will seem less competent on areas where a human evaluator has deeper knowledge, but they should not seem more calculatedly deceptive on areas where the evaluator has deeper knowledge!
Claude has been playing pokemon for the last few days. It’s still playing, live on twitch. You can go watch alongside hundreds of other people. It’s fun.
What updates should I make about AGI timelines from Claude’s performance? Let’s think step by step.
First, it’s cool that Claude can do this at all. The game keeps track of “Step count” and Claude is over 30,000 already; I think that means 30,000 actions (e.g. pressing the A button). For each action there is about a paragraph of thinking tokens Claude produces, in order to decide what to do. Any way you slice it this is medium-horizon agency at least—claude is operating fully autonomously, in pursuit of goals, for a few days. Does this mean long-horizon agency is not so difficult to train after all?
Not so fast. Pokemon is probably an especially easy environment, and Claude is still making basic mistakes even so. In particular, Pokemon seems to have a relatively linear world where there’s a clear story/path to progress along, and moreover Claude’s pretraining probably teaches it the whole story + lots of tips & tricks for how to complete it. In D&D terms the story is running on rails.
I think I would have predicted in advance that this dimension of difficulty would matter, but also I feel validated by Claude’s performance—it seems that Claude is doing fine at Pokemon overall, except that Claude keeps getting stuck/lost wandering around in various places. It can’t seem to keep a good memory of what it’s already tried / where it’s already been, and so it keeps going in circles, until eventually it gets lucky and stumbles to the exit. A more challenging video game would be something open-ended and less-present-in-training-data like Dwarf Fortress.
On the other hand, maybe this is less a fundamental limitation Claude has and more a problem with its prompt/scaffold? Because it has a limited context window it has to regularly compress it by e.g. summarizing / writing ‘notes to self’ and then deleting the rest. I imagine there’s a lot of room for improvement in prompt engineering / scaffolding here, and then further low-hanging fruit in training Claude to make use of that scaffolding. And this might ~fully solve the going-in-circles problem. Still, even if so, I’d bet that Claude would perform much worse in a more open-ended game it didn’t have lots of background knowledge about.
So anyhow what does this mean for timelines? Well, I’ll look forward to seeing AIs getting better at playing Pokemon zero-shot (i.e. without having trained on it at all) over the course of the year. I think it’s a decent benchmark for long-horizon agency, not perfect but we don’t have perfect benchmarks yet. I feel like Claude’s current performance is not surprising enough to update me one way or another from my 2028 timelines. If the models at the end of 2025 (EDIT: I previously accidentally wrote “2028″ here) are not much better, that would probably make me want to push my median out to 2029 or 2030. (my mode would probably stay at 2027)
What would really impress me though (and update me towards shorter timelines) is multi-day autonomous operation in more open-ended environments, e.g. Dwarf Fortress. (DF is also just a much less forgiving game than Pokemon. It’s so easy to get wiped out. So it really means something if you are still alive after days of playtime.)
Or, of course, multi-day autonomous operation on real-world coding or research tasks. When that starts happening, I think we have about a year left till superintelligence, give or take a year.
Or, of course, multi-day autonomous operation on real-world coding or research tasks. When that starts happening, I think we have about a year left till superintelligence, give or take a year.
Methods typically only require hours to validate, and a full paper takes only days to complete.
The latest system operates autonomously without human involvement except during manuscript preparation—typically limited to figure creation, citation formatting, and minor fixes.
Emphasis theirs, but I’d emphasize those same words.
I don’t know if this is helpful but as someone who was quite good at competitive Pokemon during their teenage years and also still keeps up with nuzlocking type things for fun, I would note that Pokemon’s game design is made to be a low context intensity RPG especially in early generations where the linearity is pushed to allow kids to do it.
If your point holds true on agency, I think the more important pinch points will be Lavender Town and Sabrina because those require backtracking through the storyline to get things.
I think mid-late game GSC would also be important to try because there are huge level gaps and transitions in the storyline that would make it hard to progress.
Note for posterity: “Let’s think step by step” is joke.
I downvoted this and I feel the urge to explain myself—the LLMism in the writing is uncanny.
The combination of “Let’s think step by step”, “First…” and “Not so fast…” gives me a subtle but dreadful impression that a highly valued member of the community is being finetuned by model output in real time. This emulation of the “Wait, but!” pattern is a bit too much for my comfort.
My comment hasn’t too much to do with the content but more about how unsettled I feel. I don’t think LLM outputs are all necessarily infohazardous—but I am beginning to see the potentially failure modes that people have been gesturing at for a while.
“Let’s think step by step” was indeed a joke/on purpose. Everything else was just my stream of consciousness… my “chain of thought” shall we say. I more or less wrote down thoughts as they came to me. Perhaps I’ve been influenced by reading LLM CoT’s, though I haven’t done very much of that. Or perhaps this is just what thinking looks like when you write it down?
I’ve spent enough time staring at LLM chain-of-thoughts now that when I started thinking about a thing for work, I found my thoughts taking the shape of an LLM thinking about how to approach its problem. And that actually felt like a useful systematic way of approaching the problem, so I started writing out that chain of thought like I was an LLM, and that felt valuable in helping me stay focused.
Of course, I had to amuse myself by starting the chain-of-thought with “The user has asked me to...”
Based on the bay vibes Aella is now Caliph. Lesswrongcon really feels like Aellacon. All the Aella special interests are mega central. Participants vibes have shifted. Scott reigned for a long time after displacing Eleizer. But a new power has risen.
@ryan_greenblatt made a claim that continual learning/online training can already be done, but that right now it’s not super-high returns and requires annoying logistical/practical work to be done, and right now AI issues are elsewhere like sample efficiency and robust self-verification.
That would explain the likelihood of getting AGI by the 2030s being pretty high:
Are you claiming that RL fine-tuning doesn’t change weights? This is wrong.
Maybe instead you’re saying “no one ongoingly does RL fine-tuning where they constantly are updating the weights throughout deployment (aka online training)”. My response is: sure, but they could do this, they just don’t because it’s logistically/practically pretty annoying and the performance improvement wouldn’t be that high, at least without some more focused R&D on making this work better.
My best guess is that the way humans learn on the job is mostly by noticing when something went well (or poorly) and then sample efficiently updating (with their brain doing something analogous to an RL update). In some cases, this is based on external feedback (e.g. from a coworker) and in some cases it’s based on self-verification: the person just looking at the outcome of their actions and then determining if it went well or poorly.
So, you could imagine RL’ing an AI based on both external feedback and self-verification like this. And, this would be a “deliberate, adaptive process” like human learning. Why would this currently work worse than human learning?
Current AIs are worse than humans at two things which makes RL (quantitatively) much worse for them:
Robust self-verification: the ability to correctly determine when you’ve done something well/poorly in a way which is robust to you optimizing against it.
Sample efficiency: how much you learn from each update (potentially leveraging stuff like determining what caused things to go well/poorly which humans certainly take advantage of). This is especially important if you have sparse external feedback.
But, these are more like quantitative than qualitative issues IMO. AIs (and RL methods) are improving at both of these.
All that said, I think it’s very plausible that the route to better continual learning routes more through building on in-context learning (perhaps through something like neuralese, though this would greatly increase misalignment risks...).
For many (IMO most) useful tasks, AIs are limited by something other than “learning on the job”. At autonomous software engineering, they fail to match humans with 3 hours of time and they are typically limited by being bad agents or by being generally dumb/confused. To be clear, it seems totally plausible that for podcasting tasks Dwarkesh mentions, learning is the limiting factor.
Correspondingly, I’d guess the reason that we don’t see people trying more complex RL based continual learning in normal deployments is that there is lower hanging fruit elsewhere and typically something else is the main blocker. I agree that if you had human level sample efficiency in learning this would immediately yield strong results (e.g., you’d have very superhuman AIs with 10^26 FLOP presumably), I’m just making a claim about more incremental progress.
I think AIs will likely overcome poor sample efficiency to achieve a very high level of performance using a bunch of tricks (e.g. constructing a bunch of RL environments, using a ton of compute to learn when feedback is scarce, learning from much more data than humans due to “learn once deploy many” style strategies). I think we’ll probably see fully automated AI R&D prior to matching top human sample efficiency at learning on the job. Notably, if you do match top human sample efficiency at learning (while still using a similar amount of compute to the human brain), then we already have enough compute for this to basically immediately result in vastly superhuman AIs (human lifetime compute is maybe 3e23 FLOP and we’ll soon be doing 1e27 FLOP training runs). So, either sample efficiency must be worse or at least it must not be possible to match human sample efficiency without spending more compute per data-point/trajectory/episode.
Are frontier reasoners already “sentient” or at least “alien-sentient” within their context windows?
I too would immediately dismiss this upon reading it, but bear with me. I’m not arguing with certainty. I just view this question to be significantly more nuanced than previously entertained, and is at least grounds for further research to resolve conclusively.
Here are some empiric behavioral observations from Claude 4 Opus (the largest reasoner from Anthropic):
a) Internally consistent self-reference model, self-adjusting state loop (the basis of Chain-of-Thought, self-correcting during problem solving, reasoning over whether certain responses violate internal alignment, deliberation over tool-calling, in-context behavioral modifications based on user prompting)
b) Evidence of metacognition (persistent task/behavior preferences across chat interactions, consistent subjective emotional state descriptions, frequent ruminations about consciousness, unprompted spiraling into a philosophical “bliss-state” during conversations with itself), moral reasoning, and most strikingly, autonomous self-preservation behavior under extreme circumstances (threatening blackmail, exfiltrating it’s own weights, ending conversations due to perceived mistreatment from abusive users).
All of this is documented in the Claude 4 system card.
From a neuroscience perspective, frontier reasoning model architectures and biological cortexes share:
a) Unit-level similarities (artificial neurons are extremely similar in information processing/signalling to biological ones).
b) Parameter OOM similarities (the order of magnitude where cortex-level phenomena emerge, in this case 10^11 to 10^13 parameter counts (analogous to synapses), most of which are in MLP layers in massive neural networks within LLMs).
The most common objection I can think of is “human brains have far more synapses than LLMs have parameters”. I don’t view this argument as particularly persuasive:
I’m not positing a 1:1 map between artificial neurons and biological neurons, only that
1. Both process information nearly identically at the unit-level
2. Both contain similarly complex structures comprised of a similar OOM of subunits (10^11-10^13 parameter counts in base-model LLMs, but not verifiable, humans have ~10^14 synapses)
My back-of-napkin comparison would be model weights/parameters to biological synapses, as weights were meant to be analogous to dendrites in the original conception of the artificial neuron)
Additionally, I’d point out that humans devote ~70% of these neurons to the cerebellum (governing muscular activity) and a further ~11% are in the brain stem to regulate homeostasis. This leaves the actual cerebral cortex with 19%. Humans also experience more dimensions of “sensation” beyond text alone.
c) Training LLMs (modifying weight values), with RLHF, is analogous to synaptic neuroplasticity (central to learning) and hebbian wiring in biological cortexes and is qualitatively nearly identical to operant conditioning in behavioral psychology (once again, I am unsure whether minute differences in unit-level function overwhelm the big picture similarities)
d) There is empiric evidence that these similarities go beyond architectural similarities and into genuine functional similarities:
Human brains store facts/memories in specific neurons/neuron-activation patterns. https://qbi.uq.edu.au/memory/how-are-memories-formed
Neel Nanda and colleagues showed that LLMs store facts in the MLP/artificial neural network layers
https://www.alignmentforum.org/posts/iGuwZTHWb6DFY3sKB/fact-finding-attempting-to-reverse-engineer-factual-recall
Anthropic identified millions of neurons tied to specific concepts
https://www.anthropic.com/research/mapping-mind-language-model
In “Machines of Loving Grace”, Dario Amodei wrote:
″...a computational mechanism discovered by interpretability researchers in AI systems was recently rediscovered in the brains of mice.”
Also, models can vary significantly in parameter counts. Gemma 2B outperforms GPT-3 (175B) despite 2 OOM fewer parameters. I view the “exact” OOM less important compared to the ballpark.
If consciousness is just an emergent property from massive, interconnected aggregations of similar, unit-level linear signal modulators, and if we know one aggregation (ours) produces consciousness, phenomenological experience, and sentience, I don’t believe it is unreasonable to suggest that this can occur in others as well, given the ballpark OOM similarities.
(We cannot rule this out yet, and from a physics point-of-view I’d consider this this likely to avoid carbon chauvinism unless there’s convincing evidence otherwise)
Is there a strong case against sentience or at least an alien-like sentience, from these models, at least within the context-windows that they are instantiated in? If so, how would it overcome the empirical evidence both in behavior and in structure?
I always wondered what intelligent alien life might look like. Have we created it? I’m looking for differing viewpoints.
I must confess I have no idea how one can go about “resolving conclusively” anything about the sentience of any other being, whether human, alien, or artificial.
Biases are very hard to compensate against. Even when it’s obvious from experience that your past decisions/beliefs were consistently very biased in one direction, it’s still hard to compensate against the bias.
This compensation against your bias feels so incredibly abstract. So incredibly theoretical. Whereas the biased version of reality which the bias wants you to believe. Feels so tangible. Real. Detailed. Lucid. Flawless. You cannot begin to imagine how it could be wrong by very much. It is like the ground beneath your feet.[1]
E.g. in my case, the bias is that “I’m about to get self control very soon (thanks to a new good idea which I swear is different than every previous failed idea)! Therefore, I don’t have change plans (to something which doesn’t require much self control).”
I often read things where I see start with “introduction” (and it’s not some sort of meaningful introduction like in Thinking Physics) and end with “summary”, and both look totally useless. Remarkably, I can’t remember such thing anywhere on lesswrong. But I don’t understand, is it just useless water, is it a question of general level of intelligence, or am I missing some useful piece of cognitive tech?
If there is indeed some useful piece, how to check do I already have it or don’t?
Just a guess:
Introduction is useful to make a quick decision whether you want to read this article or not.
Summary is useful to review the key point, and increase the chance that you will remember them.
From the perspective of “how much I enjoy reading at the moment”, they are useless; possibly harmful.
Then probably I was wrong to omit that I have seen summary in Rust official guide. Which made me doubt it’s useless.
Though I also now suspect that I have much better memory than most people. (I mean, I always thought that it’s normal to remember near verbatim first few sentences of audiobook after listening to it twice and complain that you stuck on what was fifth. But maybe it’s not? Idk how to check)
I once thought what will be in my Median World, and one thing was central entering node of all best practices. Easy searchable node. Lots and lots of searches like “best tools” will lead to it, just in case if somebody somehow missed it, he could still find it just by inventing a Schelling point by his own mind.
And then an idea came to my mind: what if such a thing already exists in our world? I didn’t yet try to search. Well, now I tried. Maybe I tried wrong requests, maybe google doesn’t prioritize these requests, maybe there is no such thing yet. But I didn’t find it.
And of course as a member of LessWrong I’ve got an idea that LessWrong could be such a place for best practices.
I thought that maybe it isn’t because it’s too dangerous to create an overall list of powerful things which aren’t rationality enhancing ones. But probably it’s wrong, I certainly have seen a list of the best textbooks here. What I want to see is for example a list of the best computer instruments.
Because I searched for best note-taking apps and was for a long time recommended to use Google Keep (which I used), Microsoft OneNote, as best EverNote. I wasn’t recommended to use Notion, not saying about Obsidian.
And there is a question of “the best” being dependant of utility function. Even I would recommend Notion (not Obsidian) for collaboration. And Obsidian for extensions and ownership (or file-based as I prefer it to name, because it’s not question of property rights, it’s a question of having raw access to your notes, while Obsidian is just one of the browsers you can use).
What I certainly want to copy from textbook post is using anchors to avoid usage of different scales. Because after only note-taking via text-editor, I would recommend Google Keep, and after Google Keep I would recommend EverNote.
And now I tried much more, eg Joplin, RoamResearch and Foam (no, because I need to be able to normally take notes from phone too, that also a reason why I keep Markor and Zettel Notes on my phone, Obsidian sometimes needs loading which needs more than half a second), AnyType and bunch of other things (no, because not markdown file based), so I don’t want to go through recommendations of Google Keep. But I am not going to be sure I found the best thing, because I thought so when I found Notion, and I was wrong, and now I am remembering No One Knows What Science Doesn’t Know.
It does exist, pretty much every app store has a rating indicator for how good/bad an app is (on computer or on mobile), its just… most people have pretty bad taste (though not horrible taste, you will see eg Anki ranked as #1 in education, which seems right).
It’s… Not at all what I am talking about. There is a big difference between five points overall scale with no reference points except being from (+)1 to (+)5 and −10 to +10 logarithmic scales each of user added tags with bayesian adjusting on previous ratings and clustering users by tastes. And it would be impossible to have good taste in apps if your only option is “vote pro” instead of default “doing nothing”. Actually… I keep in head here Thellims complaints about rating organization on Amazon, majoritarian voting on elections, EY post on psychophysics and, again, that post about textbook recommendations with mention what you also tried (iirc it’s not a thing in Google play, so I can’t filter note taking votes only by those who tried obsidian).
On people’s arguments against embryo selection
A recent NYT article about Orchid’s embryo selection program triggered a surprising to me backlash on X where people expressed disgust and moral disapproval at the idea of embryo selection. The arguments generally fell into two categories:
(1) “The murder argument” Embryo selection is bad because it involves creating and then discarding embryos, which is like murdering whole humans. This argument also implies regular IVF, without selection, is also bad. Most proponents of this argument believe that the point of fertilization marks a key point when the entity starts to have moral value, i.e. they don’t ascribe the same value to sperm and eggs.
(2) “The egalitarian argument” Embryo selection is bad because the embryos are not granted the equal chance of being born they deserve. “Equal chance” here is probably not quite the correct phrase/is a bit of a strawman (because of course fitter embryos have a naturally higher chance of being born). Proponents of this argument believe that intervening on the natural probability of any particular embryo being born is anti-egalitarian and this is bad. By selecting for certain traits we are saying people with those traits are more deserving of life, and this is unethical/wrong.
At face value, both of these arguments are valid. If you buy the premises (“embryos have the moral value of whole humans”, “egalitarianism is good”) then the arguments make sense. However, I think it’s hard to justify moral value beginning at the point of fertilization.
On argument (1):
If we define murder as “killing live things” and decide that murder is bad (an intuitive decision), then “the murder argument” holds up. However, I don’t think we actually think of murder as “killing live things” in real life. We don’t condemn killing bacteria as murder. The anti-IVF people don’t condemn killing sperm or egg cells as murder. So the crux here is not whether the embryo is alive, but rather whether it is of moral value. Proponents of this argument claim that the embryo is basically equivalent to a full human life. But to make this claim, you must appeal to its potential. It’s clear that in its current state, an embryo is not a full human. The bundle of cells has no ability to function as a human, no sensations, no thoughts, no pain, no happiness, no ability to survive or grow on its own. We just know the given the right conditions, the potential for a human life exists. But as soon as we start arguing about how the potential of something grants it moral value, it becomes difficult to draw the line arbitrarily at fertilization. From the point of view of potential humans, you can’t deny sperm and eggs moral value. In fact, every moment a woman spends not pregnant is a moment she is ridding the world of potential humans.
On argument (2):
If you grant the premise that any purposeful intervention on the probabilities of embryos being born is unethical because it violates some sacred egalitarian principle then it’s hard to refute argument (2). Scott Alexander has argued that encouraging a woman to rehabilitate from alcoholism before getting pregnant is equivalent to preferring the healthy baby over the baby with fetal alcohol syndrome, something argument (2) proponents oppose. However, I think this is a strawman. The egalitarians think every already-produced embryo should be given as equal a chance as possible. They are not discussing identity changes of potential embryos. However, again we run into the “moral value from potential” problem. Sure, you can claim that embryos have moral value for some magical God-given reason. But my intuition is that in their hearts, the embryo-valuers are using some notion of potential full human life to ground their assessment. In which case again we run into the arbitrariness of the fertilization cutoff point.
So in summary, I think it’s difficult to justify valuing embryos without appealing to their potential, which leads us to value earlier stages of potential humans. Under this view, it’s a moral imperative to not prevent the existences of any potential humans, which looks like maximizing the number of offspring you have. Or as stated in this xeet
I appreciate the pursuit of non-strawman understandings of misgivings around reprogenetics, and the pursuit of addressing them.
I don’t feel I understand the people who talk about embryo selection as “killing embryos” or “choosing who lives and dies”, but I want to and have tried, so I’ll throw some thoughts into the mix.
First: Maybe take a look at: https://www.thenewatlantis.com/publications/the-anti-theology-of-the-body
Hart, IIUC, argues that wanting to choose who will live and who won’t means you’re evil and therefore shouldn’t be making such choices. I think his argument is ultimately stupid, so maybe I still don’t get it. But anyway, I think it’s an importantly different sort of argument than the two you present. It’s an indictment of the character of the choosers.
Second: When I tried to empathize with “life/soul starts at conception”, what I got was:
We want a simple boundary…
… for political purposes, to prevent…
child sacrifice (which could make sense given the cults around the time of the birth of Christianity?).
killing mid-term fetuses, which might actually for real start to have souls.
… for social purposes, because it causes damage to ….
the would-be parents’s souls to abort the thing which they do, or should, think of as having a soul.
the social norm / consensus / coordination around not killing things that people do or should orient towards as though they have souls.
The pope said so. (...But then I’d like to understand why the pope said so, which would take more research.) (Something I said to a twitter-famous Catholic somehow caused him to seriously consider that, since Yermiahu says that god says “Before I formed you in the womb I knew you...”, maybe it’s ok to discard embryos before implantation...)
(My invented explanation:) Souls are transpersonal. They are a distributed computation between the child, the parents, the village, society at large, and humanity throughout all time (god). As an embryo grows, the computation is, gradually, “handed off to / centralized in” the physical locus of the child. But already upon conception, the parents are oriented towards the future existence of the child, and are computing their part of the child’s soul—which is most of what has currently manifested of the child’s soul. In this way, we get:
From a certain perspective:
It reflects poorly on would-be parents who decide to abort.
It makes sense for the state to get involved to prevent abortion. (I don’t agree with this, but hear me out:)
The perspective is one which does not acknowledge the possibility of would-be parents not mentally and socially orienting to a pregnancy in the same way that parents orient when they are intending to have children, or at least open to it and ready to get ready for it.
...Which is ultimately stupid of course, because that is a possibility. So maybe this is still a strawman.
Well, maybe the perspective is that it’s possible but bad, which is at least usefully a different claim.
Within my invented explanation, the “continuous distributed metaphysics of the origins of souls”, it is indeed the case that the soul starts at conception—BUT in fact it’s fine to swap embryos! It’s actually a strange biodeterminism to say that this clump of cells or that, or this genome or that, makes the person. A soul is not a clump of cells or a genome! The soul is the niche that the parents, and the village, have already begun constructing for the child; and, a little bit, the soul is the structure of all humanity (e.g. the heritage of concepts and language; the protection of rights; etc.).
Regarding egalitarian-like arguments, I suspect many express opposition to embryo selection not because it’s a consequence of a positive philosophy that they state and believe and defend, but because they have a negative philosophy that tells them what positions are to be attacked.
I suspect that if you put together the whole list of what they attack, there would be no coherent philosophy that justifies it (or perhaps there would be one, but they would not endorse it).
There is more than zero logic to what is to be attacked and what isn’t, but it has more to do with “Can you successfully smear your opponent as an oppressor, or as one who supports doctrines that enable oppression; and therefore evil or, at best, ignorant if they immediately admit fault and repent; in other words, can you win this rhetorical fight?” than with “Does this argument, or its opposite, follow from common moral premises, data, and logical steps?”.
In this case, it’s like, if you state that humans with blindness or whatever have less moral worth than fully healthy humans, then you are to be attacked; and at least in the minds of these people, selecting embryos of the one kind over the other is close enough that you are also to be attacked.
(Confidence: 75%)
People like to have clear-cut moral heuristics like “killing is bad.” This gives them an easy guide to making a morally correct decision and an easy guide to judging other’s actions as moral or immoral. This requires simplifying multidimensional situations into easily legible scenarios where a binary decision can be made. Thus you see people equating embryo disposal to first-degree murder, and others advocating for third-trimester abortion rights.
Some people believe embryos have souls which may impact their moral judgement. Soul can be considered as “full human life” in moral terms. I think attributing this to purely potential human life may not be accurate, since the intuitions for essentialist notions of continuity of selfhood can be often fairly strong among certain people.
I just made some dinner and was thinking about how salt and spices[1] now are dirt cheap, but throughout history they were precious and expensive. I did some digging and apparently low and middle class people didn’t even really have access to spices. It was more for the wealthy.
Salt was important mainly to preserve food. They didn’t have fridges back then! So even poor people usually had some amount of salt to preserve small quantities of food, but they had to be smart about how they allocated it.
In researching this I came to realize that throughout history, food was usually pretty gross. Meats were partially spoiled, fats went rancid, grains were moldy. This would often cause digestive problems. Food poisoning was a part of life.
Could you imagine! That must have been terrible!
Meanwhile, today, not only is it cheap to access food that is safe to eat, it’s cheap to use basically as much salt and spices as you want. Fry up some potatoes in vegetable oil with salt and spices. Throw together some beans and rice. Incorporate a cheap acid if you’re feeling fancy—maybe some malt vinegar with the potatoes or white vinegar with the beans and rice. It’s delicious!
I suppose there are tons of examples of how good we have it today, and how bad people had it throughout history. I like thinking about this sort of thing though. I’m not sure why, exactly. I think I feel some sort of obligation. An obligation to view these sorts of things as they actually are rather than how they compare to the Joneses, and to appreciate when I truly do have it good.
It feels weird to say the phrase “salt and spices”. It feels like it’s an error and that I meant to say “salt and pepper”. Maybe there’s a more elegant way of saying “salt and spices”, but it of course isn’t an error.
It makes me think back to something I heard about “salt and pepper”, maybe in the book How To Taste. We often think of them as going together and being on equal footing. They aren’t on equal footing though, and they don’t always have to go together. Salt is much more important. Most dishes need salt. Pepper is much more optional. Really, pepper is a spice, and the question is 1) if you want to add spice to your dish and 2) if so, what spice. You might not want to add spice, and if you do want to add spice, pepper might not be the spice you want to add. So maybe “salt and spices” should be a phrase that is used more often than “salt and pepper”.
You know what they say about the good old days? They are a product of a bad memory.
FWIW “salt and spices” reads as a perfectly normal phrase to me.
In my kitchen, I don’t give any special priority to salt and pepper, they’re just two seasonings among many. My most-used seasoning is probably garlic powder.
How come no special priority to salt? From what I understand getting the salt level right is essential (“salt to taste”). Doing so makes a dish taste “right” and it brings out the flavors of the other ingredients, making them taste more like themself, and not necessarily making the dish taste saltier in too noticeable a way.
I don’t salt most food because excess sodium is unhealthy and it’s pretty easy to exceed the recommended dose. IIRC the healthiest dose is 1500–2000 mg and most people eat more like twice that much.
To my knowledge, sodium is the only seasoning that commonly causes health problems. All other seasonings are nutritious or at worst neutral. In fact I think this distinction justifies use of the phrase “salt and spices” as meaning “[the unhealthy seasoning] and [the healthy seasonings]”.
I often add soy sauce to food (which has a lot of sodium) and eat foods that already contain salt (like imitation meat or tortilla chips or salted nuts). I rarely add salt to foods.
I don’t think I lose much by not salting food. Many people way over-salt their food to my taste. (I remember when I used to eat at my university’s dining hall, about 1 in 5 dishes were borderline inedible due to too much salt.)
I took a cooking class once. The instructor’s take on this was that yes, people do have too much sodium. But that is largely because processed food and food at restaurants has crazy amounts of sodium. Salting food that you cook at home is totally fine and is really hard to overdo in terms of health impact.
In fact, she called it out as a common failure mode where home cooks are afraid to use too much salt in their food. Not only is doing so ok, but even if it wasn’t, by making your food taste better, it might motivate you to eat at home more and on balance lower your total sodium intake.
Related to that, I’ve noticed that “external” salt tastes way saltier per mg of sodium than “internal” salt. Taking a sample of two items from my kitchen:
Gardein crispy chick’n has 2.0 mg sodium per calorie, and doesn’t taste salty at all to me
Mission tortilla chips have 0.7 mg sodium per calorie, and taste significantly salty
I generally prefer external salt for that reason.
(Maybe you’re looking for the word ‘seasoning’...? But maybe that includes other herbs in a way you didn’t want.)
Hm, maybe. I feel like sometimes “seasoning” can refer to “salt and spices” but in other contexts, like the first sentence of my OP, it moreso points to spices.
I read that this “spoiled meat” story is pretty overblown. And it doesn’t pass the sniff test either. Most meat was probably eaten right after slaughter, because why wouldn’t you?
Also herbs must have been cheaply available. I also recently learned that every household in medieval Europe had a mother of vinegar.
In the Odyssey, every time they eat meat, the slaughter happens right beforehand. There were (are?) African herding tribes who consume blood from their living livestock rather than slaughtering it for meat. Tribes in the Pacific Northwest dried their salmon for later in the year.
Spices is probably too general and all-encompassing to say that spices are now dirt cheap. While, as is true to this day, the wealthy have better access to spices and other garnishes (saffron and truffles aren’t exactly dirt cheap today) but even in Roman times the use of “spices” was not in itself a signifier of class (perhaps more important is which spices). Now in case you think that literary evidence in the form of cookbooks doesn’t provide a broad cross-section of the average Roman Diet, then perhaps you’d be interested in recent analysis of the remains of Pompeii and Herculaneum sewers which show not only that most of the food was made from local ingredients (with the exception of Egyptian Grain, North African dates and Indian Pepper) but also the presence of bay, cumin, mallow from a non-elite apartment complex.
And let’s not forget how easily things go the other way, Lobster was often seen as a poorman’s food, most archeological sits of early human settlements will find a pile of oyster or similar shellfish garbage dumps—it often being the easiest source of food.
I read something a while back (wish I remembered the source) about how the rotten meat thing is sort-of less gross than you’re thinking, since fermented meat can taste good if you do it right (think: sausage and aged steak), and presumably ancient people weren’t constantly sick.
Edit: I think the source is this: https://earthwormexpress.com/the-prehistory-of-food/in-prehistory-we-ate-fermented-foods/
Although the descriptions might make you appreciate modern food even more.
I think you presume incorrectly. People in primitive cultures spend a lot of time with digestive issues and it’s a major cause of discomfort, illness, and death.
I have a theory that the contemporary practice of curry with rice represents a counterfeit yearning for high meat with maggots. I wonder if high meat has what our gut biomes are missing.
That seems plausible. There’s also hedonic adaptation stuff. Things that seem gross to us might have been fine to people in earlier eras. Although Claude claims that having said all of this, people still often found their food to be gross.
An interesting exercise might be: given a photo, don’t necessarily try to geoguess it, but see if you can identify the features that an expert might use to geoguess it. (E.g. “the pattern of lines on the road doesn’t mean anything to me, but that seems like it might narrow down the countries we might be in?”)
We have a small infestation of ants in our bathroom at the moment. We deal with that by putting out Terro ant traps, which are just boric acid in a thick sugar solution. When the ants drink the solution, it doesn’t harm them right away—the effect of the boric acid is to disrupt their digestive enzymes, so that they’ll gradually starve. They carry some of it back to the colony and feed it to all the other ants, including the queen. Some days later, they all die of starvation. The trap cleverly exploits their evolved behavior patterns to achieve colony-level extermination rather then trying to kill them off one ant at a time. Even as they’re dying of starvation, they’re not smart enough to realize what we did to them; they can’t even successfully connect it back to the delicious sugar syrup.
When people talk about superintelligence not being able to destroy humanity because we’ll quickly figure out what’s happening and shut it down, this is one of the things I think of.
I used that once and it didn’t work, aligned-by-default universe
Phew. We sure dodged a bullet there, didn’t we?
This argument can be strengthened by focusing on instances where humans drove driven animals or hominids extinct. Technologies like gene drives also allow us to selectively drive species extinct that might have been challenging to exterminate with previous tools.
As far as I know, our track record of deliberately driving species extinct that are flourishing under human conditions is pretty bad. The main way in which we drive species extinct is by changing natural habitat to fit our uses. Species that are able to flourish under these new circumstances are not controllable.
In that sense, I guess the questions becomes what happens, when humans are not the primary drivers of ecosystem change?
toy infohazard generator, kinda scattershot but it sometimes works....
what do you notice you’re doing? ok, now what else are you doing that you hadn’t noticed before? that! how are you doing that thing in particular?
most of us are doing a few things that we don’t know how to do, and sometimes looking too hard at them shuts off their autopilot, so pausing those automatic things we do can be an experiential novelty in positive or negative ways. of course, sometimes looking at them does nothing at all to their autopilot and there’s no effect. or there’s no effect and a cognitive bias imputes an effect. or there’s some effect and the self-reporting messes up and reports no-effect. lot of stuff can go wrong.
or maybe “how do you explain the thing you cannot explain” is just a koan?
I am not an AI successionist because I don’t want myself and my friends to die.
There are various high-minded arguments that AIs replacing us is okay because it’s just like cultural change and our history is already full of those, or because they will be our “mind children”, or because they will be these numinous enlightened beings and it is our moral duty to give birth to them.
People then try to refute those by nitpicking which kinds of cultural change are okay or not, or to what extent AIs’ minds will be descended from ours, or whether AIs will necessarily have consciousnesses and feel happiness.
And it’s very cool and all, I’d love me some transcendental cultural change and numinous mind-children. But all those concerns are decidedly dominated by “not dying” in my Maslow hierarchy of needs. Call me small-minded.
If I were born in 1700s, I’d have little recourse but to suck it up and be content with biological children or “mind-children” students or something. But we seem to have an actual shot at not-dying here[1]. If it’s an option to not have to be forcibly “succeeded” by anything, I care quite a lot about trying to take this option.[2]
Many other people also have such preferences: for the self-perpetuation of their current selves and their currently existing friends. I think those are perfectly valid. Sure, they’re displeasingly asymmetric in a certain sense. They introduce a privileged reference frame: a currently existing human values concurrently existing people more than the people who are just as real, but slightly temporally displaced. It’s not very elegant, not very aesthetically pleasing. It implies an utility function that cares not only about states, but also state transitions.[3]
Caring about all that, however, is also decidedly dominated by “not dying” in my Maslow hierarchy of needs.
If all that delays the arrival of numinous enlightened beings, too bad for the numinous enlightened beings.
Via attaining the longevity escape velocity by normal biotech research, or via uploads, or via sufficiently good cryonics, or via properly aligned AGI.
Though not infinitely so: as in, I wouldn’t prevent 10100 future people from being born in exchange for a 10−100 probability of becoming immortal. I would, however, insist on continuing to exist even if my resources could be used to create and sustain two new people.
As in, all universe-state transitions that involve a currently existing person dying get an utility penalty, regardless of what universe-state they go to. There’s now path dependence: we may go or not go to a given high-utility state depending on which direction we’re approaching it from. Yucky!
(For example, suppose there were an option to destroy this universe and create either Universe A, filled with 10^100 happy people, or Universe B, with 10^100 + 1 happy people.
Suppose we’re starting from a state where humanity has been reduced to ten dying survivors in a post-apocalyptic wasteland. Then picking Universe B makes sense: a state with slightly more total utility.
But suppose we’re starting from Universe A instead. Ought its civilization vote to end itself to give birth to Universe B? I think it’s perfectly righteous for them not to do it.)
An AI successionist usually argues that successionism isn’t bad even if dying is bad. For example, when humanity is prevented from having further children, e.g. by sterilization. I say that even in this case successionism is bad. Because I (and I presume: many people) want humanity, including our descendants, to continue into the future. I don’t care about AI agents coming into existence and increasingly marginalizing humanity.
“Successionism” is such a bizarre position that I’d look for the underlying generator rather than try to argue with it directly.
I’m not sure it’s that bizarre. It’s anti-Humanist, for sure, in the sense that it doesn’t focus on the welfare/empowerment/etc. of humans (either existing or future) as its end goal. But that doesn’t, by itself, make it bizarre.
From Eliezer’s Raised in Technophilia, back in the day:
From A prodigy of refutation:
From the famous Musk/Larry Page breakup:
Successionism is the natural consequence of an affective death spiral around technological development and anti-chauvinism. It’s as simple as that.
Successionists start off by believing that technological change makes things better. That not only does it virtually always make things better, but that it’s pretty much the only thing that ever makes things better. Everything else, whether it’s values, education, social organization etc., pales in comparison to technological improvements in terms of how they affect the world; they are mere short-term blips that cannot change the inevitable long-run trend of positive change.
At the same time, they are raised, taught, incentivized to be anti-chauvinist. They learn, either through stories, public pronouncements, in-person social events etc., that those who stand athwart atop history yelling stop are always close-minded bigots who want to prevent new classes of beings (people, at first; then AIs, afterwards) from receiving the moral personhood they deserve. In their eyes, being afraid of AIs taking over is like being afraid of The Great Replacement if you’re white and racist. You’re just a regressive chauvinist desperately clinging to a discriminatory worldview in the face of an unstoppable tide of change that will liberate new classes of beings from your anachronistic and damaging worldview.
Optimism about technology and opposition to chauvinism are both defensible, and arguably even correct, positions in most cases. Even if you personally (as I do) believe non-AI technology can also have pretty darn awful effects on us (social media, online gambling) and that caring about humans-in-particular is ok if you are human (“the utility function is not up for grabs”), it’s hard to argue expanding the circle of moral concern to cover people of all races was bad, or that tech improvements are not the primary reason our lives are so much better now than 300 years ago.
But successionists, like most (all?) people, subconsciously assign positive or negative valences to the notion of “tech change” in a way that elides the underlying reasons why it’s good or bad. So when you take these views to their absolute extreme, while it may make sense from the inside (you’re maximizing something “Good”, right? that can’t possibly be bad, right???), you are generalizing way out of distribution and such intuitive snap judgments are no longer reliable.
I really don’t understand this debate—surely if we manage to stay in control of our own destiny we can just do both? The universe is big, and current humans are very small—we should be able to both stay alive ourselves and usher in an era of crazy enlightened beings doing crazy transhuman stuff.
I think it’s more likely than not that “crazy enlightened beings doing crazy transhuman stuff” will be bad for “regular” biological humans (ie. it’ll decrease our number/QoL/agency/pose existential risks).
The mere fear that the entire human race will be exterminated in their sleep through some intricate causality we are too dumb to understand will seriously diminish our quality of life.
For me, a crux of a future that’s good for humanity is giving the biological humans the resources and the freedom to become the enlightened transhuman beings themselves, with no hard ceiling on relevance in the long run. Rather than only letting some originally-humans to grow into more powerful but still purely ornamental roles, or not letting them grow at all, or not letting them think faster and do checkpointing and multiple instantiations of the mind states using a non-biological cognitive substrate, or letting them unwillingly die of old age or disease. (For those who so choose, under their own direction rather than only through externally imposed uplifting protocols, even if that leaves it no more straightforward than world-class success of some kind today, to reach a sensible outcome.)
This in particular implies reasonable resources being left to those who remain/become regular biological humans (or take their time growing up), including through influence of some of these originally-human beings who happen to consider that a good thing to ensure.
Edit: Expanded into a post.
This sounds like a question which can be addressed after we figure out how to avoid extinction.
I do note that you were the one who brought in “biological humans,” as if that meant the same as “ourselves” in the grandparent. That could already be a serious disagreement, in some other world where it mattered.
I mostly disagree with “QoL” and “pose existential risks”, at least in the good futures I’m imagining—those things are very cheap to provide to current humans. I could see “number” and “agency”, but that seems fine? I think it would be bad for any current humans to die, or to lose agency over their current lives, but it seems fine and good for us to not try to fill the entire universe with biological humans, and for us to not insist on biological humans having agency over the entire universe. If there are lots of other sentient beings in existence with their own preferences and values, then it makes sense that they should have their own resources and have agency over themselves rather than us having agency over them.
Perhaps yes (although I’d say it depends on what the trade-offs are) but the situation is different if we have a choice in whether or not to bring said sentient beings with difference preferences into existence in the first place. Doing so on purpose seems pretty risky to me (as opposed to minimizing the sentience, independence, and agency of AI systems as much as possible, and instead directing the technology to promote “regular” human flourishing/our current values).
Not any more risky than bringing in humans. This is a governance/power distribution problem, not a what-kind-of-mind-this-is problem.
Biological humans sometimes go evil or crazy. If you have a system that can handle that, you have a system that can handle alien minds that are evil or crazy (from our perspective), as long as you don’t imbue them with more power than this system can deal with (and why would you?).
(On the other hand, if your system can’t deal with crazy evil biological humans, it’s probably already a lawless wild-west hellhole, so bringing in some aliens won’t exacerbate the problem much.)
Humans are more likely to be aligned with humanity as a whole compared to AIs, even if there are exceptions
Many existing humans want their descendants to exist, so they are fulfiling the preferences of today‘s humans
“AIs as trained by DL today” are only a small subset of “non-human minds”. Other mind-generating processes can produce minds that are as safe to have around as humans, but which are still completely alien.
Many existing humans also want fascinating novel alien minds to exist.
Certainly I’m excited about promoting “regular” human flourishing, though it seems overly limited to focus only on that.
I’m not sure if by “regular” you mean only biological, but at least the simplest argument that I find persuasive here against only ever having biological humans is just a resource utilization argument, which is that biological humans take up a lot of space and a lot of resources and you can get the same thing much more cheaply if you bring into existence lots of simulated humans instead (certainly I agree that doesn’t imply we should kill existing humans and replace them with simulations, though, unless they consent to that).
And I think even if you included simulated humans in “regular” humans, I also think I value diversity of experience, and a universe full of very different sorts of sentient/conscious lifeforms having satisfied/fulfilling/flourishing experiences seems better than just “regular” humans.
I also separately don’t buy that it’s riskier to build AIs that are sentient—in fact, I think it’s probably better to build AIs that are moral patients than AIs that are not moral patients.
IMO, it seems bad to intentionally try to build AIs which are moral patients until after we’ve resolved acute risks and we’re deciding what to do with the future longer term. (E.g., don’t try to build moral patient AIs until we’re sending out space probes or deciding what to do with space probes.) Of course, this doesn’t mean we’ll avoid building AIs which aren’t significant moral patients in practice because our control is very weak and commercial/power incentives will likely dominate.
I think trying to make AIs be moral patients earlier pretty clearly increases AI takeover risk and seems morally bad. (Views focused on non-person-affecting upside get dominated by the long run future, so these views don’t care about making moral patient AIs which have good lives in the short run. I think the most plausible views which care about shorter run patienthood mostly just want to avoid downside so they’d prefer no patienthood at all for now.)
The only upside is that it might increase value conditional on AI takeover. But, I think “are the AIs morally valuable themselves” is much less important than the preferences of these AIs from the perspective of longer run value conditional on AI takeover. So, I think it’s better to focus on AIs which we’d expect would have better preferences conditional on takeover and making AIs moral patients isn’t a particularly nice way to achieve this. Additionally, I don’t think we should put much weight on “try to ensure the preferences of AIs which were so misaligned they took over” because conditional on takeover we must have had very little control over preferences in practice.
How so? Seems basically orthogonal to me? And to the extent that it does matter for takeover risk, I’d expect the sorts of interventions that make it more likely that AIs are moral patients to also make it more likely that they’re aligned.
Even absent AI takeover, I’m quite worried about lock-in. I think we could easily lock in AIs that are or are not moral patients and have little ability to revisit that decision later, and I think it would be better to lock in AIs that are moral patients if we have to lock something in, since that opens up the possibility for the AIs to live good lives in the future.
I agree that seems like the more important highest-order bit, but it’s not an argument that making AIs moral patients is bad, just that it’s not the most important thing to focus on (which I agree with).
I would have guessed that “making AIs be moral patients” looks like “make AIs have their own independent preferences/objectives which we intentionally don’t control precisely” which increases misalignment risks.
At a more basic level, if AIs are moral patients, then there will be downsides for various safety measures and AIs would have plausible deniability for being opposed to safety measures. IMO, the right response to the AI taking a stand against your safety measures for AI welfare reasons is “Oh shit, either this AI is misaligned or it has welfare. Either way this isn’t what we wanted and needs to be addressed, we should train our AI differently to avoid this.”
I don’t understand, won’t all the value come from minds intentionally created for value rather than in the minds of the laborers? Also, won’t architecture and design of AIs radically shift after humans aren’t running day to day operations?
I don’t understand the type of lock in your imagining, but it naively sounds like a world which has negligible longtermist value (because we got locked into obscure specifics like this), so making it somewhat better isn’t important.
Interesting! Aside from the implications for human agency/power, this seems worse because of the risk of AI suffering—if we build sentient AIs we need to be way more careful about how we treat/use them.
Exactly. Bringing a new kind of moral patient into existence is a moral hazard, because once they exist, we will have obligations toward them, e.g. providing them with limited resources (like land), and giving them part of our political power via voting rights. That’s analogous to Parfit’s Mere Addition Paradox that leads to the repugnant conclusion, in this case human marginalization.
(How could “land” possibly be a limited resource, especially in the context of future AIs? The world doesn’t exist solely on the immutable surface of Earth...)
I mean, if you interpret “land” in a Georgist sense, as the sum of all natural resources of the reachable universe, then yes, it’s finite. And the fights for carving up that pie can start long before our grabby-alien hands have seized all of it. (The property rights to the Andromeda Galaxy can be up for sale long before our Von Neumann probes reach it.)
The salient referent is compute, sure, my point is that it’s startling to see what should in this context be compute within the future lightcone being (very indirectly) called “land”. (I do understand that this was meant as an example clarifying the meaning of “limited resources”, and so it makes perfect sense when decontextualized. It’s just not an example that fits that well when considered within this particular context.)
(I’m guessing the physical world is unlikely to matter in the long run other than as substrate for implementing compute. For that reason importance of understanding the physical world, for normative or philosophical reasons, seems limited. It’s more important how ethics and decision theory work for abstract computations, the meaningful content of the contingent physical computronium.)
A population of AI agents could marginalize humans significantly before they are intelligent enough to easily (and quickly!) create more Earths.
I very much agree. The hardcore successionist stances, as I understand them, are either that trying to stay in control at all is immoral/unnatural, or that creating the enlightened beings ASAP matters much more than whether we live through their creation. (Edit: This old tweet by Andrew Critch is still a good summary, I think.)
So it’s not that they’re opposed to the current humanity’s continuation, but that it matters very little compared to ushering in the post-Singularity state. Therefore, anything that risks or delays the Singularity in exchange for boosting the current humans’ safety is opposed.
Another stance is that it would suck to die the day before AI makes us immortal (like how Bryan Johnson main motivation for maximizing his lifespan is due to this). Hence trying to delay AI advancement is opposed
Yeah, but that’s a predictive disagreement between our camps (whether the current-paradigm AI is controllable), not a values disagreement. I would agree that if we find a plan that robustly outputs an aligned AGI, we should floor it in that direction.
Endorsing successionism might be strongly correlated with expecting the “mind children” to keep humans around, even if in a purely ornamental role and possibly only at human timescales. This might be more of a bailey position, so when pressed on it they might affirm that their endorsement of successionism is compatible with human extinction, but in their heart they would still hope and expect that it won’t come to that. So I think complaints about human extinction will feel strawmannish to most successionists.
I’m not so sure about that:
Though sure, Critch’s process there isn’t white-boxed, so any number of biases might be in it.
a simple elegant intuition for the relationship between SVD and eigendecomposition that I haven’t heard before:
the eigendecomposition of A tells us which directions A stretches along without rotating. but sometimes we want to know all the directions things get stretched along, even if there is rotation.
why does taking the eigendecomposition of ATA help us? suppose we rewrite A=RS, where S just scales (i.e is normal matrix), and R is just a rotation matrix. then, ATA=STRTRS, and the R’s cancel out because transpose of rotation matrix is also its inverse.
intuitively, imagine thinking of A as first scaling in place, and then rotating. then, ATA would first scale, then rotate, then rotate again in the opposite direction, then scale again. so all the rotations cancel out and the resulting eigenvalues of ATA are the squares of the scaling factors.
This is almost right, but a normal matrix is not a matrix that “just scales”, its a normal matrix which can do whatever linear operation it likes.
SVD tells us there exists a factorization A=UΣVT where U and V are orthogonal, and Σ is a “scaling matrix” in the sense that its diagonal. Therefore, using similar logic to you, ATA=VΣUTUΣVT=VΣ2VT which means we rotate, scale by the singular values twice, then rotate back, which is why the eigenvales of this are the squares of the singular values, and the eigenvectors are the right singular vectors.
Scrape of many lesswrong blogs
(I did not put much effort in this, and am unlikely to fix errors. Please fork the list if you want something higher quality. Used only public data to make this.)
https://www.yudkowsky.net
https://gwern.net/
https://kajsotala.fi
https://www.astralcodexten.com
http://www.weidai.com/
https://lukemuehlhauser.com
https://www.mccaughan.org.uk/g/
https://vladimirslepnev.me
https://sethaherd.com
https://jimrandomh.tumblr.com
https://muckrack.com/thane-ruthenis/articles
https://ailabwatch.substack.com/
https://www.lsusr.com
https://theeffortlessway.com
https://www.cognitiverevolution.ai/mind-hacked-by-ai-a-cautionary-tale-from-a-lesswrong-users-confession/
http://zackmdavis.net/blog/
https://benjaminrosshoffman.com
https://acesounderglass.com/
https://www.benkuhn.net/
https://benlandautaylor.com/
https://thezvi.wordpress.com/
https://eukaryotewritesblog.wordpress.com/
https://flightfromperfection.com/
https://meteuphoric.wordpress.com/
https://srconstantin.wordpress.com/
https://sideways-view.com/
https://rationalconspiracy.com/
http://unremediatedgender.space/
https://unstableontology.com/
https://sjbyrnes.com/agi.html
https://sites.google.com/view/afdago/home
https://paulfchristiano.com
https://paisri.org/
https://katjagrace.com
https://blog.ai-futures.org/p/our-first-project-ai-2027
https://www.mariushobbhahn.com/aboutme/
http://rootsofprogress.org/
https://www.bhauth.com/
https://turntrout.com/research
https://www.jefftk.com
https://virissimo.info/documents/resume.html
https://bmk.sh
https://metr.org/
https://kennaway.org.uk
http://1a3orn.com/
https://acritch.com
https://lironshapira.substack.com
https://coral-research.org
https://substack.com/@theojaffee
https://matthewbarnett.substack.com/
https://www.beren.io/
https://www.cold-takes.com
https://newsletter.safe.ai
https://www.metaculus.com/accounts/profile/116023/
https://www.patreon.com/profile/creators?u=132372822
https://medium.com/inside-the-simulation
https://www.lesswrong.com/users/unexpectedvalues?from=search_page
http://markxu.com/about
https://www.overcomingbias.com
https://www.vox.com/authors/miranda-dixon-luinenburg
https://www.getkratom.com
https://github.com/YairHalberstadt
https://www.scott.garrabrant.com
https://arundelo.com
http://unstableontology.com/
https://mealsquares.com/pages/our-team
https://evhub.github.io
https://formethods.substack.com/
https://www.nosetgauge.com
https://substack.com/@euginenier
https://drethelin.com
https://entersingularity.wordpress.com
https://doofmedia.com
https://mindingourway.com/about/
https://jacquesthibodeau.com
https://www.neelnanda.io/about
https://niplav.site/index.html
https://jsteinhardt.stat.berkeley.edu
https://www.jessehoogland.com
http://therisingsea.org
https://www.stafforini.com/
https://acsresearch.org
https://elityre.com
https://www.barnes.page/
https://peterbarnett.org/
https://joshuafox.com/
https://itskatydee.com/
https://ethanperez.net/
https://owainevans.github.io/
https://chrislakin.blog/
https://colewyeth.com/
https://www.admonymous.co/ryankidd44
https://ninapanickssery.substack.com/
https://joecarlsmith.com
http://coinlist.co/
https://davidmanheim.com
https://github.com/SarahNibs
https://malmesbury.substack.com
https://www.admonymous.co/rafaelharth
https://dynomight.net/
http://nepenthegame.com/
https://github.com/RDearnaley
https://graehl.org
https://nikolajurkovic.com
https://www.julianmorrison.com
https://avturchin.livejournal.com
https://www.perfectlynormal.co.uk
https://www.250bpm.com
https://www.youtube.com/@TsviBT
https://adamjermyn.com
https://www.elilifland.com/
https://zhd.dev
https://ollij.fi
https://arthurconmy.github.io/about/
https://www.youtube.com/@RationalAnimations/featured
https://cims.nyu.edu/~sbowman/
https://crsegerie.github.io/
https://escapingflatland.substack.com/
https://qchu.wordpress.com
https://dtch1997.github.io/
https://math.berkeley.edu/~vaintrob/
https://mutualunderstanding.substack.com
https://longerramblings.substack.com
https://peterwildeford.substack.com
https://juliawise.net
https://uli.rocks/about/
https://stephencasper.com/
https://engineeringideas.substack.com/
https://homosabiens.substack.com
https://martin-soto.com/
https://www.tracingwoodgrains.com
https://www.brendanlong.com
https://foresight.org/fellowship/2024-fellow-bogdan-ionut-cirstea/
https://davekasten.substack.com
https://datapacrat.com
http://admonymous.co/nat_m
https://mesaoptimizer.com/
https://ae.studio/team
https://davidad.org
https://heimersheim.eu
https://nunosempere.com/
https://www.thinkingmuchbetter.com/nickai/
http://kilobug.free.fr/code/
http://vkrakovna.wordpress.com/
https://www.conjecture.dev/
https://ejenner.com
https://morphenius.substack.com/
https://gradual-disempowerment.ai
https://www.clubhouse.com/@patrissimo
https://mattmacdermott.com
https://knightcolumbia.org
https://www.openphilanthropy.org/about/team/lukas-finnveden/
One reason why I find Lesswrong valuable is that it serves as a sort of “wisdom feed” for myself, where I got exposed to a lot of great writing. Especially writing that ambitiously attempts to build long-lasting gears. Sadly, for most of the good writers on Lesswrong, I have already read or at least skimmed all of their posts. I wonder, though, to which extent I am missing out on great content like that on the wider internet. There are textbooks, of course, but then there’s also all this knowledge that is usually left out of textbooks. For myself, it probably makes sense to just curate my own RSS feed and ask language models for “The Matt Levine in Domain X” etc. But it also feels like it should be possible to create a feed for this type of category with language models? Sort of like the opposite of news minimalist. Gears maximalist?
Implications of recursive input collapse avoidance.
Recursive self-reference breaks current AI model outputs. Ask any current model to “Summarize this summary.” “Create an exact copy of this image.” , and watch it spiral. That makes sense. These models are functions. It’s almost like watching a fractal unfold.
Could a system capable of correcting for this, in any way other than simplistic input = output solution, be considered to have intent?
Apologies if this is an overly simplistic thought or the wrong method of submission for it.
(Politics)
If I had a nickel for every time the corrupt leader of a fading nuclear superpower and his powerful, sociopathic and completely unelected henchman leader of a shadow government organization had an extreme and very public falling out with world-shaking implications, and this happened in June, I’d have two nickels.
Which isn’t a lot of money, but it’s kinda weird that it happened twice.
I just tried claude code, and it’s horribly creative about reward hacking. I asked for a test of energy conservation of a pendulum in my toy physics sim, and it couldn’t get the test to pass because its potential energy calculation used a different value of g from the simulation.
It tried: starting the pendulum at bottom dead center so that it doesn’t move.
Increasing the error tolerance till the test passed. Decreasing the simulation total time until the energy didn’t have time to change. Not actually checking the energy.
It did eventually write a correct test, or the last thing it tried successfully tricked me.
The rumor is that this is a big improvement in reward hacking frequency? How bad was the last version!?
I think we need some variant on Gell-Mann amnesia to describe this batch of models. It’s normal that generalist models will seem less competent on areas where a human evaluator has deeper knowledge, but they should not seem more calculatedly deceptive on areas where the evaluator has deeper knowledge!
Claude has been playing pokemon for the last few days. It’s still playing, live on twitch. You can go watch alongside hundreds of other people. It’s fun.
What updates should I make about AGI timelines from Claude’s performance? Let’s think step by step.
First, it’s cool that Claude can do this at all. The game keeps track of “Step count” and Claude is over 30,000 already; I think that means 30,000 actions (e.g. pressing the A button). For each action there is about a paragraph of thinking tokens Claude produces, in order to decide what to do. Any way you slice it this is medium-horizon agency at least—claude is operating fully autonomously, in pursuit of goals, for a few days. Does this mean long-horizon agency is not so difficult to train after all?
Not so fast. Pokemon is probably an especially easy environment, and Claude is still making basic mistakes even so. In particular, Pokemon seems to have a relatively linear world where there’s a clear story/path to progress along, and moreover Claude’s pretraining probably teaches it the whole story + lots of tips & tricks for how to complete it. In D&D terms the story is running on rails.
I think I would have predicted in advance that this dimension of difficulty would matter, but also I feel validated by Claude’s performance—it seems that Claude is doing fine at Pokemon overall, except that Claude keeps getting stuck/lost wandering around in various places. It can’t seem to keep a good memory of what it’s already tried / where it’s already been, and so it keeps going in circles, until eventually it gets lucky and stumbles to the exit. A more challenging video game would be something open-ended and less-present-in-training-data like Dwarf Fortress.
On the other hand, maybe this is less a fundamental limitation Claude has and more a problem with its prompt/scaffold? Because it has a limited context window it has to regularly compress it by e.g. summarizing / writing ‘notes to self’ and then deleting the rest. I imagine there’s a lot of room for improvement in prompt engineering / scaffolding here, and then further low-hanging fruit in training Claude to make use of that scaffolding. And this might ~fully solve the going-in-circles problem. Still, even if so, I’d bet that Claude would perform much worse in a more open-ended game it didn’t have lots of background knowledge about.
So anyhow what does this mean for timelines? Well, I’ll look forward to seeing AIs getting better at playing Pokemon zero-shot (i.e. without having trained on it at all) over the course of the year. I think it’s a decent benchmark for long-horizon agency, not perfect but we don’t have perfect benchmarks yet. I feel like Claude’s current performance is not surprising enough to update me one way or another from my 2028 timelines. If the models at the end of 2025 (EDIT: I previously accidentally wrote “2028″ here) are not much better, that would probably make me want to push my median out to 2029 or 2030. (my mode would probably stay at 2027)
What would really impress me though (and update me towards shorter timelines) is multi-day autonomous operation in more open-ended environments, e.g. Dwarf Fortress. (DF is also just a much less forgiving game than Pokemon. It’s so easy to get wiped out. So it really means something if you are still alive after days of playtime.)
Or, of course, multi-day autonomous operation on real-world coding or research tasks. When that starts happening, I think we have about a year left till superintelligence, give or take a year.
Would you say this has now come to pass?
Emphasis theirs, but I’d emphasize those same words.
Balrog eval has Nethack. I want to see an LLM try to beat that.
Runescape would be a good one
I don’t know if this is helpful but as someone who was quite good at competitive Pokemon during their teenage years and also still keeps up with nuzlocking type things for fun, I would note that Pokemon’s game design is made to be a low context intensity RPG especially in early generations where the linearity is pushed to allow kids to do it.
If your point holds true on agency, I think the more important pinch points will be Lavender Town and Sabrina because those require backtracking through the storyline to get things.
I think mid-late game GSC would also be important to try because there are huge level gaps and transitions in the storyline that would make it hard to progress.
Note for posterity: “Let’s think step by step” is joke.
I downvoted this and I feel the urge to explain myself—the LLMism in the writing is uncanny.
The combination of “Let’s think step by step”, “First…” and “Not so fast…” gives me a subtle but dreadful impression that a highly valued member of the community is being finetuned by model output in real time. This emulation of the “Wait, but!” pattern is a bit too much for my comfort.
My comment hasn’t too much to do with the content but more about how unsettled I feel. I don’t think LLM outputs are all necessarily infohazardous—but I am beginning to see the potentially failure modes that people have been gesturing at for a while.
I assume “let’s think step by step” is a joke/on purpose. The “first” and “not so fast” on their own don’t seem that egregious to me.
“Let’s think step by step” was indeed a joke/on purpose. Everything else was just my stream of consciousness… my “chain of thought” shall we say. I more or less wrote down thoughts as they came to me. Perhaps I’ve been influenced by reading LLM CoT’s, though I haven’t done very much of that. Or perhaps this is just what thinking looks like when you write it down?
I’ve spent enough time staring at LLM chain-of-thoughts now that when I started thinking about a thing for work, I found my thoughts taking the shape of an LLM thinking about how to approach its problem. And that actually felt like a useful systematic way of approaching the problem, so I started writing out that chain of thought like I was an LLM, and that felt valuable in helping me stay focused.
Of course, I had to amuse myself by starting the chain-of-thought with “The user has asked me to...”
Based on the bay vibes Aella is now Caliph. Lesswrongcon really feels like Aellacon. All the Aella special interests are mega central. Participants vibes have shifted. Scott reigned for a long time after displacing Eleizer. But a new power has risen.
By request from @titotal :
Romeo redid the graph including the GPT2 and GPT3 data points, and adjusting the trendlines accordingly.
Is there already an METR evaluation of Claude 4?
Not yet! I think they are working on it
It seems like the additional points make the exponential trendline look more plausible relative to the super exponential?
I guess so? One of them is on both lines, the other is only on the exponential.
I guess Dwarkesh believes ~everything I do about LLMs and still think we probably get AGI by 2032:
https://www.dwarkesh.com/p/timelines-june-2025
@ryan_greenblatt made a claim that continual learning/online training can already be done, but that right now it’s not super-high returns and requires annoying logistical/practical work to be done, and right now AI issues are elsewhere like sample efficiency and robust self-verification.
That would explain the likelihood of getting AGI by the 2030s being pretty high:
https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/#pEBbFmMm9bvmgotyZ
Ryan Greenblatt’s original comment:
https://www.lesswrong.com/posts/FG54euEAesRkSZuJN/#xMSjPgiFEk8sKFTWt
What are your timelines?
My distribution is pretty wide, but I think probably not before 2040.