I don’t think most people know how to make that mental move even with “Sam,” let alone with an LLM. And even if they do know how to do something like it (most people don’t know how LLM RLHF works, but they might think something like “is it trying to convince me of something?”), that’s the mechanism that gets degraded over time, particularly if they push back and the LLM adapts smoothly enough to reassure them.
DaystarEld:
Sure, I don’t get annoyed when people doubt LLM sentience. It’s labeling it as delusional that I specifically take issue with!
Yeah, that’s fair. I’ll edit!
I explained in my post that I used LLMs to outline this post and do final editing passes. I didn’t use them to write the content, including that bit.
Yep, that can definitely be helpful, but my prediction is that even if this became the default option in Claude or OpenAI’s models, we’d likely see some new and unique failure modes, or just new permutations of the same sorts of failures, where “failure” here means something like “human ends up replacing their default sensemaking apparatus with whatever the LLMs tell them.” Maybe less, though!
but if they were optimizing for what you are most susceptible to, the resulting beliefs would look much more bespoke.
I’m not so sure! I think a lot of the bespoke elements are there in the surface level of their interactions, but they’re mostly aesthetic, and the common, deeper themes you’re gesturing at are a result of the underlying models fundamentally all being extensions of the same “minds,” combined with the selection effect of what sorts of people post online about their experiences with AI.
This is indicative of autonomous agency. Establishing that conclusively is a tall order (one which I’d like to attempt, so I’m always open to hearing what sort of thing would convince you), but it’s important to notice the hints we keep getting.
I’ll have to think about this, because on a first read your post is really interesting to me (somehow I never saw it before, so thanks for writing and linking it! I feel the need to edit my post now to include/address parts of it, but will probably wait a day or two), but not convincing on this point.
I think there are attractor states, like the ones you document in your post, but that we would be making a mistake to treat those attractor states in the human/LLM interactions as proof of something besides “different LLM models are actually pretty similar to each other, psychologically, and to some degree different humans who engage in a lot of LLM use are too.”
I agree that those hints are important to keep paying attention to! But to me steganography isn’t a sign of autonomous agency in and of itself; not until we know for sure, somehow, that they’re passing coherent messages from one LLM to another, rather than those messages being just the most eye-catching samples from the extreme tails.
That, or some clear goal-directed use of hidden communication channels; passing messages of the kinds you decoded feels closer to LARPing aliveness than what I’d expect actual agents to be trying to communicate to each other. But of course we’re talking about alien intelligences here, so I could be very wrong!
I am pretty annoyed at how quickly people jump to labeling this a delusion, when it’s something many AI experts and consciousness philosophers take seriously.
I think it is approximately correct to presume that LLM chatbots may be sentient, and that we can’t tell for sure that they’re not, or when they’ll start being sentient, in any clean way. But it is also “more” correct, so far, to presume that current chatbots are not sentient, given how much of their sentient-seeming behavior is predicated on the user prompts themselves “triggering” it.
But again, of course, given how strange these minds are we may actually find that this is just a part of how sentience works for LLMs, in which case, oof.
Appreciate the remarks. Would look forward to a numerical forecast breakdown if you ever have the time to tackle it.
I’m sorry I’ve given the impression of not engaging with what was actually said. Let me try to say what I meant more clearly:
The Shifting Mortality Rates section asks: “If background mortality drops, how does that change optimal timing?” It then runs the math for a scenario where mortality plummets all the way to 1/1400 upon entering Phase 2, and shows the pause durations get somewhat longer.

What it doesn’t ask is: “How likely is it that background mortality drops meaningfully in the next 20-40 years without ASI, and what does that do to the expected value calculation?”
I expect the latter because it’s actually pretty important? Like, look at these paragraphs in particular:

Yet if a medical breakthrough were to emerge—and especially effective anti-aging therapies—then the optimal time to launch AGI could be pushed out considerably. In principle, such a breakthrough could come from either pre-AGI forms of AI (or specialized AGI applications that don’t require full deployment) or medical progress occurring independently of AI. Such developments are more plausible in long-timeline scenarios where AGI is not developed for several decades.
Note that for this effect to occur, it is not necessary for the improvement in background mortality to actually take place prior to or immediately upon entering Phase 2. In principle, the shift in optimal timelines could occur if an impending lowering of mortality becomes foreseeable; since this would immediately increase our expected lifespan under pre-launch conditions. For example, suppose we became confident that the rate of age-related decline will drop by 90% within 5 years (even without deploying AGI). It might then make sense to favor longer postponements—e.g. launching AGI in 50 years, when AI safety progress has brought the risk level down to a minimal level—since most of us could then still expect to be alive at that time. In this case, the 50 years of additional AI safety progress would be bought at the comparative bargain price of a death risk equivalent to waiting less than 10 years under current mortality conditions.

Bostrom is explicitly acknowledging here that non-ASI life extension would be a game-changer. He says the optimal launch time “could be pushed out considerably,” even to 50 years. He acknowledges it could come from pre-AGI AI or independent medical progress. He even notes it doesn’t need to happen yet, just become foreseeable, to shift the calculus dramatically!
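(To make the arithmetic in that quoted example concrete, here’s a minimal sketch. The numbers are assumed placeholders of mine, roughly 1%/year baseline mortality rather than Bostrom’s actual cohort model, but they reproduce the “less than 10 years” equivalence he describes.)

```python
# Minimal sketch (assumed placeholder numbers, not Bostrom's model) of the
# quoted example: wait 50 years, with mortality dropping 90% after year 5.
import math

baseline_m = 0.01               # assumed current annual mortality for the cohort
reduced_m = baseline_m * 0.1    # 90% reduction once the breakthrough arrives
years_until_breakthrough = 5
total_wait = 50                 # postpone AGI launch by 50 years

# Probability of surviving the whole wait, then the cumulative death risk.
survival = ((1 - baseline_m) ** years_until_breakthrough
            * (1 - reduced_m) ** (total_wait - years_until_breakthrough))
death_risk = 1 - survival

# How many years of waiting under *current* mortality carry the same risk?
equivalent_years = math.log(survival) / math.log(1 - baseline_m)

print(f"Death risk over a {total_wait}-year wait: {death_risk:.1%}")   # ~9.1%
print(f"Equivalent wait at current mortality: {equivalent_years:.1f} years")  # ~9.5
```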
And then he just… moves on. He never examines the actual likelihood of it! He’s essentially saying “if this thing happened it would massively change my conclusions” without then investigating how likely it is, in a paper that is otherwise obsessively thorough about parameterizing uncertainty.
Compare this to how he handles AI safety progress. He doesn’t just say “if safety progress is fast, you should launch sooner.” He models four subphases with different rates, runs eight scenarios, builds a POMDP, computes optimal policies under uncertainty. He treats safety progress as a variable to be estimated and integrated over.
Non-ASI life extension gets two paragraphs of qualitative acknowledgment and a sensitivity table. In a paper that’s supposed to be answering “when should we launch,” the probability of the single factor he admits would “push out [timing] considerably” is left nearly unexamined, in my view.
So when a reader looks at the main tables and sees “launch ASAP” or close to it across large swaths of parameter space, that conclusion is implicitly assuming near 0% chance of non-ASI life extension. The Shifting Mortality Rates section tells you the conclusion would change if that assumption is wrong, but never really examines why he believes it is wrong, or what makes him certain or uncertain.
Which is exactly the question a paper about optimal timing from a person-affecting stance should be engaging with, in my view.
Does that make more sense?
There are a lot of things I can critique in this paper, but other people are doing that so I’m going to just bring up the bit I don’t see others mentioning.
Where are the probability calculations for potential biotech advancements as an alternative for hitting the immortality event horizon in the next 20, 30, 40 years?
You meticulously model eight scenarios of safety progress rates, three discount rates, multiple CRRA parameters, safety testing POMDPs… but treat the single most reasonable alternative pathway to saving people’s lives besides “build ASI as soon as possible and keep it in a box until it’s safe” (?!) as a sensitivity check in Tables 10-11 rather than integrating it into the main analysis with probability estimates to compare.
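To illustrate the kind of thing I mean, here’s a rough sketch. Every number in it is a made-up placeholder of mine, not drawn from the paper; the point is only the shape of the comparison, where the probability of non-ASI life extension sits inside the expected-value math rather than beside it.

```python
# Rough sketch (my framing, made-up placeholder probabilities, NOT the paper's
# numbers): weight each launch strategy's personal survival odds by the chance
# that non-ASI biotech reaches currently-living people in the meantime.

def p_alive_after(years, annual_mortality):
    """Chance of surviving `years` at a constant annual mortality rate."""
    return (1 - annual_mortality) ** years

baseline_m = 0.01        # assumed annual mortality for a representative reader
p_doom_if_rushed = 0.3   # placeholder x-risk from launching ASI with little safety work
p_biotech_in_10y = 0.2   # placeholder chance of major non-ASI life extension by year 10

# Strategy A: launch ASAP; you survive iff the ASI doesn't kill everyone.
p_survive_rush = 1 - p_doom_if_rushed

# Strategy B: wait 25 years for safety research, ignoring biotech entirely.
p_survive_wait_plain = p_alive_after(25, baseline_m)

# Strategy B': same 25-year wait, with some chance biotech cuts mortality
# by 90% from year 10 onward.
p_survive_wait_biotech = (
    p_biotech_in_10y
    * p_alive_after(10, baseline_m) * p_alive_after(15, baseline_m * 0.1)
    + (1 - p_biotech_in_10y) * p_survive_wait_plain
)

print(f"Rush:                     {p_survive_rush:.0%}")          # ~70%
print(f"Wait, biotech ignored:    {p_survive_wait_plain:.0%}")    # ~78%
print(f"Wait, biotech considered: {p_survive_wait_biotech:.0%}")  # ~80%
# A full comparison would also multiply the waiting strategies by the (lower)
# residual doom risk after 25 more years of safety work; the point is just that
# the biotech probability belongs inside the expected-value calculation, not
# off in a sensitivity table.
```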
For a paper whose entire emotional engine runs on “170,000 people die every day, and will continue to until we launch ASI,” that seems like a glaring omission that has me scratching my head. I admit to only reading the paper over once, so maybe I missed it, but Claude didn’t find it either. And of course there’s cryonics, which doesn’t get so much as a mention.
Without those calculations, the idea that this paper was written from a “mundane person-affecting stance” seems false. It seems more accurate to describe it as written centrally from a “60+ year old with cancer and no family” stance. That you acknowledge different demographics doesn’t matter if you’re computing their optimal timelines within the same model that handwaves away the chances of alternative life extension pathways.
I think you (and Bostrom) are failing pretty hard at distinguishing “person-affecting views” from “an individual who is over 60 years old and maybe has cancer” or similar.
If someone were actually making arguments specifically for the benefit of all the people currently alive today and the next generation, I would expect very different ones from those in this paper. You could reasonably try to say that a 96% chance of the world ending is acceptable from the perspective of an 80-year-old who doesn’t care about their younger family or friends or others, but I don’t think it’s a serious argument.
For example, you would also have to do the math for the likelihood of biotech advancements that help currently living 40-year-olds or 30-year-olds hit the immortality event horizon, as an alternative scenario to “either race for AGI or everyone alive today dies.” If you don’t do things like that, then it doesn’t seem reasonable to argue that this is all in service of a perspective for those alive today vs. “hypothetical people”… and of course the conclusion is going to be pretty badly lopsided toward taking high risks if no other path to saving lives is seriously considered.
Separately, I think you’re strawmanning pretty hard if you think Lesswrong readers don’t put serious weight on the lives of themselves, their parents, and their family members. A lot of people in this community suffer from some form of existential dread related to short timelines, and they are affected quite hard emotionally by the potential loss of their lives, their family’s lives, and their children’s lives… not some abstract notion of “far future people.” That is often a part of their intellectual calculations and posts, but it would be a huge mistake to assume it’s the center of their lived emotional experience.
I’ve added guilt and shame to the article :)
I think it might be reasonable to distinguish the physiological reactions of emotions from the felt-sense of the emotions themselves; some biological reactions from emotions are probably leftover evolutionary traits from pre-homo-sapiens.
But I think you’re presuming too much by taking note of how inconvenient the physical symptoms are in modern contexts versus past ones. If being anxious->slippery palms->worse spear throws were really that detrimental to hunting, shouldn’t we expect it would have been selected against?
Notice your confusion, ask the next question: do sweaty palms make it harder to throw spears, or other natural objects? Smooth, man-made objects, yes. But porous objects like rocks and wood? Quite the opposite! If you need to throw a spear or climb a tree, it turns out dry hands are worse than slightly damp ones!
Is this sufficient justification for why our palms getting sweaty when anxious is a good thing? Maybe not. Maybe it’s a just-so story that is true for one particular instance or scenario but false for all the other times people get anxious and have sweaty palms. Certainly if your hands get too sweaty this might wrap back around to making things worse.
But I think you presume too much if you take for granted that the biological reactions of emotions were not advantageous for the environment and situations we evolved in. Saying that you wish you didn’t have them now is reasonable, but I don’t think we should take for granted that we can always select out the unhelpful biological effects without losing something helpful in the process.
You’re totally right that Anxiety fires when we care about things—but I know that I care about this thing already—the Anxiety just makes me way more likely to fail.
I’m curious what you mean by you “know” you care about the thing already. Do you know this because of another emotion, or because of something else entirely? Is this the case for everything you care about that might go wrong?
Donated another $1,000, same reasoning as last year, which I’ll repost from the comment I made back then. In addition, since last year I ran an event myself at Lighthaven, WARP 2025, and it was a fantastic venue to work with. Having a place like this in the community makes a huge difference, and it feels important to do what I can to help it continue to exist for all of us.
I just donated $1,000. This is not a minor amount for me, and I almost just donated $10 as suggested in Shoshannah’s comment, but I knew I could donate that much without thought or effort, and I wanted to really put at least some effort into this, after seeing how much obvious effort Oliver and others at Lesswrong have been putting in.
My decision process was as follows:

First, I dealt with my risk aversion/loss aversion/flinch response to giving large sums of money away. This took a couple minutes, much faster than it used to be thanks to things like my Season of Wealth a couple years ago, but felt like a mildly sharp object jiggling around in my chest until I smoothed it out with reminders of how much money I make these days compared to the relatively poor upbringing I had and the not-particularly-high salary I made for the first ~decade of my adult life.
Second, I thought of how much I value Lesswrong and Lighthaven existing in the world as a vague thing. Impersonally, not in the ways they have affected me, just like… worlds-with-these-people-doing-this-thing-in-it vs worlds-without. This got me up to a feeling of more than double what I wanted to give, somewhere around 25ish.
Third, I thought about how much value I personally have gained from Lesswrong and Lighthaven. I cannot really put a number on this. It’s hard to disentangle the value from all the various sources in the rationality space, and the people who post on LW and attended Lighthaven events. This ballooned the amount to something extremely hard to measure. Far more than $100, but probably less than $10,000?
Fourth, I dealt with the flinch-response again. 10,000 is a lot for me. I lost more than that due to FTX’s collapse even before the clawback stress started, and that took a bit of time to stop feeling internal jabs over. A few subsections needed dealing with; what if I have an emergency and need lots of money? What if my hypothetical future wife or kids do? Would I regret donating then? This bumped me way back down to the hundreds range.
Fifth, I thought about how I would feel if I woke up today and instead of reading this post, I read a post saying that they had to shut down Lighthaven, and maybe even LessWrong, due to lack of funding. How much I would regret not having donated money, even if it didn’t end up helping. I’m still quite sad that we lost Wytham, and would pay money to retroactively try to save it if I could. This brought me up to something like $3-500.
Sixth, I confronted the niggling thought of “hopefully someone out there will donate enough that my contribution will not really matter, so maybe I don’t even need to really donate much at all?” This thought felt bad, and I had a brief chat with my parts, thanking my internal pragmatism for its role in ensuring we’re not being wasteful before exploring together if this is the sort of person we want to be when other people might need us. After that conversation was over the number had stabilized around 500.
Seventh, I thought about the social signal of saying I donated a lot, and how this might encourage others to donate more too, effectively increasing the amount Lesswrong gets, and decided this didn’t really affect the number much. Maybe a minor effect toward increasing it, but nothing noticeable.
Eighth, I thought about the impact to the world re: Alignment. I felt the black hole there, the potential infinite abyss that I could throw my savings and life into and probably not get any useful effect out of, and spent some time with that before examining it again and feeling like another few hundred may not “make sense” in one direction or the other, but felt better than not doing it.
And ninth, I finally thought about the individuals working at Lighthaven that I know. How much do I trust them? How much do I want them to feel supported and motivated and cared for by the community they’re contributing so much to?
By the end of that I was around 8-900 and I thought, fuck it, I’ve made stupider financial decisions than an extra hundred bucks for a fancy T-shirt, and nice round numbers are nice and round.
Thank you all for all you do. I hope this helps.
Thanks! I haven’t seen 500 Days of Summer but have considered making some film review youtube channel that dissects romance films with this lens, so if I do I’ll keep that one in mind :)
No apologies necessary, it’s possible I wasn’t clear enough!
My main point is that the orthogonality thesis applies to humans too: intelligence and values are distinct things. To judge someone’s actions as irrational, you need to actually understand their values and preferences. If you think they shouldn’t do something because the tradeoff is too high, and they acknowledge the tradeoff but want to do it anyway, that may just reveal preferences different from yours, not necessarily irrationality.
This is a good point, but the actual difference in the scenario is that ice cream is not meth ;P I think it is actually meaningful to notice that, while there may in fact be good reasons not to have ice cream, there are many, much stronger reasons not to do meth, and Bryce bringing those up is much more likely to dissuade Ash; if they don’t, Ash is much more irrational for choosing to do it anyway… but that choice is still independent from wanting meth.
Your version of Bryce is doing a less defensible thing, while mine is being more reasonable yet still wrong, and I think that’s the important point I’m making.
Ah, yeah, see again my emphasis that I did not name this article “Emotions Are Good” :P
If you pick scenarios where people can find other emotions by which they end up doing the Morally Good and Personally Optimal thing… yeah, envy isn’t needed there.
But my claim is there are situations where people are driven by envy to do things that make their likelihood of surviving and thriving better than if they had not felt it. If you disagree with that, this is what the article is trying to accomplish as a step 1, and integration happens after that.
But none of that requires “endorsement” in the way you seem(?) to mean it. Envy is not Nice. To put it in another frame, it is MtG: Black, and the value it brings to the table needs to be understood separately from “is it good/altruistic/endorsed.”
Does that make sense?
It is certainly not my argument, nor implied by the example, that the only reason someone would do something is because of Jealousy or Envy. Things like pleasure and admiration motivate us as well, but most people don’t wonder why we have them or wish they would go away.
I hope I didn’t imply somehow that there are no other reasons people might do things besides envy or jealousy!
“You haven’t actually offered me a better alternative” sounds like a failure on your parents’ part, or a failure of imagination on your 15-year-old self’s part. Which happens fairly often, and is a separate thing from the preferences themselves being irrational. Many people would be happy with a life of leisure and no responsibilities, and the desire for that isn’t irrational at all. It’s important to be educated about the long-term consequences of it, specifically because that’s what helps people feel motivated to do something more robust to their future self’s preferences.
I’d also note that “didn’t want to do schoolwork” is different from “didn’t want to go to school at all,” which yes, has legal consequences that rather drastically change the outcome.
I am pretty sure moderators do not look over every Lesswrong comment before it’s published.