Upvoted both for the helpful summary of considerations, as well as for (I think) following a pretty good algorithm (making a good faith effort to assess something important to your identity)
My current take is something like:
On one hand, in the end, I expect you’ll do the thing that your brain is naturally curious about, and the justifications are mostly post hoc. And that’s probably fine and I wouldn’t stress too much about it. I think a lot of good progress comes from people incrementally following their curiosities, and fighting against your natural curiosities doesn’t seem very practical for intellectual work.
But, insofar as your interests are malleable, I think it’d be worth asking more specific questions for each of the plausible ways voting theory might matter.
If the goal is “improve elections, largely because of their second-order effects”, then I’d ask what sort of progress is most bottlenecking that. (The answer may be more political than theoretical, and you may or may not be interested in doing political/activism work. But even narrowing the scope to theoretical progress, my guess is that problems vary in how relevant they are to the goal of “get concrete reform passed for government elections.”)
If the goal is “figure out how optimal decisionmaking should be made in the transhumanist future, or in CEV”, I’m guessing that the theoretical bottlenecks there are different from those for near-term election reform.
Solving problems is mostly a matter of total resources devoted, not time devoted. … If we get signs of a problem looming, and can devote a lot of resources then.
Hmm. I don’t have as strong opinions about this, but this premise doesn’t seem obviously true.
I’m thinking about the “is science slowing down?” question – pouring 1000x resources into various scientific fields didn’t result in 1000x speedups. In some cases progress seemed to slow down. The three main hypotheses I have are:
Low hanging fruit got used up, so the problems got harder
Average careerist scientists don’t matter much, only extremely talented, naturally motivated researchers matter. The naturally motivated researchers will do the work anyway.
Coordination is hard and scales with the number of people coordinating. If you have 1000x the researchers in a field, they can’t find each other’s best work that easily.
I agree that “time spent” isn’t the best metric, but it seems like what actually matters is “quality researcher hours that build on each other in the right way,” and it’s not obvious how much you can scale that.
If it’s just the low-hanging-fruit hypothesis then… that’s fine I guess. But if the “extreme talent/motivation” or “coordination” issues are at play, then you want (respectively) to ensure that:
a) at any given time, talented people who are naturally interested in the problem have the freedom to work on it, if there are nonzero things to do with it, since there won’t be that many of them in the future.
b) better coordination tools get built, so that people in the future can scale their efforts.
(You may also want to make efforts not to get mediocre careerist scientists involved in the field)
I was hoping this would propose solutions for how to do group selection for dating or blogposts, or some alternate strategy for avoiding the pitfalls of naive individual competition. Curious if you have thoughts on that?
I feel like, for the people from whom I learned to distinguish social-reality from reality-reality, their technique depended a lot on being deliberately confusing or weird, in large part to break me out of established patterns of thought.
Eliezer wrote rather plainly about distinguishing social reality, beliefs-as-attire, etc. And I think this was sufficient to help me notice the reality/social-reality distinction in groups I wasn’t part of myself, or no longer primarily identified with. But it seemed surprisingly useful to listen to weird rants by other iconoclasts in order to get a clearer sense of how social reality feels from the inside.
(I think those people may also have had other agendas going on that the confusion was part of. I do wonder at the fact that the people who most wanted to break me of my immersion in social reality also had weird agendas that benefited from me being disoriented)
((Apologies for being a bit cryptic))
Conditional on the non-foom scenario, what is the appropriate sign you should be watching for, to start converting resources into work?
In a world where there may or may not be a foom, how likely does foom need to be to make it correct to start working on it sooner?
FYI, this was a significant update for me. I just wanted to note that this is a bigger part of my model now as opposed to an edge case tacked on.
I hadn’t actually been invited much to google docs where this dynamic would come up, but it makes sense that this experience would be common. (I’ve only actually talked to one other person who shared your experience, so still up for updating backwards again, but I don’t expect to)
Which should either cause me to downgrade the importance of “feeling safer in a private space”, or have it apply a bit differently than I was expecting it to have applied. (I think it still applies to the author of the original paper, maybe less so to commenters. Although I do think having a sense that your fellow commenters are being filtered for some kind of “on-the-same-page-ness” still improves the conversation in other ways)
I meant in the other direction, where people judge ideas as better because higher status people said them.
This seems like the thing that happens by default and we can’t really stop it, but I’m wary of UX paradigms that might reinforce it even harder.
We used to have an option to make a post “unlisted” rather than just “draft”, which would allow arbitrary people to look at it. Getting the UI right for it was a bit tricky, and people kept finding ways to access unlisted posts they weren’t meant to see, so we made it admin-only for now. But we do intend to restore that eventually.
I do like the idea of karma-limited share buttons.
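To make that concrete, here’s a rough sketch of what the permission logic might look like; every name and number in it (canShareDraft, SHARE_KARMA_THRESHOLD, the explicit share list, etc.) is made up for illustration rather than taken from our actual codebase:

```typescript
// Hypothetical sketch of karma-gated draft sharing.
// All identifiers and the threshold are illustrative assumptions, not the real forum API.

interface User {
  id: string;
  karma: number;
}

interface Post {
  authorId: string;
  draft: boolean;
  sharedWithUserIds: string[]; // people the author explicitly invited
}

const SHARE_KARMA_THRESHOLD = 100; // illustrative number, not a real site setting

// Only the author may share their draft, and only if their karma clears the threshold.
function canShareDraft(sharer: User, post: Post): boolean {
  return post.draft
    && post.authorId === sharer.id
    && sharer.karma >= SHARE_KARMA_THRESHOLD;
}

// A reader can view the draft only if they wrote it or were explicitly shared on it,
// which avoids the "unlisted posts leak to arbitrary users" failure mode.
function canViewDraft(reader: User, post: Post): boolean {
  return post.authorId === reader.id
    || post.sharedWithUserIds.includes(reader.id);
}
```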
I think most of the incentives for commenting are due to network effects, i.e. not everyone is here, or I don’t have evidence that they’re here, so it still feels like more people will see discussion on FB.
I think social proof is going to turn out to be pretty important. I’m slightly wary of it because it pushes against the ideal that “LW is a place you can talk about ideas, as much as possible without having social status play into it.” But like it or not, “High Profile User liked my comment” or “My Friend liked my comment” is way more motivating.
I’m currently thinking about how to balance those concerns.
(if you are thinking “huh, that’s reasonable, but I notice the all-posts-page literally isn’t linked from anywhere, what’s up with that?”, the answer is that it was in beta, it’ll replace the daily page someday soon, after we successfully make sure it’s strictly better than the daily page, and will then be linked pretty prominently.)
Yeah, this threw me quite a bit.
I do think there exist work teams that don’t suck (and/or, if they are framed properly, they’re a lot more reasonable). When I worked at a large corporation and had to interface both with my team and with HR in order to build an automated-HR-system, I had an initial period of being frustrated by my boss and the HR contacts I worked with. But I enjoyed working with my coworkers, I eventually got a different boss, and I built a better relationship with the HR person, which resulted in a much smoother experience.
And that seemed fine.
I learned coordination at various other jobs over time, and (admittedly less generalizably) at group houses full of rationalists.
This comment doesn’t seem to sufficiently engage with (what I saw as) the core question Rob was asking (and which I would ask), which was:
I personally care about things other than suffering. What are negative utilitarians saying about that?
Are they saying that they don’t care about things like friendship, good food, joy, catharsis, adventure, learning new things, falling in love, etc., except as mechanisms for avoiding suffering? Are they saying that I’m deluded about having preferences like those? Are they saying that I should try to change my preferences — and if so, why?
You briefly note “you may be overly attached to them”, but this doesn’t give any arguments for why I might be overly attached to them, instead of attached to them the correct amount.
When you ask:
To actually reject NU, you must explain what makes something (other than suffering) terminally valuable (or as I say, motivating) beyond its instrumental value for helping us prevent suffering in the total context.
My response is “to reject NU, all I have to do is terminally care about anything other than suffering. I care about things other than suffering, ergo NU must be false, and the burden is on other people to explain what is wrong with my preferences.”
Nod. And apologies for armchair psychologizing which I do think is generally bad form.
Relatedly, rereading this post was what prompted me to write this stub post:
I’m fairly concerned with the practice of telling people who “really care about AI safety” to go into AI capabilities research, unless they are very junior researchers who are using general AI research as a place to improve their skills until they’re able to contribute to AI safety later. (See Leveraging Academia).
The reason is not a fear that they will contribute to AI capabilities advancement in some manner that will be marginally detrimental to the future. It’s also not a fear that they’ll fail to change the company’s culture in the ways they’d hope, and end up feeling discouraged. What I’m afraid of is that they’ll feel pressure to start pretending to themselves, or to others, that their work is “relevant to safety”. Then what we end up with are companies and departments filled with people who are “concerned about safety”, creating a false sense of security that something relevant is being done, when all we have are a bunch of simmering concerns and concomitant rationalizations.
This fear of mine requires some context from my background as a researcher. I see this problem with environmentalists who “really care about climate change”, who tell themselves they’re “working on it” by studying the roots of a fairly arbitrary species of tree in a fairly arbitrary ecosystem that won’t generalize to anything likely to help with climate change.
My assessment that their work won’t generalize is mostly not from my own outside view; it comes from asking the researcher about how their work is likely to have an impact, and getting a response that either says nothing more than “I’m not sure, but it seems relevant somehow”, or an argument with a lot of caveats like “X might help with Y, which might help with Z, which might help with climate change, but we really can’t be sure, and it’s not my job to defend the relevance of my work. It’s intrinsically interesting to me, and you never know if something could turn out to be useful that seemed useless at first.”
At the same time, I know other climate scientists who seem to have actually done an explicit or implicit Fermi estimate for the probability that they will personally soon discover a species of bacteria that could safely scrub the Earth’s atmosphere of excess carbon. That’s much better.
I agree with both individual points but… for the second point, can’t you pass the recursive buck almost as easily there?
At least “what should I have thought about already for outsourcing questions to emulations?” seems like a pretty good first question to ask.
Serious question: are you depressed? So far most negative utilitarians I’ve known were depressed and some stopped being negative utilitarian after fixing a chemical imbalance that hampered their ability to experience good things.
I’m not in a sensitive place but I’m not sure whether I want to read it or not. Can you give a rough sense of what you got out of it?
In most cases my thought is “well, what’s the alternative?”
I’m either doing what I would have done after thinking for N years, or I’m committing to a course of action after thinking less than N years. The former risks value drift, the latter risks… well, not having had as much time to think, which isn’t obviously better than value drift.
I do think there are a few variations that seem like improvements, like:
run X copies of myself, slightly randomizing their starting conditions and running them for a range of times (maybe as wide as “1 week” to “10,000 years”). Before revealing their results to me, reveal how convergent they were. If there’s high convergence, I’m probably less worried about the answer. (A toy sketch of this follows the list.)
make sure simulated me can only think about certain classes of things (“solve this problem with these constraints”). I’m more worried about value drift from “10,000-year me who lived life generally” than from “10,000-year me who just thought about this one problem” (unless the problem was meta-ethics, in which case I probably want some kind of value drift).
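Here’s the toy sketch of that first variation’s “measure convergence before revealing answers” step. Everything in it is a stand-in (runCopy obviously can’t actually run an emulation); it’s just meant to make the decision procedure concrete:

```typescript
// Toy sketch of the "check convergence before revealing" protocol described above.
// runCopy is a placeholder for whatever would actually run an emulated copy;
// all names and types here are illustrative assumptions, not a real implementation.

type Answer = string;

interface CopyResult {
  durationYears: number;
  answer: Answer;
}

// Placeholder: a real version would run a copy with this seed for this duration
// and return its conclusion. Here it just returns one of a few canned answers.
function runCopy(seed: number, durationYears: number): Answer {
  return `answer-${seed % 3}`;
}

// Run many copies with slightly randomized starting conditions and a range of durations.
function runRandomizedCopies(numCopies: number, durations: number[]): CopyResult[] {
  const results: CopyResult[] = [];
  for (let i = 0; i < numCopies; i++) {
    const duration = durations[i % durations.length];
    results.push({ durationYears: duration, answer: runCopy(i, duration) });
  }
  return results;
}

// Report only how much the copies agreed, not what they concluded, so the original
// can decide how worried to be before seeing any actual answer.
function convergence(results: CopyResult[]): number {
  const counts = new Map<Answer, number>();
  for (const r of results) {
    counts.set(r.answer, (counts.get(r.answer) ?? 0) + 1);
  }
  const modalCount = Math.max(...Array.from(counts.values()));
  return modalCount / results.length; // fraction agreeing with the most common answer
}

// Example: high convergence across durations would be the reassuring signal.
const results = runRandomizedCopies(100, [0.02, 1, 100, 10000]);
console.log(`convergence: ${convergence(results)}`);
```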