LessWrong team member / moderator. I’ve been a LessWrong organizer since 2011, with roughly equal focus on the cultural, practical and intellectual aspects of the community. My first project was creating the Secular Solstice and helping groups across the world run their own version of it. More recently I’ve been interested in improving my own epistemic standards and helping others to do so as well.
Raemon (Raymond Arnold)
Moderation action on Said
(See also: Ruby’s moderator warning for Duncan)
I’ve been thinking for a week, and trying to sanity-check whether there are actual good examples of Said doing-the-thing-I’ve-complained-about, rather than “I formed a stereotype of Said and pattern match to it too quickly”, and such.
I think Said is a pretty confusing case though. I’m going to lay out my current thinking here, in a number of comments, and I expect at least a few more days of discussion as the LessWrong community digests this. I’ve pinned this post to the top of the frontpage for the day so users who weren’t following the discussion can decide whether to weigh in.
Here’s a quick overview of how I think about Said moderation:
Re: Recent Duncan Conflict.
I think he did some moderation-worthy things in the recent conflict with Duncan, but a) so did Duncan, and I think there’s an “it takes two to tango” aspect of demon threads, b) at most, those’d result in me giving one or both of them a 1-week ban and then calling it a day. I basically endorse Vaniver’s take on some object-level stuff. I have a bit more to say but not much.
Overall pattern.
I think Said’s overall pattern of commenting includes a mix of “subtly enforcing norms that aren’t actual LW site norms (see below)”, “being pretty costly to interact with, in a way that feels particularly ‘like a trap’”, and “in at least some domains, being consistently not-very-correct in his implied criticisms”. I think each of those things is at least a little bad in isolation (though not necessarily moderation-worthy). But I think they become worse than the sum of their parts. If he were consistently doing the entire pattern, I would either ban him, or invent new tools to either alleviate the cost or tax the behavior in a less heavy-handed way.
Not sufficient corresponding upside
I’d be a lot less wary of the previous pattern if I felt like Said was also contributing significantly more value to LessWrong. [Edit: I do, to be clear, think Said has contributed significant value, both in terms of keeping the spirit of the sequences alive in the world à la readthesequences.com, and through being a voice with a relatively rare (these days) perspective that keeps us honest in important ways. But I think the costs are, in fact, really high, and I think the object-level value isn’t enough to fully counterbalance them]
Prior discussion and warnings.
We’ve had numerous discussions with Said about this (I think we’ve easily spent 100+ hours of moderator-time on it, and probably more like 200), including an explicit moderation warning.
Few recent problematic pattern instances.
That all said, prior to this ~month’s conflict with Duncan, I don’t have a confident belief that Said has recently strongly embodied the pattern I’m worried about. I think it was more common ~5 years ago. I cut Said some slack for the convo with Duncan because I think Duncan is kind of frustrating to argue with.
THAT said, I think it’s crept up at least somewhat occasionally in the past 3 years, and having to evaluate whether it’s creeping up to an unacceptable level is fairly costly.
THAT THAT said, I do appreciate that after the first time we gave him an explicit moderation notice, we didn’t (as far as I can tell) have any problems for ~3 years afterwards.
Strong(ish) statement of intent
Said’s made a number of comments that make me think he would still be doing a pattern I consider problematic if the opportunity arose. I think he’ll follow the letter of the law if we give it to him, but it’s difficult to specify a letter-of-the-law that does the thing I care about.
A thing that is quite important to me is that users feel comfortable ignoring Said if they don’t think he’s productive to engage with. (See below for more thoughts on this). One reason this is difficult is that it’s hard to establish common knowledge about it among authors. Another reason is that I think Said’s conversational patterns have the effect of making authors and other commenters feel obliged to engage with him (but, this is pretty hard to judge in a clear-cut way)
For now, after a bunch of discussion with other moderators, reading the thread-so-far, and talking with various advisors – my current call is giving Said a rate limit of 3-comments-per-post-per-week. See this post on the general philosophy of rate limiting as a moderation tool we’re experimenting with. I think there’s a decent chance we’ll ship some new features soon that make this actually a bit more lenient, but don’t want to promise that at the moment.
I am not very confident in this call, and am open to more counterarguments here, from Said or others. I’ll talk more about some of the reasoning here at the end of this comment. But I want to start by laying out some more background reasoning for the entire moderation decision.
In particular, if either Said makes a case that he can obey the spirit of “don’t imply people have an obligation to engage with his comments”; or, someone suggests a letter-of-the-law that actually accomplishes the thing I’m aiming at in a more clear-cut way, I’d feel fairly good about revoking the rate-limit.
(Note: one counterproposal I’ve seen is to develop a rate-limit based entirely on karma rather than moderator judgment, and that it is better to do this than to have moderators make individual judgment calls about specific users. I do think this idea has merit, although it’s hard to build. I have more to say about it at the end)
Said Patterns
3 years ago Habryka summarized a pattern we’d seen a lot:
The usual pattern of Said’s comments as I experience them has been (and I think this would be reasonably straightforward to verify):
Said makes a highly upvoted comment asking a question, usually implicitly pointing out something that is unclear to many in the post
Author makes a reasonably highly upvoted reply
Said says that the explanation was basically completely useless to him, this often gets some upvotes, but drastically less than the top-level question
Author tries to clarify some more, this gets much fewer upvotes than the original reply
Said expresses more confusion, this usually gets very few upvotes
More explanations from the author, almost no upvotes
Said expresses more confusion, often being downvoted and the author and others expressing frustration
I think the most central example of this is in this thread on circling, where AFAICT Said asked for examples of situations where social manipulation is “good.” Qiaochu and Sarah Constantin offer some examples. Said responds to both of them by questioning their examples and doubting their experience in a way that is pretty frustrating to respond to (and in Sarah’s case this seemed to me like a central example of Said missing the point, with the evo-psych argument not even making sense in context, which makes me distrust his taste on these matters). [1, 2]
I don’t actually remember more examples of that pattern offhand. I might be persuaded that I overupdated on some early examples. But after thinking a few days, I think a cruxy piece of evidence on how I think it makes sense to moderate Said is this comment from ~3 years ago:
There is always an obligation by any author to respond to anyone’s comment along these lines*. If no response is provided to (what ought rightly to be) simple requests for clarification (such as requests to, at least roughly, define or explain an ambiguous or questionable term, or requests for examples of some purported phenomenon), the author should be interpreted as ignorant. These are not artifacts of my particular commenting style, nor are they unfortunate-but-erroneous implications—they are normatively correct general principles.
*where I think “these lines” means “asking for examples”, “asking people to define terms,” etc. For completeness, Said later elaborates:
Where does that obligation come from?
I should clarify, first of all, that the obligation by the author to respond to the comment is not legalistically specific. By this I mean that it can be satisfied in any of a number of ways; a literal reply-to-comment is just one of them. Others include:
Mentioning the comment in a subsequent post (“In the comments on yesterday’s post, reader so-and-so asked such-and-such a question. And I now reply thus: …”).
Linking to one’s post or comment elsewhere which constitutes an answer to the question.
Someone else linking to a post or comment elsewhere (by the OP) which constitutes an answer to the question.
Someone else answering the question in the OP’s stead (and the OP giving some indication that this answer is endorsed).
Answering an identical, or very similar, question elsewhere (and someone providing a link or citation).
In short, I’m not saying that there’s a specific obligation for a post author to post a reply comment, using the Less Wrong forum software, directly to any given comment along the lines I describe.
Habryka and Said discussed it at length at the time.
I want to reiterate that I think asking for examples is fine (and would say the same thing for questions like “what do you mean by ‘spirituality’?” or whatnot). I agree that a) authors generally should try to provide examples in the first place, b) if they don’t respond to questions about examples, that’s bayesian evidence about whether their idea will ground out into something real. I’m fairly happy with clone of saturn’s variation on Said’s statement, that if the author can’t provide examples, “the post should be regarded as less trustworthy” (as opposed to “author should be interpreted as ignorant”), and gwern’s note that if they can’t, they should forthrightly admit “Oh, I don’t have any yet, this is speculative, so YMMV”.
The thing I object fairly strongly to is “there is an obligation on the part of the author to respond.”
I definitely don’t think there’s a social obligation, and I don’t think most LessWrongers think that. (I’m not sure if Said meant to imply that). Insofar as he means there’s a Bayesian obligation-in-the-laws-of-observation/inference, I weakly agree but think he overstates it: there are a lot of reasons an author might not respond (“belief that a given conversation won’t be productive,” “volume of such comments,” “trying to have a 202 conversation and not being interested in 101 objections,” and simple opportunity cost).
From a practical “things that the LessWrong culture should socially encourage people to do” standpoint, I liked Vladimir’s point that:
My guess is that people should be rewarded for ignoring criticism they want to ignore, it should be convenient for them to do so. [...] This way authors are less motivated to take steps that discourage criticism (including steps such as not writing things). Criticism should remain convenient, not costly, and directly associated with the criticized thing (instead of getting pushed to be published elsewhere).
i.e. I want there to be good criticism on LW, and think that people feeling free to ignore criticism encourages more good criticism, in part by encouraging more posts and engagement.
It’s been a few years and I don’t know whether Said still endorses the “obligation” phrasing, but much of my objection to Said’s individual stylistic commenting choices has a lot to do with reinforcing this feeling of obligation. I also think (less confidently) that authors get the impression that Said thinks that, if an author hasn’t answered a question to his satisfaction (treating him as an example of a reasonable median LW user), they should feel a [social] obligation to succeed at that.
Whether he intends this or not, I think it’s an impression that comes across, and which exerts social pressure, and I think this has a significant negative effect on the site.
I’m a bit confused about how to think about “prescribed norms” vs “good ideas that get selected on organically.” In a previous post Vladimir_Nesov argues that prescribing norms generally doesn’t make sense. Habryka had a similar take yesterday when I spoke with him. I’m not sure I agree (and some of my previous language here has probably assumed a somewhat more prescriptivist/top-down approach to moderating LessWrong that I may end up disendorsing after chatting more with Habryka).
But even in a more organic approach to moderation, I, Habryka, and Ruby think it’s pretty reasonable for moderators to take action to prevent Said from implying that there’s some kind of norm here and exerting pressure around it in other people’s comment sections, when, AFAICT, there is no consensus for such a norm. I predict a majority of LessWrong members would not agree with that norm, either on normative-Bayesian terms or on consequentialist social-norm-design terms. (To be clear, I think many people just haven’t thought about it at all, but I expect them to at least weakly disagree when exposed to the arguments. “What is the actual collective endorsed position of the LW commentariat?” is somewhat cruxy for me here.)
Rate-limit decision reasoning
If this were our first (or second or third) argument with Said over this, I’d think stating this clearly and giving him a warning would be a reasonable next action. Given that we’ve been intermittently arguing about this for 5 years, spending a hundred-plus hours of mod time discussing it with him, it feels more reasonable to move to an ultimatum of “somehow, Said needs to stop exerting this pressure in other people’s comment threads, or moderators will take some kind of significant action to either limit the damage or impose a tax on it.”
If we were limited to our existing moderator tools, I would think it reasonable to ban him. But we are in the middle of setting up a variety of rate limiting tools to generally give mods more flexibility, and avoid being heavier-handed than we need to be.
I’m fairly open to a variety of options here. FWIW, I am interested in what Said actually prefers here. (I expect it is not a very fun conversation to be asked by the people-in-power “which way of constraining you from doing the thing you think is right seems least-bad to you?”, but, insofar as Said or others have an opinion on that I am interested)
I am interested in building an automated tool that detects demon threads and rate limits people based on voting patterns. I most likely want to try to build such a tool regardless of what call we make on Said, and if I had a working version of such a tool I might be pretty satisfied with using it instead. My primary cruxes are:
a) I think it’s a lot harder to build and I’m not sure we can succeed,
b) I do just think it’s okay for moderators to make judgment calls about individual users based on long-term trends. That’s sort of what mods are for. (I do think for established users it’s important for this process to be fairly costly and subjected to public scrutiny.)
But for now, after chatting with Oli and Ruby and Robert, I’m implementing the 3-comments-per-post-per-week rule for Said. If we end up having time to build/validate an organic karma-based rate limit that solves the problem I’m worried about here, I might switch to that. Meanwhile, some additional features I haven’t shipped yet, which I can’t make promises about, but which I personally think would be good to ship soon, include:
A boolean flag for individual posts so authors can allow rate-limited people to comment freely, and probably also a user setting for this. Another possibility is a user-specific whitelist, but that’s a bit more complicated and I’m not sure if there’s anyone who would want that who wouldn’t want the simpler option.
I’d ideally have this flag set on this post, and probably on other moderation posts written by admins.
Rate-limited users in a given comment section have a small icon that lets you know they’re rate-limited, so you have reasonable expectations of when they can reply.
Updating the /moderation page to list rate limited users, ideally with some kind of reason / moderation-warning.
Updating rate limits to ensure that users can comment as much as they want on their own posts (we made a PR for this change a week ago and haven’t shipped it yet largely because this moderation decision took a lot of time)
Some reasons for this-specific-rate-limit rather than alternatives are:
3 comments within a week is enough for an initial back-and-forth where Said asks questions or makes a critique, the author responds, and Said responds-to-the-response (i.e. allowing the 4 layers of intellectual conversation, and getting the parts of Said’s comments that most people agree are valuable).
It caps the conversation out before it can spiral into an unproductive, escalatory thread.
It signals culturally that the problem here isn’t about initial requests for examples or criticisms; it’s about the pattern that tends to play out deeper in threads. I think it’s useful for this to be legible both to authors engaging with Said and to other commenters inferring site norms (i.e. some amount of Socrates is good, too much can cause problems).
If 3 comments isn’t enough to fully resolve a conversation, it’s still possible to follow up eventually.
Said can still write top level posts arguing for norms that he thinks would be better, or arguing about specific posts that he thinks are problematic.
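For concreteness, here’s a minimal sketch of the kind of check a “3 comments per post per rolling week” limit implies. This is illustrative only, not the actual LessWrong implementation; the type names and data shapes are assumptions:

```typescript
// Hypothetical sketch of a per-post, per-rolling-week comment limit.
// Not actual site code; names and data shapes are invented for illustration.

interface CommentRecord {
  userId: string;
  postId: string;
  postedAt: Date;
}

const MAX_COMMENTS_PER_POST_PER_WEEK = 3;
const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

function canCommentOnPost(
  userId: string,
  postId: string,
  usersComments: CommentRecord[], // this user's recent comments, fetched elsewhere
  now: Date = new Date()
): boolean {
  const windowStart = now.getTime() - WEEK_MS;
  const recentOnThisPost = usersComments.filter(
    (c) =>
      c.userId === userId &&
      c.postId === postId &&
      c.postedAt.getTime() >= windowStart
  ).length;
  return recentOnThisPost < MAX_COMMENTS_PER_POST_PER_WEEK;
}
```

(A real version would also need to respect the “rate-limited people can comment freely” flag mentioned above, and whatever exemptions apply to a user’s own posts.)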
That all said, the idea of using rate-limits as a mod tool is pretty new, and I’m not actually sure how it’ll play out. Again, I’m open to alternatives. (And again, see this post for more thoughts on rate limiting.)
Feel free to argue with this decision. And again, in particular, if Said makes a case that he either can obey the spirit of “don’t imply people have an obligation to engage with your comments”, or someone can suggest a letter-of-the-law that actually accomplishes the thing I’m aiming at in a more clear-cut way that Said thinks he can follow, I’d feel fairly good about revoking the rate-limit.
I’m about to process the last few days’ worth of posts and comments. I’ll be linking to this comment as a “here are my current guesses for how to handle various moderation calls”.
I have just shipped our first draft of Inline Reacts for comments.
You can mouse over a piece of text on a comment, and a little Add React button will appear off to the right.
If you click on it, you’ll see the React palette, and you can then apply the react to that particular string of text. Once you’ve reacted, the reacted snippet of text will appear with a dotted underline while you’re mousing over the comment, and its corresponding react-icon at the bottom of the comment will also show a dotted outline.
When you hover over a react, it shows the inline reacts in the hover-over, and they appear highlighted bright green on the post.
Possibilities for the future
Right now these are only enabled on this open thread. If they seem to be basically working we may give authors the option of using them.
Currently you can +1 individual inline reacts, but not −1 (it was unfortunately a lot gnarlier design-wise to implement anti-reacts for individual inline reacts). If inline reacts turn out to be useful/popular, and anti-reacting gets validated as useful, we’ll likely figure out a way to implement that.
I’d like to make the dotted-line sections highlight their corresponding react button when you hover over them, but that was also a bit trickier codewise.
Questions
Some particular questions I have:
how intuitive do you find this overall?
how do you feel about the current implementation of the “dotted underline”? Is it annoying to look at? It only appears while you’re mousing over the comment, so it’s possible to read the comment without the react-underlines, but I wasn’t sure how that was going to feel in real life.
how are you currently feeling about anti-reacts?
This post has a number of newer users engaging, and I wanted to take a moment to note: LessWrong has some fairly subtle norms on political discussions. If your first comments on LessWrong are on a political topic, moderators may give you a bit of extra scrutiny.
In general we ask all new users to read through the sequences (aka “Rationality A-Z”). But in particular, if you’re going to comment on political topics on LessWrong, I ask you to have read through the LessWrong Political Prerequisites sequence.
Politics is a more difficult place to train your rationality. And while politics is sometimes important to discuss on LW, I’m most excited to see users joining the site if they are also interested in discussing other topics.
In a world where we have reacts, do you prefer to keep agreement voting?
agree-react for “keep agreement voting”
disagree-react for “at least for an initial react experiment, get rid of agreement voting and just design the react-palette to facilitate agreement voting”
This isn’t a binding poll, but a couple people had mentioned this deeper in comment threads, and I wanted to make the question somewhat more explicit and take the temperature of how people are feeling about it.
Here’s my best guess for overall “moderation frame”, new this week, to handle the volume of users. (Note: I’ve discussed this with other LW team members, and I think there’s rough buy-in for trying this out, but it’s still pretty early in our discussion process, other team members might end up arguing for different solutions)
I think to scale the LessWrong userbase, it’d be really helpful to shift the default assumptions of LessWrong to “users by default have a rate limit of 1 comment per day” and “1 post per week.”
If people get somewhat upvoted, their rate limit fairly quickly loosens to either “1 comment per hour” or “~3 comments per day” (I’m not sure which is better), so they can start participating in conversations. If they get somewhat more upvoted, the rate limit disappears completely.
But to preserve this, users need to be producing content that is actively upvoted. If they get downvoted (or just produce a long string of barely-upvoted comments), they go back to the 1-per-day rate limit. If they’re getting significantly downvoted, the rate limit ratchets up (to 1 per 3 days, then once per week, and eventually once per month, which is essentially saying “you’re sort of banned, but you can periodically try again, and if your new comments get upvoted you’ll get your privileges restored”).
Getting the tuning here exactly right, to avoid being really annoying to existing users who weren’t doing anything wrong, is somewhat tricky, but a) there are at least some situations where I think the rules would be pretty straightforward, and b) I think it’s an achievable goal to tune the system to basically work as intended.
When users have a rate limit, they get UI elements giving them some recommendations for what to do differently. (I think it’s likely we can also build some quick-feedback buttons that moderators and some trusted users can use, so people have a bit more idea of what to do differently).
Once users have produced multiple highly upvoted posts/comments, they get more leniency (i.e. they can have a larger string of downvotes or longer non-upvoted back-and-forths before getting rate limited).
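To make the shape of this more concrete, here’s a rough sketch of how the tiers described above might be expressed. This is illustrative only, not a spec or actual site code, and the specific thresholds are placeholder assumptions rather than decided values:

```typescript
// Illustrative sketch of tiered, karma-based rate limits. Limits loosen as a
// user's recent content gets upvoted and tighten if it gets downvoted or is
// merely tolerated. All numbers are placeholders, not decided values.

type RateLimit =
  | { kind: "unlimited" }
  | { kind: "commentsPerDay"; n: number }
  | { kind: "oneCommentEveryNDays"; days: number };

interface UserActivity {
  recentNetKarma: number;     // net karma on recent comments (window unspecified)
  recentDownvotes: number;    // downvotes received in that window
  highlyUpvotedCount: number; // lifetime count of highly upvoted posts/comments
}

function commentRateLimit(u: UserActivity): RateLimit {
  // Users with a strong track record get extra slack before limits kick in.
  const slack = u.highlyUpvotedCount >= 3 ? 20 : 0;

  // Significant downvoting ratchets the limit tighter and tighter.
  if (u.recentDownvotes > 40 + slack) return { kind: "oneCommentEveryNDays", days: 30 };
  if (u.recentDownvotes > 20 + slack) return { kind: "oneCommentEveryNDays", days: 7 };
  if (u.recentDownvotes > 10 + slack) return { kind: "oneCommentEveryNDays", days: 3 };

  // Clearly-upvoted users lose the limit entirely.
  if (u.recentNetKarma > 100) return { kind: "unlimited" };

  // Somewhat-upvoted users can participate in back-and-forth conversations.
  if (u.recentNetKarma > 20) return { kind: "commentsPerDay", n: 3 };

  // Default for brand-new users or users with a string of barely-upvoted comments.
  return { kind: "commentsPerDay", n: 1 };
}
```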
If we were starting a forum from scratch with this sort of design at its foundation, I think this could feel more like a positive thing (kinda like a videogame incentivizing good discussion and idea-generation, with built-in self-moderation).
Since we’re not starting from scratch, I do expect this to feel pretty jarring and unfair to people. I think this is sad, but, I think some kind of change is necessary and we just have to pay the costs somewhere.
My model of @Vladimir_Nesov pops up to warn about negative selection here (I’m not sure whether he thinks rate-limiting is as risky as banning, for negative-selection reasons; it certainly will still cause some people to bounce off). I definitely see risks with negative selection punishing variance, but even the current number of mediocre comments has IMO been pretty bad for LessWrong, the growing amount I’m expecting in the coming year seems even worse, and I’m not sure what else to do.
Moderators will evaluate your comment before it appears publicly, for criteria including:
Understand rationality fundamentals. Try to reason probabilistically, get curious about where you might be wrong, avoid arguing over definitions, etc. Moderators may reject content that seems to be making reasoning mistakes covered in The Sequences.
Be careful when making assumptions about people’s motivations. Try to argue about ideas rather than people.
Be easy to engage/argue with. If you disagree, try to state your reasoning and what would change your mind. Make concrete predictions.
Understand the context of the parent post. Many posts are assuming some background knowledge, and most post authors don’t want to rehash a lot of 101 debates every time they’re building off an existing argument. If your comment seems to be ignoring background context (or simply ignoring/misunderstanding the text of the original post), moderators may ask you to post it elsewhere.
You can read more advice about how to make a good first comment in the new user’s guide.
Note: LessWrong has high/specific standards for first posts
We think of LessWrong as somewhere in between a forum and a university or academic journal. Your first post is a bit like an application to said university. Established users can write on a variety of topics, but new users should focus on communicating clear, succinct models/evidence/arguments that are relevant to LessWrong. (We recommend avoiding fiction or poetic posts until you’ve gotten some upvoted object-level posts)
Understand rationality fundamentals. Try to reason probabilistically, get curious about where you might be wrong, avoid arguing over definitions, etc. Moderators may reject content that seems to be making reasoning mistakes covered in The Sequences.
Write a clear introduction. Your first couple paragraphs should make it obvious what the main point of your post is, and ideally gesture at the strongest argument for that point. Explain why your post is relevant to the LessWrong audience.
Address existing counterarguments (if applicable). We try to avoid rehashing debates that have been covered significantly. If your post seems to be ignoring important arguments that have already been made on a topic, mods may ask you to do more background reading.
AI content is held to a particularly high standard. There’s a large wave of AI content. Ideally, we’d give a lot of feedback and guidance to each individual. Unfortunately we don’t have bandwidth to do that. We’re working on some posts to give people a better sense of how to get started. Meanwhile in some cases we may ask you to do some more background reading or comment in the AI Questions Open Thread.
You can read more advice about how to make a good first post in the new user’s guide.
Mod note: this is frontpaged, which is an exception to our usual rules. See here for explanation.
Note: I currently lean towards changing the Progress Bar metric from “tagged posts over 25 karma” to 35 or 40 karma.
The original reason we went with 25 karma was an awkward compromise due to the LW2.0 karma inflation – old upvotes were only worth 1 point, now regular upvotes are worth 2 for most longtime users, and strong upvotes mean the average is more like 3-4. We haven’t gotten around to re-running the old vote history with the new vote-weighting, and that means that old (often great) posts have much lower karma than modern posts.
We plan to bring the old and new votes in sync someday, but didn’t have time to do it this week.
For modern posts, the threshold I’d have preferred to set was ~50 karma. This was roughly the equivalent of 25 back-in-the-day (hence the original metric). But I don’t really want to make people feel obligated to tag a bunch of mediocre modern posts – I’d rather taggers start shifting their efforts towards improving tag descriptions (turn stubs into full fledged A or B tier tags), and thinking about how the tag ontology fits together (i.e. are some tags duplicates? which tags are related?)
My current guess is we should set the threshold to 40, and then I’m just going to strong upvote a bunch of older posts that deserve it to bump them over the threshold.
(Meanwhile, to all the users who have been doing tons of tagging: thanks!)
After chatting for a bit about what to do with low-quality new posts and comments, while being transparent and inspectably fair, the LW Team is currently somewhat optimistic about adding a section to lesswrong.com/moderation which lists all comments/posts that we’ve rejected for quality.
We haven’t built it yet, so for the immediate future we’ll just be strong-downvoting content that doesn’t meet our quality bar. And for the immediate future, if existing users in good standing want to defend particular pieces as worth inclusion, they can do so here. This is not a place for users who submitted rejected content to write an appeal (they can do that via PM, although we don’t promise to reply, since often we were just pretty confident in our take and the user hasn’t offered new information), and I’ll be deleting such comments that appear here.
(Is this maximally transparent? No. But consider that it’s still dramatically more transparent than a university or journal.)
j/k, I just tried this for 5 minutes and a) I don’t actually want to approve users to make new posts (which is currently necessary to make their post appear), b) there’s no current transparent solution that isn’t a giant pain. So, not doing this for now, but we’ll hopefully build a Rejected Content section at some point.
Today’s UI tweaks include the ability to try out different layouts for the React Palette. You can click the icons in the top of the React Palette and see options like:
Default:
Icons only:
Icons / name grid:
Mixed:
(the idealized version of this one probably shows you reacts in the order of frequency-that-you-use-them)
In the past few weeks I’ve noticed a significant change in the Overton window of what seems possible to talk about. I think the broad strokes of this article seem basically right, and I agree with most of the details.
I don’t expect this to immediately cause AI labs or world governments to join hands and execute a sensibly-executed moratorium. But I’m hopeful about it paving the way for the next steps towards it. I like that this article, while making an extremely huge ask of the world, spells out exactly how huge an ask is actually needed.
Many people on Hacker News seemed suspicious of the FLI Open Letter because it looks superficially like the losers in a race trying to gain a local political advantage. I like that Eliezer’s piece makes it clearer that it’s not about that.
I do still plan to sign the FLI Open Letter. If a better open letter comes along, making an ask that is more complete and concrete, I’d sign that as well. I think it’s okay to sign open letters that aren’t exactly the thing you want, to help build momentum and common knowledge of what people think. (I think not-signing-the-letter while arguing for what a better letter should look like, similar to what Eliezer did here, also seems like a fine strategy for common knowledge building.)
I’d be most interested in an open letter for something like a conditional commitment (i.e. a Kickstarter mechanic) for shutting down AI programs IFF some critical mass of other countries and companies shut down AI programs, which states something like:
It’d be good if all major governments and AI labs agreed to pause capabilities research indefinitely while we make progress on existential safety issues.
Doing this successfully is a complex operation, and requires solving novel technological and political challenges. We agree it’d be very hard, but nonetheless is one of the most important things for humanity to collectively try to do. Business-as-usual politics will not be sufficient.
This is not claiming it’d necessarily be good for any one lab to pause unilaterally, but we all agree that if there was a major worldwide plan to pause AI development, we would support that plan.
If safe AGI could be developed, it’d be extremely valuable for humanity. We’re not trying to stop progress, we’re just trying to make sure we actually achieve progress, rather than causing catastrophe.
I think that’s something the leaders of several major AI labs seem like they should basically support (given their other stated views).
Something that seems fairly important is the ability to mark your own answer before seeing the others, to avoid anchoring. (I don’t know that everyone should be forced to do this but it seems useful to at least have the option. I noticed myself getting heavily anchored by some of the current question’s existing answers)
This is a pretty complex epistemic/social situation. I care a lot about our community having some kind of good process of aggregating information, allowing individuals to integrate it, and update, and decide what to do with it.
I think a lot of disagreements in the comments here and on EAF stem from people having an implicit assumption that the conversation here is about “should [any particular person in this article] be socially punished?”. In my preferred world, before you get to that phase there should be at least some period focused on “information aggregation and Original Seeing.”
It’s pretty tricky, since in the default world, “social punishment?” is indeed the conversation people jump to. And in practice, it’s hard to have words just focused on epistemic evaluation without getting into judgment, or without speech acts being “moves” in a social conflict.
But, I think it’s useful to at least (individually) inhabit the frame of “what is true, here?” without asking questions like “what do those truths imply?”.
With that in mind, some generally useful epistemic advice that I think is relevant here:
Try to have Multiple Hypotheses
It’s useful to have at least two, and preferably three, hypotheses for what’s going on in cases like this. (Or, generally whenever you’re faced with a confusing situation where you’re not sure what’s true). If you only have one hypothesis, you may be tempted to shoehorn evidence into being evidence for/against that hypothesis, and you may be anchored on it.
If you have at least two hypotheses (and, like, “real ones” that both seem plausible to you), I find it easier to take in new bits of data and then ask “okay, how would this fit into two different plausible scenarios?”, which activates my “actually check” process.
I think three hypotheses are better than two, because two can still end up in an “all the evidence weighs in on a one-dimensional spectrum” situation. Three hypotheses a) help you do ‘triangulation’, and b) help remind you to actually do the “what frame should I be having here? what are other additional hypotheses that I might not have thought of yet?” move.
Multiple things can be going on at once
If two people have a conflict, it could be the case that one person is at-fault, or both people are at-fault, or neither (i.e. it was a miscommunication or something).
If one person does an action, it could be true, simultaneously, that:
They are somewhat motivated by [Virtuous Motive A]
They are somewhat motivated by [Suspicious Motive B]
They are motivated by [Random Innocuous Motive C]
I once was arguing with someone, and they said “your body posture tells me you aren’t even trying to listen to me or reason correctly, you’re just trying to do a status monkey smackdown and put me in my place.” And, I was like “what? No, I have good introspective access and I just checked whether I’m trying to make a reasoned argument. I can tell the difference between doing The Social Monkey thing and the “actually figure out the truth” thing.”
What I later realized is that I was, like, 65% motivated by “actually wanna figure out the truth”, and like 25% motivated by “socially punish this person” (which was a slightly different flavor of “socially punish” than, say, when I’m having a really tribally motivated facebook fight, so I didn’t recognize it as easily).
Original Seeing vs Hypothesis Evaluation vs Judgment
OODA Loops include four steps: Observe, Orient, Decide, Act
Often people skip over steps. They think they’ve already observed enough and don’t bother looking for new observations. Or it doesn’t even occur to them to do that explicitly. (I’ve noticed that I often skip to the orient step, where I figure out “how do I organize my information? what sort of decision am I about to decide on?”, and don’t actually do the observe step, where I’m purely focused on gaining raw data.)
When you’ve already decided on a schema-for-thinking-about-a-problem, you’re more likely to take new info that comes in and put it in a bucket you think you already understand.
Original Seeing is different from “organizing information”.
They are both different from “evaluating which hypothesis is true”
They are both different from “deciding what to do, given Hypothesis A is true”
Which is in turn different from “actually taking actions, given that you’ve decided what to do.”
I have a sort of idealistic dream that someday, a healthy rationalist/EA community could collectively be capable of raising hypotheses without people anchoring on them, and of sharing information in a way people robustly trust won’t get automatically leveraged into a conflict/political move. I don’t think we’re close enough to that world to advocate for it in-the-moment, but I do think it’s still good practice for people individually to be spending at least some of their time in each node of the OODA loop, and tracking which node they’re currently focusing on.
So, on one hand, yes, it totally sounds dumb. But this seems to be missing the point of calling it “AI notkilleveryoneism”, which is to draw attention to the fact that the last few times people tried naming this thing, people shifted to using it in a more generic way that didn’t engage with the primary cruxes of the original namers*.
One of the key proposed mechanisms here is that the word is both specific enough and low-status-sounding enough that you can’t possibly redefine it in a vague applause-lighty way that people will end up Safetywashing.
And, sure, there should also be a name that is, like, prestigious and reasonable-sounding and rolls off the tongue. But most of the obvious words are kinda long and a mouthful and are likely to have syllables dropped for convenience (i.e. AI Existential Safety is harder to say than AI Safety). One of the points is to have a name that actively leans into the outrageousness of its length.
Another part of the point here is to deliberately puncture people’s business-as-usual attitude, via outrageousness/humor.
And, also sure, you can disagree with all of this and think it’s not a useful goal, or think that, as a joke-name, things went overboard and it’s getting used more often than it should. But if you’re actually trying to get the people using the word to stop, you need to engage more with the actual motivation.
*FWIW I do think “AI Safety” and “AI Alignment” aren’t sufficiently specific names, and I think you really can’t complain when those names end up getting used to mean things other than existential safety, and this was predictable in advance.
I’m particularly frustrated by the thing where, inevitably, the concept of frame control is going to get weaponized (both by people who are explicitly using it to frame control, and people who are just vaguely ineptly wielding it as a synonym for ‘bad’).
I don’t have a full answer. But I’m reminded of a comment by Johnswentworth that feels like it tackles something relevant. This was originally a review of Power Buys You Distance From the Crime. Hopefully the quote below gets across the idea:
When this post first came out, I said something felt off about it. The same thing still feels off about it, but I no longer endorse my original explanation of what-felt-off. So here’s another attempt.
First, what this post does well. There’s a core model which says something like “people with the power to structure incentives tend to get the appearance of what they ask for, which often means bad behavior is hidden”. It’s a useful and insightful model, and the post presents it with lots of examples, producing a well-written and engaging explanation. The things which the post does well more than outweigh the problems below; it’s a great post.
On to the problem. Let’s use the slave labor example, because that’s the first spot where the problem comes up:
No company goes “I’m going to go out and enslave people today” (especially not publicly), but not paying people is sometimes cheaper than paying them, so financial pressure will push towards slavery. Public pressure pushes in the opposite direction, so companies try not to visibly use slave labor. But they can’t control what their subcontractors do, and especially not what their subcontractors’ subcontractors’ subcontractors do, and sometimes this results in workers being unpaid and physically blocked from leaving.
… so far, so good. This is generally solid analysis of an interesting phenomenon.
But then we get to the next sentence:
Who’s at fault for the subcontractor(^3)’s slave labor?
… and this is where I want to say NO. My instinct says DO NOT EVER ASK THAT QUESTION, it is a WRONG QUESTION, you will be instantly mindkilled every time you ask “who should be blamed for X?”.
… on reflection, I do not want to endorse this as an all-the-time heuristic, but I do want to endorse it whenever good epistemic discussion is an objective. Asking “who should we blame?” is always engaging in a status fight. Status fights are generally mindkillers, and should be kept strictly separate from modeling and epistemics.
Now, this does not mean that we shouldn’t model status fights. Rather, it means that we should strive to avoid engaging in status fights when modeling them. Concretely: rather than ask “who should we blame?”, ask “what incentives do we create by blaming <actor>?”. This puts the question in an analytical frame, rather than a “we’re having a status fight right now” frame.
The final paragraph there is the most interesting bit, so much so that I’m going to quote it again:
Now, this does not mean that we shouldn’t model status fights. Rather, it means that we should strive to avoid engaging in status fights when modeling them. Concretely: rather than ask “who should we blame?”, ask “what incentives do we create by blaming <actor>?”. This puts the question in an analytical frame, rather than a “we’re having a status fight right now” frame.
The object-level point has been helpful. But what’s particularly interesting to me is that example of “here is an attempt to come up with a rule that constrains conversation in a way that asymmetrically favors good epistemics.” This is a fairly specific rule that addresses one particular kind of (minor) frame control – the notion that ‘we should blame someone’ is a frame, and John’s suggested rule* helps avoid being trapped in that particular frame without giving up the ability to model relevant classes of situations.
[edit: *worth noting that John’s suggested rule also comes embedded in a frame]
But I list this as a pointer to (hopefully) other types of engagement that might asymmetrically help navigate frame conflict in a broader sense.
Strong downvoted mostly to apply some token resistance in the direction away from “Logan gradient descends into maximal fun-ranty-monkey-engagement-incentives.”
I do like the core concept here, but I think for it to work you need to have a pretty well specified problem that people can’t weasel out of. (I expect the default result of this to be “1000 researchers all come up with reasons they think they’ve solved alignment, without really understanding what was supposed to be hard in the first place.”)
You touch upon this in your post but I think it’s kinda the main blocker.
I do think this might be a surmountable obstacle, though.
Preliminary Verdict (but not “operationalization” of verdict)
tl;dr – @Duncan_Sabien and @Said Achmiz can each write up to two more comments on this post discussing what they think of this verdict, but are otherwise on a temporary ban from the site until they have negotiated with the mod team and settled on one of the following:
credibly commit to changing their behavior in a fairly significant way,
or, accept some kind of tech solution that limits their engagement in some reliable way that doesn’t depend on their continued behavior.
or, be banned from commenting on other people’s posts (but still allowed to make new top level posts and shortforms)
(After the two comments they can continue to PM the LW team, although we’ll have some limit on how much time we’re going to spend negotiating)
Some background:
Said and Duncan are the two single-most complained-about users since LW2.0 started (probably both in the top 5, possibly literally the top 2). They also both have many good qualities I’d be sad to see go.
The LessWrong team has spent hundreds of person hours thinking about how to moderate them over the years, and while I think a lot of that was worthwhile (from a perspective of “we learned new useful things about site governance”) there’s a limit to how much it’s worth moderating or mediating conflict re: two particular users.
So, something pretty significant needs to change.
A thing that sticks out in both the case of Said and Duncan is that they a) are both fairly law-abiding (i.e. when the mods have asked them for concrete things, they adhere to our rules, and clearly support rule-of-law and the general principle of Well Kept Gardens), but b) both have a very strong principled sense of what a “good” LessWrong would look like and are optimizing pretty hard for that within whatever constraints we give them.
I think our default rules are chosen to be something that someone might trip accidentally, if you’re trying to mostly be a good stereotypical citizen but occasionally end up having a bad day. Said and Duncan are both trying pretty hard to be good citizens of another country, one that the LessWrong team is consciously not trying to be. It’s hard to build good rules/guidelines that actually robustly deal with that kind of optimization.
I still don’t really know what to do, but I want to flag that the goal I’ll be aiming for here is “make it such that Said and Duncan either have actively (credibly) agreed to stop optimizing in a fairly deep way, or are somehow limited by site tech such that they can’t do the cluster of things they want to do that feels damaging to me.”
If neither of those strategies turns out to be tractable, banning is on the table (even though I think both of them contribute a lot in various ways and I’d be pretty sad to resort to that option). I have some hope that tech-based solutions can work.
(This is not a claim about which of them is more valuable overall, or better/worse/right-or-wrong-in-this-particular-conflict. There’s enough history with both of them being above-a-threshold-of-worrisome that it seems like the LW team should just actually resolve the deep underlying issues, regardless of who’s more legitimately aggrieved this particular week)
Re: Said:
One of the most common complaints I’ve gotten about LessWrong, from new users as well as established, generally highly regarded users, is “too many nitpicky comments that feel like they’re missing the point”. I think LessWrong is less fragile than it was in 2018 when I last argued extensively with Said about this, but I think it’s still an important/valid complaint.
Said seems to actively prefer a world where the people who are annoyed by him go away, and thinks it’d be fine if this meant LessWrong had radically fewer posts. I think he’s misunderstanding something about how intellectual progress actually works, and about how valuable his comments actually are. (As I said previously, I tend to think Said’s first couple comments are worthwhile. The thing that feels actually bad is getting into a protracted discussion, on a particular (albeit fuzzy) cluster of topics)
We’ve had extensive conversations with Said about changing his approach here. He seems pretty committed to not changing his approach. So, if he’s sticking around, I think we’d need some kind of tech solution. The outcome I want here is that in practice Said doesn’t bother people who don’t want to be bothered. This could involve solutions somewhat specific-to-Said, or (maybe) be a sitewide rule that works out to stop a broader class of annoying behavior. (I’m skeptical the latter will turn out to work without being net-negative, capturing too many false positives, but seems worth thinking about)
Here are a couple ideas:
Easily-triggered rate-limiting. I could imagine an admin feature that literally just lets Said comment a few times on a post, but if he gets significantly downvoted, gives him a wordcount-based rate limit that forces him to wrap up his current points quickly and then call it a day. I expect fine-tuning this to actually work the way I imagine in my head is a fair amount of work, but not that much. (There’s a rough sketch of what this could look like after this list.)
Proactive warning. If a post author has downvoted Said’s comments on their post multiple times, they get some kind of UI alert saying “Yo, FYI, admins have flagged this user as having a pattern of commenting that a lot of authors have found net-negative. You may want to take that into account when deciding how much to engage”.
There’s some cluster of ideas surrounding how authors are informed/encouraged to use the banning options. It sounds like the entire topic of “authors can ban users” is worth revisiting, so my first impulse is to avoid investing in it further until we’ve had some more top-level discussion about the feature.
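As a concrete illustration of the “easily-triggered rate limit” idea above, here’s a rough sketch. It is not actual site code or a settled design; the thresholds and names are assumptions:

```typescript
// Hypothetical sketch: a user can comment normally on a post, but once their
// comments there have been significantly downvoted, further comments on that
// post are capped to a small word budget per day. Numbers are placeholders.

interface PostEngagement {
  commentCount: number;     // this user's comments on this post so far
  netKarma: number;         // net karma of those comments
  wordsPostedToday: number; // words this user has posted on this post today
}

type Verdict = { allowed: true } | { allowed: false; reason: string };

function checkComment(e: PostEngagement, newCommentWords: number): Verdict {
  const FREE_COMMENTS = 3;       // initial comments are unrestricted
  const DOWNVOTE_TRIGGER = -5;   // net karma at or below this triggers the limit
  const DAILY_WORD_BUDGET = 200; // once triggered, wrap up points briefly

  const limited = e.commentCount >= FREE_COMMENTS && e.netKarma <= DOWNVOTE_TRIGGER;
  if (!limited) return { allowed: true };

  if (e.wordsPostedToday + newCommentWords > DAILY_WORD_BUDGET) {
    return {
      allowed: false,
      reason: `Rate limited on this post: daily word budget of ${DAILY_WORD_BUDGET} exceeded.`,
    };
  }
  return { allowed: true };
}
```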
Why is it worth this effort?
You might ask “Ray, if you think Said is such a problem user, why bother investing this effort instead of just banning him?”. Here are some areas I think Said contributes in a way that seem important:
Various ops/dev work maintaining sites like readthesequences.com, greaterwrong.com, and gwern.com. (edit: as Ben Pace notes, this is pretty significant, and I agree with his note that “Said is the person independent of MIRI (including Vaniver) and Lightcone who contributes the most counterfactual bits to the sequences and LW still being alive in the world”)
Most of his comments are in fact just pretty reasonable and good in a straightforward way.
While I don’t get much value out of protracted conversations about it, I do think there’s something valuable about Said being very resistant to getting swept up in fad ideas. Sometimes the emperor in fact really does have no clothes. Sometimes the emperor has clothes, but you really haven’t spelled out your assumptions very well and are confused about how to operationalize your idea. I do think this is pretty important and would prefer Said to somehow “only do the good version of this”, but seems fine to accept it as a package-deal.
Re: Duncan
I’ve spent years trying to hash out “what exactly is the subtle but deep/huge difference between Duncan’s moderation preferences and the LW team’s?” I have found each round of that exchange valuable, but typically it didn’t turn out that whatever-we-thought-was-the-crux was a particularly Big Crux.
I think I care about each of the things Duncan is worried about (such as the things listed in Basics of Rationalist Discourse). But I tend to think the way Duncan goes about trying to enforce such things is extremely costly.
Here’s this month/year’s stab at it: Duncan cares particularly about strawmans/mischaracterizations/outright-lies getting corrected quickly (i.e. within ~24 hours). (See Concentration of Force for his writeup on at least one set of reasons this matters.) I think there is value in correcting them or telling people to “knock it off” quickly. But,
a) moderation time is limited
b) even in the world where we massively invest in moderation… the thing Duncan cares most about moderating quickly just doesn’t seem like it should necessarily be at the top of the priority queue to me?
I was surprised and updated on You Don’t Exist, Duncan getting as heavily upvoted as it did, so I think it’s plausible that this is all a bigger deal than I currently think it is. (that post goes into one set of reasons that getting mischaracterized hurts). And there are some other reasons this might be important (that have to do with mischaracterizations taking off and becoming the de-facto accepted narrative).
I do expect most of our best authors to agree with Duncan that these things matter, and generally want the site to be moderated more heavily somehow. But I haven’t actually seen anyone but Duncan argue they should be prioritized nearly as heavily as he wants (i.e. rather than being something you mostly take in stride, downvote, and then try to ignore, while focusing on other things).
I think most high-contributing users agree the site should be moderated more (see the significant upvotes on LW Team is adjusting moderation policy), but don’t necessarily agree on how. It’d be cruxy for me if more high-contributing-users actively supported the sort of moderation regime Duncan-in-particular seems to want.
I don’t know that that really captured the main thing here. I feel less resolved on what should change on LessWrong re: Duncan. But I (and other LW site moderators) want to be clear that while strawmanning is bad and you shouldn’t do it, we don’t expect to intervene on most individual cases. I recommend strong-downvoting, and leaving one comment stating that the thing seems false.
I continue to think it’s fine for Duncan to moderate his own posts however he wants (although as noted previously I think an exception should be made for posts that are actively pushing sitewide moderation norms)
Some goals I’d have are:
people on LessWrong feel safe that they aren’t likely to get into sudden, protracted conflict with Duncan that persists outside his own posts.
the LessWrong team and Duncan are on the same page about the LW team not being willing to allocate dozens of hours of attention at a moment’s notice in the specific ways Duncan wants. I don’t think it’s accurate to say “there’s no lifeguard on duty”, but I think it’s quite accurate to say that the lifeguard on duty isn’t planning to prioritize the things Duncan wants; so, Duncan should basically participate on LessWrong as if there is, in effect, “no lifeguard” from his perspective. I’m spending ~40 hours this week processing this situation with a goal of basically not having to do that again.
In the past, Duncan took down all his LW posts when LW seemed to be actively hurting him. I’ve asked him about this in the past year, and (I think?) he said he was confident that he wouldn’t do it again. One thing I’d want going forward is a more public statement that, if he’s going to keep posting on LessWrong, he’s not going to do that again. (I don’t mind him taking down 1-2 problem posts that led to really frustrating commenting experiences for him, but if he were likely to take all his posts down, that would undercut much of the value of having him here contributing.)
FWIW I do think it’s moderately likely that the LW team writes a post taking many concepts from Basics of Rationalist Discourse and integrating them into our overall moderation policy. (It’s maybe doable for Duncan to rewrite the parts that some people object to, and to enable commenting on those posts by everyone, but I think it’s kinda reasonable for people to feel uncomfortable with Duncan setting the framing, and it’s worth the LW team having a dedicated “our frame on what the site norms are” post anyway.)
In general I think Duncan has written a lot of great posts – many of his posts have been highly ranked in the LessWrong review. I expect him to continue to provide a lot of value to the LessWrong ecosystem one way or another.
I’ll note that while I have talked to Duncan for dozens(?) of hours trying to hash out various deep issues without much success, I haven’t really tried negotiating with him specifically about how he relates to LessWrong. I am fairly hopeful we can work something out here.