Joining the chorus of appreciation: it sounds like writing this would have been both intellectually and emotionally exhausting (or, at least, it would have been if I had written it).
My question for you is, of course, which features from Arbital seem relatively easy and worthwhile to port over to LW2.0? I’m still not sure I grok the entirety of the vision and how everything fit together, but it seemed like at least some of the features would be fairly easy to implement without relying on the entire vision to be useful.
I guess I’m also curious, for curiosity’s sake, about “what features wouldn’t work, because they relied too much on the tangled web of interconnected pieces?”
One general class of solution is tools that satisfy an author’s goals in an easy fashion, while keeping discussion as visible/transparent as possible.
An idea Ben and I came up with was having an off-topic comment section of a post. Authors get to decide what is “on topic” for a discussion, and there’s an easily accessible button that labels a comment “off topic”. Off topic comments move to a hidden-by-default section at the bottom of the comments. Clicking it once unveils it and leaves it unveiled for the reader in question (and it has some kind of visual cue to let you know that you’ve entered off-topic world).
(child comments that get tagged as off-topic would be removed from their parent comment if it’s on-topic, but in the off-topic section they’d include a link back to their original parent for context)
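As a rough illustration of the mechanics described above, here is a minimal Python sketch. Every name in it (`Comment`, `off_topic`, `split_comments`) is hypothetical and made up for this example; none of it is taken from the LW2.0 codebase. It also assumes that replies to an off-topic comment move along with it, which the idea as stated doesn’t specify.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comment:
    id: str
    parent_id: Optional[str] = None  # None for top-level comments
    off_topic: bool = False          # hypothetical flag set by the post's author


def split_comments(comments):
    """Split a flat comment list into (on_topic, off_topic_section).

    Each entry in off_topic_section is a (comment, context_parent_id) pair.
    context_parent_id is set only when the comment was detached from an
    on-topic parent, so the UI can link back to it for context.
    """
    by_id = {c.id: c for c in comments}

    def is_hidden(c):
        # Assumption: a reply to an off-topic comment moves with its parent.
        while c is not None:
            if c.off_topic:
                return True
            c = by_id.get(c.parent_id)
        return False

    on_topic, off_topic_section = [], []
    for c in comments:
        if not is_hidden(c):
            on_topic.append(c)
            continue
        parent = by_id.get(c.parent_id)
        detached = parent is not None and not is_hidden(parent)
        off_topic_section.append((c, parent.id if detached else None))
    return on_topic, off_topic_section
```

The “hidden by default, stays unveiled once clicked” part would be a per-reader setting on top of this, and isn’t modeled here.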
A common problem that bothers me with my own comment section is comments that are… okay… but I don’t think they’re worth the attention of most readers. Deleting them (with or without hiding them) feels meaner than the comment deserves. Moving them to an offtopic section feels at least a little mean, but more reasonable.
A related idea is “curated” comments that authors and the mod-team can label, which get a highlighted color and move as high in the comment tree as they can (i.e. to the top of the comments list if they’re a top level comment, or the top of their parent comment’s children)
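The “move as high as they can” rule amounts to a sort among siblings. Here is a minimal Python sketch; the `Comment` class and `order_siblings` function are hypothetical names, and ordering the remaining comments by score is my assumption rather than part of the idea above.

```python
from dataclasses import dataclass


@dataclass
class Comment:
    id: str
    score: int
    curated: bool = False  # hypothetical flag set by the author or mod-team


def order_siblings(siblings):
    """Order comments that share a parent: curated first, then by score.

    Applied to the top-level list, this puts curated comments at the top of
    the comments section; applied to a comment's children, it puts them at
    the top of that parent's replies.
    """
    return sorted(siblings, key=lambda c: (not c.curated, -c.score))
```

Because the sort only ever runs within one sibling group, a curated reply never jumps out of its thread.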
I’ve only circled twice, not sure how relevant this is. (FWIW, my takeaway so far is “eh, pretty okay, depends a lot on facilitator and situation”). But, some perspective I think is important to the debates going on here.
5 years ago, I was very pro “giving people an excuse to be more vulnerable than they’d normally feel comfortable being.”
I’m still pretty pro this. But… in a less naive way than I was 5 years ago.
It seemed like being vulnerable was basically how you got anything worthwhile. I saw people curled up in their shells, desperately lacking in intimacy in ways that were having a crippling effect on them. They lived in fear of expressing themselves, of taking risks. And having an environment conducive to exploring intimacy and vulnerability was profoundly valuable. (I was at least somewhat this type of person, although I don’t think it was as big a deal for me as for other people I knew)
And the thing that I intellectually understood, but took several years to grok, was that being vulnerable is in fact vulnerable, and you can get hurt doing it.
[my impression is that circling, at least as described by Unreal, is not primarily about vulnerability, but that willingness to try it is a necessary ingredient]
Some situations I’ve run into while doing things-in-a-similar-genre-to-Circling:
No such thing as a “safe space”
A friend of mine facilitated a discussion of friendship/relationships, which he established as a “safe space”. People were encouraged to share anxieties that plagued them. Several people did. At the time, it seemed pretty positive. But, several months later, when Friend A was a combination of sick/exhausted/literally-dying and was frustrated with Friend B who had attended the friendship discussion, Friend A used anxieties that Friend B had opened up about in the circle as a weapon to criticize them, in a public setting. This did permanent harm to Friend B’s ability to trust.
Friend B’s takeaway was that Friend A was a sociopath who gathered anxieties on purpose. I think it’s actually worse than that – I think Friend A is in fact one of the most trustworthy people I knew, and was earnestly trying to help people at the time. But “earnestly trying to create a safe, helpful space” is not a good enough indicator to tell you how a person will handle being stressed out and angry several months later. I think there is no such thing as a person you can thoroughly trust enough to create a safe space.
I’ve personally been involved with running “explore vulnerability”-esque spaces where I was encouraging people to open up, and then I found myself realizing too late that I didn’t have the skills to handle the issues that came up as a result of that.
Now, I still think that vulnerability and intimacy and emotional risk taking are basically necessary for most people to achieve their social/emotional needs. (Both for literal intimacy/connection, and for self-awareness as a skill that helps them achieve connection elsewhere – idealized Circling-as-Unreal-Describes-It seems to be more for the latter)
Ideally, everyone would have the opportunity to explore vulnerability carefully, step by step, with a skilled therapist or something to turn to if things ever got dicey. In practice, this is really hard. Not everyone has friends whose combination of skills, needs and connection are the right combination to do optimal-stepping-stone-vulnerability-training. Not everyone has access to a good facilitator or therapist or mediator.
Meanwhile, most of the time, nothing bad happens.
So I think it is net-good for small groups of friends to try this sort of thing on their own, even if they’re not sure what they’re doing. I think there’s something intrinsic to doing risky things together (of any sort) that creates bonds and friendship you can’t get elsewhere. (Unfortunately I can’t explain this very well beyond “if this doesn’t make sense you probably have a Concept Shaped Hole”).
I think a lot of the disconnect in this thread (and some similar threads in the past) has to do with some people noticing how crucially important it is to take the kinds of leaps that involve real emotional risk, and other people noticing how badly hurt you can get if you aren’t careful.
This post was an experiment in “making it easier to write down conversations.”
One of the primary problems LessWrong is trying to solve (but which can’t be solved by technical solutions alone) is getting ideas out from inside the heads of people who think seriously about stuff (but tend to communicate only in high-context, high-trust environments), into the public sphere where people can benefit from them.
Obstacles to this include:
Being too busy
Feeling like it’d take too long to polish an entire idea into something worth posting
Ideas being too sensitive (i.e. based on secrets or pseudo secrets, or wanting to avoid attracting journalists, etc)
Not wanting to deal with commenters who don’t have the context.
“Ideas being too sensitive” may not have a good solution for public, permanent discourse, but “being too busy”, and the related “no time to polish it” both seem like things that could be solved if the less busy participant in a conversation takes the time to write it up. (This only works when the other things the less-busy person could do are less valuable than writing up the post)
A dynamic that Ben, Oli and I have been discussing and hoping to bring about more is separating out the “idea generating” and “writing” roles, since idea generation is a more scarce resource.
A side effect of this is that, since Critch is busy off founding new x-risk mitigation organizations, he probably won’t be able to clarify points that come up in the comments here, but I think it’s still more valuable to have the ideas out there. (Both for the object level value of those ideas, and to help shift norms such that this sort of thing happens more often)
For easy reference, here is a summary of the claims and concepts included in this post. (But, note that I think reading the full post first makes more sense).
The Problem: discussion is happening in private by default
This requires people to network their way into various social circles to stay up to date
Since you can’t rely on everyone sharing mid-level concepts, it’s harder to build them into higher level concepts.
Private venues like facebook...
Have at least some advantages for developing early stage ideas
Are often more enjoyable (so early-stage-ideas tend to end up staying there much longer)
Chilling Effects are concerning, but:
There’s a chilling effect on criticism if authors get to control their discussions
There’s a chilling effect on authors if they don’t – the costs of requiring authors to have fully public discussions on every post are a lot higher than you think
Healthy disagreement / Avoiding Echo Chambers
Is important to maintain
Does not intrinsically require a given post’s comment section to be fully public – people can write response posts
Since the status quo is many people discussing things privately, it’s not even clear that authorial control over comments is net negative for avoiding filter bubbles
You are responsible for writing arguments that are clear and persuasive enough to get upvoted and visible.
People prefer different sorts of trust
You can trust someone to be capable of (and willing to perform) a particular skill
You can trust someone to keep their word or uphold certain principles
Transparent Low Trust environments
Rely on clear standards, transparency and safeguards, instead of on any one particular person being trustworthy
You are responsible for using the transparency to make sure you are safe
Curated High Trust environments
Rely on an owner, who decides what standards to set.
You are responsible for deciding if you trust the owner and the people involved
The person in charge of a space can experiment with stronger or stranger norms that they wouldn’t be able to get the entire rationalsphere on board with.
If you do trust the owner and members of a space and agree with the norms, you have to spend less effort worrying about those norms being violated
A single space can’t satisfy everyone’s preferences for trust, but a site with multiple sub-spaces can provide a wider variety of options
Comments about the overton window, or personal criticism…
Can be intensely prolific, making it hard to talk about anything else
Instead of every controversial topic resulting in a massive demon thread every time, you can have a single thread for that, and meanwhile resolve disputes over the overton window via upvotes/downvotes
LW2.0 team has a vision of experimentation, involving:
Flexible Sitewide tools
Individual authors cultivating spaces and reputations
Individual posts having specific conversational goals
There’s something I’ve seen some rationalists try for, which I think Eliezer might be aiming at here, which is to try and be a truly robust agent.
Be the sort of person that Omega (even a version of Omega who’s only 90% accurate) can clearly tell is going to one-box.
Be the sort of agent who cooperates when it is appropriate, defects when it is appropriate, and can realize that cooperating-in-this-particular-instance might look superficially like defecting, but avoid falling into a trap.
Be the sort of agent who, if some AI engineers were whiteboarding out the agent’s decision making, they would see that the agent makes robustly good choices, such that those engineers would choose to implement that agent as software and run it.
Not sure if that’s precisely what’s going on here but I think it is at least somewhat related. If your day job is designing agents that could be provably friendly, it suggests the question of “how can I be provably friendly?”
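For the 90%-accurate Omega case, the arithmetic can be made concrete. The sketch below assumes the usual Newcomb payoffs ($1,000,000 in the opaque box, $1,000 in the transparent one), which aren’t stated above, and it treats Omega’s accuracy as the chance that its prediction matches what you actually do. That is an evidential-style calculation, which is only one way of framing the problem; `expected_payoffs` is a name I made up.

```python
def expected_payoffs(accuracy, big=1_000_000, small=1_000):
    """Return (one_box, two_box) expected payoffs under a fallible predictor.

    accuracy: assumed probability that Omega's prediction matches your choice.
    big / small: assumed contents of the opaque and transparent boxes.
    """
    # One-boxers get the big prize only when Omega predicted them correctly.
    one_box = accuracy * big
    # Two-boxers always get the small prize, plus the big one when Omega erred.
    two_box = (1 - accuracy) * big + small
    return one_box, two_box


one_box, two_box = expected_payoffs(0.9)
print(f"one-box: {one_box:,.0f}  two-box: {two_box:,.0f}")
```

Under these assumptions one-boxing comes out around $900,000 against roughly $101,000 for two-boxing, and it stays ahead for any accuracy above 50.05%. So Omega doesn’t need to be anywhere near perfect for it to be worth being legibly the one-boxing sort of agent.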
I guess this post is a bit of a typical mind-fallacy check for me, on the “not everyone has read In Defense of Food (or something similar)” front.
In Defense of Food has a bit of the naturalistic fallacy going on, but I think its core point is at least a hypothesis worth talking about and being able to make distinctions around.
Somewhere in the 20th century, people started getting a cluster of “Western diseases” (e.g. obesity, heart disease) that seem to have something to do with diet (although non-diet lifestyle changes are another contender).
In general, the 20th century saw lots of industrialization that radically changed both diet and lifestyle. But on the diet front, there’s a specific thing worth noting:
Prior to mid-20th century, we did not have very fine control over what sorts of chemicals went into food. Food was made of chunks of organic matter with a lot of complex reactions going on. Mid-20th century, we started being able to break that down into parts and optimize it.
And this meant that suddenly, food became goodhartable in a way that it hadn’t before. Industry could optimize it for tastiness/addictiveness, with a lot of incentives to do that without regard for health (and, not a lot of clear information on how to optimize it for health even if you wanted to, since health is long-term and tastiness is immediate).
So there is reason to want to be able to distinguish “food constructed the way we’ve been constructing it for thousands of years” and “food we only recently began to be able to construct.”
Now, this hypothesis might be wrong. Lots of people are just applying the naturalistic fallacy (in a way that also outputs preferences for ‘alternative medicine’ and the like). But, if you’re worried about that, it’s probably more helpful to respond one of the first three ways Alicorn suggests, rather than on the level of “obviously everything is chemicals.”
I did realize, in seeing Kaj’s comment and thinking through my reply, that this is fairly complex background framing, and if you don’t have it in mind, it may be hard to notice in realtime that the “everything is chemicals” response might be missing the point. I’m not (currently) sure if there’s an algorithm I could recommend people run that would easily separate pedantry from potentially-important reframing.
I’m flagging individual comments that clearly violate norms with brief mod-notes, which link back to this comment.
Before I dive into them, I wanted to summarize my understanding of Benquo’s position, based on our conversation yesterday.
Ben’s epistemic state is that the Punch Bug article is advocating for a change with a decent chance of escalating into violence with serious consequences.
Most of his concern is not about what Duncan said or intended to say, but in what the likely consequences of it getting promoted are. One important consideration here is downstream effects as the message gets propagated and simplified by the sorts of people who don’t make serious efforts to interpret 14,000 word texts and retain their nuance.
Independent of whether Ben was right in his assessment of the chance of serious escalation, he considered it important that, at least in principle, there could be posts on LessWrong advocating for things with dangerous downstream consequences, and we should be cognizant of how our discourse norms will affect our ability to talk about that.
Ben’s model of why and how the post is likely to lead to bad outcomes is informed by the mechanics of how anti-Jew pogroms (and Nazis in particular) have functioned. This is not a claim about the magnitude of how strong the concern is, but it is a claim about the nuances of the dynamics involved. (He specifically noted: it very much would not have made sense to compare it to other forms of escalating violence, genocide, etc, because the dynamics there are different).
Ben experienced a lot of frustration that it is difficult to communicate about this sort of claim without people assuming the claim is about magnitude. In particular, he wanted it recognized that if our discourse norms include “Nazis are special, and either can’t be used as examples or comparisons, at least without a lot of additional interpretive effort on the part of the commenter”, this is creating a world where specific classes of behavioral patterns are harder to talk about.
That all said, by the end of the conversation, Ben and I seemed mostly in agreement on most of the specific areas where I considered his points clearly over the line. (see next section).
There are a few specific bright lines that Benquo crossed. I don’t think avoiding these specific issues would have been sufficient for this discussion to have gone well, but it’s easiest to start with the clearest issues:
Irrelevant, inflammatory points – In Ben’s original comment, while I was generally worried about casually invoking Nazis and pogroms for rhetorical emphasis, the “german car” quip seemed particularly unjustified. Maybe it was an important part of Ben’s thought process that helped him notice something subtle, but as an argument it was clearly irrelevant to the broader point – I doubt Ben’s concerns depend on whether kids had invented Punch Bug or Punch Ford.
Attributing words to Duncan that he didn’t say – It’s not okay to substitute different words with different connotations and then ascribe them to someone as if they’d made a different claim. It’s doubly not okay when the connotations are as intensely bad as “ghetto.” LessWrong does need to be the sort of place where, if you think someone is advocating for something with bad consequences, you are able to describe those consequences. But it is not okay to do that without clearly indicating that this is an inference you are making.
I think this issue was exacerbated by an issue Duncan noted, of Ben pointing towards norms that are indeed good and valuable, without acknowledging that Ben had already violated them.
Ascribing “Violent Sadist” as a motivation – This comment had a number of things going wrong with it. First, it was worded in an inflammatory way that made the discussion more adversarial. Ben reworded the comment in an attempt to alleviate that. I don’t think he went far enough in the rewording. But mostly I think the comment has a deeper, underlying issue – a failure of some combination of charity, imagination, empathy, or curiosity.
Said recently made some comments about penalizing “inner workings” rather than outward actions. I think there is something important to that – at the very least I think LW users (and mods) have a responsibility to be very careful with that sort of reasoning. In this case, I can lay down some concrete rules/policies – if you find yourself making guesses about the motivations of someone you’re debating, and your explanation is “violent sadism”, you have a responsibility to at least come up with one alternate explanation. A serious explanation, not “house elves are stealing our magic”. (You should probably generally do this whenever someone advocates for something you find baffling)
But this rule doesn’t really feel sufficient to me. It accounts for this particular failure mode, but doesn’t quite carve reality at the joints and prevent a broad class of related issues. And here we get into a bit less “bright line” territory. I have guesses about what inner-generator Ben was running here. In our in-person conversation we chatted a bit about it. While wearing my official moderator hat, what I feel comfortable saying is:
There will be times when we see a user making a string of bad comments.
We need to be able to both say “this is bad for these specific reasons”, and also “something about your inner generator that is producing these comments is reliably degrading LessWrong discourse, and it is your responsibility to figure out what, and do something differently.” It cannot always be the full responsibility of the LessWrong team to tell all commenters precisely what needs to change.
Importantly, I think that even if Ben had avoided the error modes listed above, there’s a good chance the conversation still might have deteriorated (perhaps creating an even harder moderating challenge, since instead of clear over-the-line comments to critique, there would have been a bunch of juuuuust under-the-line comments requiring difficult judgment calls).
I don’t have anything specific to say here, other than “in a thread like this (where the primary discussion is about shifting social consensus), I think it’s important for everyone to spend more than what-feels-like-their-fair-share of charity and interpretive effort, because otherwise things tend to slowly escalate.”
Towards Clearer Policies
We had considered straight-up deleting some of Benquo’s comments a week ago. We didn’t end up doing so. I don’t think we had principled reasoning one way or the other, but my own post-facto introspection is that, at least for myself, it felt like most ways we could intervene had a strong chance of making things worse. It felt important to talk to Ben in person, and in general take the time to think and gather information before taking much explicit action.
There are certain moderation tools that would have made this easier to handle – I think we probably should have locked the entire thread after our initial warning and Ben’s subsequent reply, until we’d had a chance to talk in person.
I am embarrassed to note that I explicitly considered and discarded the idea that we could manually lock replies on each individual comment (since we do have the tool to do that), which would have taken maybe 10 minutes of effort, and my brain just straight up slid off that because it felt silly to spend 10 minutes doing a thing that I could envision how to automate.
There were a few other actions we could have taken with minimal effort that would have made this experience much less stressful for everyone involved without making major tradeoffs, and all I have to say is that none of us thought of them. In some cases I notice specific biases that led to us not thinking of it, in other cases it was just a matter of “this is a high dimensional problem space that we were not very experienced in – we had not yet formed good models of it to make thinking about it easier.”
Over the past week we’ve collectively spent a couple dozen hours thinking about both this specific discussion, and moderation in general. I don’t think our next major moderation crisis will be anywhere near perfect, but I do expect us to be better at noticing quick, low-cost actions we can take to communicate better and keep the situation from escalating.
I also expect that at least for situations similar to those we’ve faced before, we’ll be able to act faster.
With all that in mind, specific mod notices for Ben.
For blatant violations of the type described above, we’ll be aiming to quickly flag comments that are crossing the line. Depending on circumstances, we may delete comments that cross the line, with an explanation of why (note: our delete button sends the user a PM containing the text of the comment, so the content isn’t lost)
In the future, in discussions where you notice yourself tempted to make comments similar to the ones we’re flagging here, I ask that you put a lot more effort into ensuring the comments are high quality. This is important both for avoiding the explicit bright lines as well as grey-area-just-under-the-line comments. Yes, this is a cost that makes certain kinds of conversations harder. And we should be cognizant of that. But my current take is that this is just a necessary cost we have to pay.
I’m fairly confident, based on extended conversations I’ve had with Ben over the years as well as the one last night, that the situation won’t escalate past the point of periodic warnings. I disagree with Ben on some important object level (and meta level) things, but I trust Ben’s underlying goals, principles and error-correction algorithms to be aligned with LessWrong’s overall mission.
But because we do need legibly clear rules, it’s important to note that if this sort of discussion became a pattern, we would follow a policy of “warnings of increasing severity and publicity, followed by escalating temp bans.”
Google is deliberately taking over the internet (and by extension, the world) for the express purpose of making sure the Singularity happens under their control and is friendly. 75%
I believe that the disagreement is mostly about what happens before we build powerful AGI. I think that weaker AI systems will already have radically transformed the world, while I believe fast takeoff proponents think there are factors that makes weak AI systems radically less useful. This is strategically relevant because I’m imagining AGI strategies playing out in a world where everything is already going crazy, while other people are imagining AGI strategies playing out in a world that looks kind of like 2018 except that someone is about to get a decisive strategic advantage.
Huh, I think this is the most useful framing of the Slow/Fast takeoff paradigm I’ve found. Normally when I hear about the slow takeoff strategy I think it comes with a vague sense of “and therefore, it’s not worth worrying about because our usual institutions will be able to handle it.”
I’m not sure if I’m convinced about the continuous takeoff, but it at least seems plausible. It still seems to me that recursive self improvement should be qualitatively different: not necessarily resulting in a “discontinuous jump” but changing the shape of the curve.
The curve below is a bit sharper than I meant it to come out, I had some trouble getting photoshop to cooperate.
More thoughts to come. I agree with Qiaochu that this is a good “actually think for N minutes” kinda post.
Something about the Stag frame does make things click. In particular, it gets away from “cooperate/defect” and forces you to think about how much things are worth, which is a cleaner look at the actual game theory problem without confusing it with moralizing, and primes me to evaluate the sort of situations you’d run into here more appropriately.
All in all, while I confess a little disappointment that you hadn’t thought through some of the things that appear most-obvious-in-retrospect, the experience sounds like a pretty reasonable first pass at the experiment.
I think the claims here are fine but the title is a bit clickbaity/metacontrarian in a way I’m not a huge fan of.
I will note that the metaphor I found least intuitive was calling the network “Omega” – I’d personally have named the post “The Improv Scene Model of Social Reality” or something, rather than focusing on that. But YMMV.
I do think having a unique name for this post to make it easier to refer to (without concept clashing with similarly-named posts) would be handy.