Navigating an ecosystem that might or might not be bad for the world

  • ===============

    Context for this dialogue: Lightcone has been exploring dialogues as a new content type on LessWrong. This dialogue was born out of me and kave trying out this new format. You should read the below more as a chat log between friends with an epistemic status that is something like “I am talking to a friend, assuming a bunch of context, not being very careful with my words, and blurting things out more than I would usually do in public writing”. For context on my first message, the Lightcone Offices closing post is OK, though my thinking has still changed a lot since then.

    ===============

    I have this deep sense that somehow this ecosystem will do a lot of stuff that makes the world a lot worse, and I don’t know how to relate to it in a way that doesn’t make my actions dominated by my effect on that, and I expect that I will either contribute to making the world worse, or be consumed in some kind of political conflict that will make my life terrible.

  • I don’t quite understand the “doesn’t make my actions dominated by my effect on that” part of your sentence. Could you expand?

    I think that it seems quite hard to do robustly good stuff in AI land, and it makes sense to be scared that we will make the world a lot worse. I do have a fear that you will fail to drive forward anything positive because you will be flinching between trying to stop potential ecosystem-induced disasters without getting a lot of traction on any of them.

  • I don’t quite understand the “doesn’t make my actions dominated by my effect on that” part of your sentence. Could you expand?

    Like, I kind of wish I had a third option that is something like “ignore the ecosystem and just go and try to affect the world directly”, but I don’t feel like that’s an option. Too much of the infrastructure and things I am building seem downstream of the ecosystem, and I don’t really know how to disentangle myself.

    And within the ecosystem I have this constant fear that I am being diagonalized over, and acting within it feels like this much more social problem, instead of something that I can think about systematically.

  • I think that it seems quite hard to do robustly good stuff in AI land, and it makes sense to be scared that we will make the world a lot worse. I do have a fear that you will fail to drive forward anything positive because you will be flinching between trying to stop potential ecosystem-induced disasters without getting a lot of traction on any of them.

    I also am quite worried I will not have much traction on dealing with the ecosystem-induced disasters, but like, if I think the disasters are enough to make things net-negative, then I don’t really see what alternative choice I have.

    I feel a bit like I am actually working for a company whose CEO is pretty gullible or manipulatable, and this gives me a sense of powerlessness over the outcomes of my actions. Like, if I build LessWrong, and Eliezer goes and endorses some crazy person, or the AI Alignment community gets eaten by the ML community or the AI capability companies, then like, yeah, all my impact is net-negative.

    And I have some glimmer of hope that LessWrong can be better than this, and has a more robust independent epistemic process, but I am worried I will just be summoning a different kind of demon.

  • Like, I kind of wish I had a third option that is something like “ignore the ecosystem and just go and try to affect the world directly”, but I don’t feel like that’s an option.

    Yeah, that makes sense. I guess I want to defend the “ignore and build” approach nonetheless. Two reasons come to mind immediately.

    1. I think the ecosystem is made out of what the people in it do. You can choose to do things that change its character in better ways (perhaps). I guess another way of saying this is that I have more hope for you leading the ecosystem by showcasing something constructive.

    2. I think that first-order effects are just pretty large, man. I think that if your leverage on the ecosystem seems weak, just try and do something good. I think this is complicated if you think your main effects will be to indiscriminately empower people within the ecosystem. I think this is probably false, and would be happy to argue about it.

  • I also am quite worried I will not have much traction on dealing with the ecosystem-induced disasters, but like, if I think the disasters are enough to make things net-negative, then I don’t really see what alternative choice I have.

    This seems confusing to me. Is it a deontological thing or a consequentialist thing? Like, this doesn’t translate straightforwardly into an EV argument

  • And I have some glimmer of hope that LessWrong can be better than this, and has a more robust independent epistemic process, but I am worried I will just be summoning a different kind of demon.

    I guess there’s a pretty interesting thread about how much one should update on (a) how much trying to build for people is demon-summoning and (b) how scared to be of that kind of demon-summoning.

    I think that the history of LessWrong is more relevant to whether you’ll be demon-summoning than the history of ecosystem projects, though it would be nice to get whatever info we can from each.

  • I mean, I agree that I can do things that change the ecosystem’s character in better ways, but that still sounds like my effects will be primarily through the ecosystem. And a lot of what I need to do in order to stay alive within it is to give it resources, and it’s not clear that my steering here will have much of an effect (and I do expect that I will have a large effect on the firepower of the movement).

    Let’s talk about whether my main effects are through indiscriminately empowering people within the ecosystem. Like, I do think that the primary thing I am trying to do is to build an ecosystem of people who have the right ingredients for intellectual progress on important things I care about. I could pivot towards just trying to build my own models and write them up and communicate them, but that sure sounds like a different plan than what I have been pursuing so far, and it’s not clear to me Lightcone is that well set up to help with that kind of goal.

    Like, I don’t know what it means for me to “just do something good”. I build infrastructure. I make features and platforms that get used by millions of people. The core value proposition requires putting my trust in others.

  • I feel a bit like I am actually working for a company whose CEO is pretty gullible or manipulatable, and this gives me a sense of powerlessness over the outcomes of my actions.

    I think you and I have some crux about how much you are creating discrete entities that are co-optable. I feel like if you build LessWrong and prominent people who use it do things you dislike, you are just in a pretty good place to say that and talk about it and try and help the people who aren’t doing those things.

    I do notice I’m somewhat confused about this. For sure it feels like LessWrong won’t be under your control, and it at least seems possible that its focus will move in a way that seems bad to you. So that does seem a fair bit like summoning a demon you can’t put down.

  • This seems confusing to me. Is it a deontological thing or a consequentialist thing? Like, this doesn’t translate straightforwardly into an EV argument

    I meant it as a straightforward EV argument. Like, if my impact is dominated by my effects on the egregore, and the egregore is bad, then seems like the EV of my actions is bad, and I should just go home or something.

  • I meant it as a straightforward EV argument. Like, if my impact is dominated by my effects on the egregore, and the egregore is bad, then seems like the EV of my actions is bad, and I should just go home or something.

    Sure. I was considering the argument to be like: “egregore is negative EV → I should work on fixing the egregore”. Whereas you seem to be saying something more like “my efforts mostly strengthen the egregore → egregore is negative EV → I mostly shouldn’t effort”.

  • I do notice I’m somewhat confused about this. For sure it feels like LessWrong won’t be under your control, and it at least seems possible that its focus will move in a way that seems bad to you. So that does seem a fair bit like summoning a demon you can’t put down.

    A related point here is that I think in order for LessWrong to not be co-optable, it probably needs to either learn the skill of assessing and integrating character-evidence into its assessment of people that it gives power and status to, or it needs to be constructed such that there fundamentally isn’t much to co-opt. I think this isn’t true about current LW, or like, it’s only true conditional on LW failing to achieve most of its potential.

  • Sure. I was considering the argument to be like: “egregore is negative EV → I should work on fixing the egregore”. Whereas you seem to be saying something more like “my efforts mostly strengthen the egregore → egregore is negative EV → I mostly shouldn’t effort”.

    Yep, the second causal chain is more how I am modeling things.

  • Let’s talk about whether my main effects are through indiscriminately empowering people within the ecosystem. Like, I do think that the primary thing I am trying to do is to build an ecosystem of people who have the right ingredients for intellectual progress on important things I care about. I could pivot towards just trying to build my own models and write them up and communicate them, but that sure sounds like a different plan than what I have been pursuing so far, and it’s not clear to me Lightcone is that well set up to help with that kind of goal.

    Like, I don’t know what it means for me to “just do something good”. I build infrastructure. I make features and platforms that get used by millions of people. The core value proposition requires putting my trust in others.

    I don’t think that “trying [...] to build an ecosystem of people who have the right ingredients for intellectual progress on important things I care about” is “indiscriminately empowering people within the ecosystem”.

    Here are some ways that you have influence on such a system:

    • Your intellectual leadership. For example:

      • your entry at the top of the Names, Faces & Conversations for the SSS retreat

      • your comments on LessWrong

    • Pointed feature development trying to make certain kinds of discussion, attention patterns, etc happen more or less on LessWrong

    • Moderation (especially curation feels like a very legitimised avenue for you to exert your taste on the site)

    • Writer/reader/reviewer recruitment

  • I agree that there are some avenues of influence here, but I also feel like those don’t really help with not having the site become primarily a talent funnel to AI capability companies, or help us notice FTX earlier, or help prevent the kinds of things I am worried about CAIS doing.

    I also agree that there are some things that seem more robustly good, like just having better and clearer explanations of the AI X-risk problem, and having more relatively straightforwardly verifiable facts available about AI Alignment contributions. Those things seem possible to co-opt, but not that likely, and I do feel more hope when I imagine a world with that kind of thing than without.

    I think a lot of what I am worried about is that in order to get funding and to survive in the talent competition, I need to do more than that, but maybe I am just wrong here, and LessWrong can probably survive without making many additional compromises.

  • those don’t really help with not having the site become primarily a talent funnel to AI capability companies, or help us notice FTX earlier, or help prevent the kinds of things I am worried about CAIS doing

    My hot take is that you should worry about the talent funnel and that you shouldn’t rule out a plan because it fails to address the other two things.

  • Ok, but what to do about the talent funnel? I do think that among the default outcomes is that the field of AI Alignment ends up being centered around the efforts of the AI capability companies, and they will want to make it so that nobody respects the site, or that it’s low-status for people to participate, and maybe we just roll with that, but I do expect it to hurt.

    I do feel a lot better here if the forum and the place are more clearly oriented around LessWrong.com and “Preventing AI existential risk”, which do seem a lot harder to co-opt.

  • I agree that is among the default outcomes. I also expect it to hurt and it also seems like it could successfully make it harder for LessWrong to influence the future.

    I think I currently model LessWrong as a place that is trying to have a certain kind of conversation with itself, and to valorise a certain kind of thinking and writing. Because those “certain kinds” of things are actually virtuous, people sometimes pay attention to us and sometimes that is quite valuable and important (e.g. LessWrong’s impact on how people think about AI alignment, perhaps our impact on COVID policy). To the extent this model is true, it could be bad if no one wants to pay attention to LessWrong any more.

    I also have some hope that “asymmetric discourse will win out”. That is, good faith argument and truth-seeking are not actually that marginalisable. I’m not sure how to think about that quantitatively or what my cruxes are there. I think if I knew more about the history of scenes being destroyed from without, that would help me think about that.

  • Ok, I think my crux here is maybe something like “but my sense is I have done a lot of good by prosecuting what seems to me like bad behavior, and maybe that is actually where a lot of my impact comes from”. And I don’t feel quite ready to give up on that, or to draw my circle of concern so narrowly that I only judge the contributions people make on LessWrong, or on verifiable topics.

    Like, I do find a lot of this kind of work a frustrating mixture: on the one hand it is highly socially rewarding, allows a bunch of people to notice a bunch of messy stuff going on, helps them orient, and drives positive change; on the other hand it feels like I am not achieving anything and am just continuously paying costs while alienating people.

  • Like, idk, what am I supposed to do when promising young researchers approach me, and say things that clearly indicate they are too trusting and aren’t modeling a lot of the incentives that are shaping the behavior of the people around them. Should I just ignore them? Should I point them towards things I have written, but not really try very much to help them individually?

  • It seems like there are a couple of threads here.

    1. Is it reasonable to stop prosecuting bad behaviour (to focus on other things)?

    2. More generally, how should I think about propagating ~cynical social context (of which prosecuting bad behaviour is kind of a subspecies)?

    3. Something something focus on LessWrong. I can’t quite disentangle the parts that are like (a) only focusing on verifiable topics, (b) ignoring people’s contributions and actions off-site when interfacing with them on-site, (c) the part about generally choosing LessWrong as your main focus and letting other valuable avenues lie.

    On (1), I don’t feel like I have a lot of context to engage with this. On (2), it does seem like you shouldn’t give up on arguing with people, including in person. On (3), I’d like to hear more about how you respond to my summary of that topic.

  • I feel like we’re a little caught in the weeds, and it’s not obvious that this is where we want to focus our attention. I wonder if you’d like to say what seems most alive to you in this vicinity. If it’s a response to what I wrote or a return to an earlier part of the conversation, that seems fine, as well as a new frame or topic.

  • Ok, I think the framing that maybe currently resonates most with me is something like “But I do actually need a community to work with me on LessWrong. I do a lot of my thinking socially and in connection with others. I need to hire people, and I want people to hang out with that I feel excited about working with, and I want to be on the same page about what we are trying to do when I am having an online conversation with them. I currently don’t know how to get that, without being paranoid about lots of people in that community, and without feeling like I constantly have to fight to maintain my membership or my standing in that community”

  • Is your concern because of the dynamics of communities in general? Or is it because (a) you expect to mostly draw from the existing community-around-here and (b) that community has dynamics that require the paranoia and fighting?

  • And I think a major component here is that I feel like I will “get got”, in the sense that I will be recruited to join political battles against people who have done nothing wrong. And maybe there is a way to relate to the rest of the world in a way that is more Nash and has a more “look, these are the bounds of the kind of conflict I am engaging in” feeling to it, but somehow that feels like giving up and locking myself in a box and breaks my “heroic responsibility module” or something like that.

  • Is your concern because of the dynamics of communities in general?

    My sense is it might be possible to build a community that doesn’t have these dynamics, though I wouldn’t know how to build one. I also separately think that de-facto I’ll just have to build something very entangled with the community I do have, because the memetic space around making AI not kill everyone is just not that big, and I’ll have to negotiate over that with people somehow.

  • I’m having a bit of a hard time simming the “getting got” in any degree of detail. Is it something like the following?

    There are some people who will do things like criticise the labs. Their criticisms will be obnoxious in some ways, perhaps because they’re trying to express a bunch of moral fire. They will also not be completely rigorous or defensible, will in fact overstep in some places. Then a bunch of people who are powerful within the community will try and make those people uncool and sidelined. They will want them to have some kind of penalisation against them on LessWrong or at least will want spaces they can use where they don’t have to interface with them, or maybe they just want everyone to agree they’re uncool. You’ll be under a lot of pressure to ship code or join your voice with the powerful ones or just decide in your heart that the detractor is uncool

  • No, that doesn’t capture what it feels like on the inside. The thing that I want to point to is more that I will feel like some people are doing highly immoral things, but actually they are just being kind of reasonable. Maybe that will be the result of some vaguely defined external pressure, but the problem is that I won’t be aware that it will be the result of pressure.

    And I am less worried about this happening with people who criticize labs, since I feel quite solid in my model of what is good and bad there. It’s more likely to happen around organizations like FTX or Conjecture: I hadn’t thought that much about what FTX was doing, and from my perspective there wasn’t anything clearly bad that FTX had done besides some kind-of-fishy things and not seeming to care about the same things that I care about.

  • I still don’t feel able to picture what you’re worried about. Here’s another attempt.

    α is the AI Safety org du jour. It has got a lot of power, and a promising new impact model. It seems kind of good and everyone keeps saying it’s good and it’s definitely doing stuff and maybe this is finally the moment of traction.

    β is an organisation that is trying to do something at cross-purposes with α. They’re not generally very critical of α as an organisation, though sometimes they’ve written about disagreements with specific projects that α have done. But they also pursue some projects that are actively unhelpful for α and don’t seem very responsive to concerns that they should back off on their own projects in order to get out of α’s way.

    You end up mad at β and on the margin try and direct resources away from them (e.g. encouraging people not to work for them when they ask for advice)

  • Yep, something like that seems closer, and like a potential way things could play out.

    And like, there is a background assumption here that I will frequently be doing some kind of norm enforcement or investigation into people who do shady things. Like, I feel like a lot would have to change about how I relate to things in order for my answer to people accusing β of some thing to be anything but “oh, that seems bad, let me look into that, and propagate negative information if I learn about it”.

    Like, I can imagine taking on a role of “I am happy to comment on β’s contributions to LessWrong and the AIAF, and give my honest takes on whether they help with AI Alignment, or are locally valid, but I won’t get involved with some kind of broader norm enforcement”, but that feels very sad to me, and something that would be a major departure from the kind of life I’ve been living.

  • It does seem like if your plan is to be something akin to a vigilante or mercenary that is responsive to specific exhortations to pursue potential targets, you had better be pretty careful in which exhortations you respond to (and how you respond to them). I note that, as well as the adverse selection aspect, you seem like you are surely DoSable unless you only look into things stochastically.

    I feel confused here. I think it would be nice if there were something robustly good to do in the face of accusations. I am not sure how you would characterise your current process, and whether it is particularly exploitable.

    One obvious thing to try if you are being fed filtered or misleading evidence is to be less confident in your conclusions and correspondingly less forceful in your actions. I am guessing this is dissatisfying to you, but would like to hear about why if so.

  • One obvious thing to try if you are being fed filtered or misleading evidence is to be less confident in your conclusions and correspondingly less forceful in your actions. I am guessing this is dissatisfying to you, but would like to hear about why if so.

    I think accurately conveying confidence seems good. I think being less forceful is very exploitable, because there aren’t currently any collective action mechanisms by which a bunch of people being low-to-medium forceful translates into anything actually substantial happening. In order for anything to happen, you need one person to really push for things. Medium-strength actions mostly just get ignored.

  • I note that, as well as the adverse selection aspect, you seem like you are surely DoSable unless you only look into things stochastically.

    Seems like I would only be DoSable if I didn’t have a severity threshold. Like, it seems fine to be like “I will only investigate stuff if the 5-minute summary passes this quite substantial threshold of being worth my time to prosecute”.

  • Yes, it could be that the correct call is “60% that this entity is a bad actor. I push for OSTRACISM and EXILE”. Does seem like your actions will generally have worse outcomes when your information environment is being polluted. Maybe that is the more important part to focus on.

    It seems like three paths (not intended to be exhaustive) one could take are (1) accept that your effects are going to be blunted in expected value and perhaps jacked up in variance, and just roll with it, (2) also accept that, but back out because the expected value is either too low, or out of some sort of maximin intuition, or (3) try and come up with routes around it.

    One frame is that you’re trying to figure out when / how to do (1) vs (2) vs (3) in the particular context of Oli the Crusader.

  • I think if it was just an issue of expected value/variance I would feel quite different. My feeling is more that the incentive here is to go directly for my epistemology, or somehow for me to end up messing up my own epistemology, and the expected value of that feels very hard to estimate.

    Like, it’s not like I will just be miscalibrated about my opinions on the specific person. It’s more like I will think that different alignment agendas are correct, and my p(doom) will change, and my models about whether prosaic work makes sense will change depending on the social context.

  • I agree that having your epistemology messed with seems pretty bad. I feel more confused about what we’re discussing or something. (Though the latter paragraph helps a little)

  • Sure, let me restate where we are at.

    I currently feel like I want a place to think in. As part of having that place, I expect to take on a role of prosecuting bad behavior and enforcing norms. However, that norm-enforcement context feels very mindkilly and like it will make it hard for me to orient in a lot of different ways. But also not doing it makes me feel very vulnerable and like I won’t be able to have an environment that won’t have people fucking me over in really bad ways.

    And there is an alternative option here that feels like what a lot of the rest of the world has done, which is to agree on much narrower interfaces, or to prosecute norm violations in only much more narrow ways.

  • I want more colour on the mind-killing of norm-enforcement. I am like “well maybe you can just not have that” but it’s all kind of vague. I agree not fighting for norms seems pretty bad.

    I also kind of like the narrower interfaces. I don’t even really understand what you’re planning to do instead. For example: if someone writes a thoughtful post, the post is thoughtful regardless of what they’ve done elsewhere. My guess is something like trying to make sure that announcements or descriptions of their projects receive a lot of pushback

  • My current guess at the issue here is something like “most positive incentives you can create for doing good work cash out in social capital, which requires being in good standing with the community to cash in. As such, any dispute over whether someone made a positive or negative contribution requires an implicit discussion of whether they should remain in good standing afterwards”.

    Like, let’s take the example of Alex Flint. Seems like he maybe made some good object-level contributions. Seems like he basically can’t cash in any of the benefits of those contributions given his current social standing.

    An option here is to just like, pay people for good contributions, and then they can do whatever they want with that, but that does feel pretty expensive, and there are a lot of benefits to having a social hierarchy oriented around merit.

  • Yeah, I think trying to have a community that trades value only in its internal ledger is fucked. You’ve got a dial of how much cashing-out-in-cash to cashing-out-in-not-literally-cashing-out you can choose for your community

    (There are also a bunch of things other than ingroup prestige you can give people: making it so that ingroup prestige also produces general prestige, like academic pubs or fame. Making it so that you have in-kind resources (which can then increase savings even if they’re temporary, like housing). Giving them opportunities to network with powerful or interesting people outside of the ingroup)

  • I mean, not super sure how much of a dial I have. I generally don’t control the flow of funds. So all I usually have is the social capital, and then other people control the in-flow and out-flow of cash.

  • You also don’t control the social capital. Maybe you are imagining that you do? Of course you do have a purse of it, more so than of funds

  • Yeah, seems fair. Like, in order to have some social capital to spread around, I need to earn it first (either for myself or my platform). Same for cash. Agree that I don’t control either of these really.

  • I think my complaint is more against the “social capital” frame, maybe. Or like, when I say “you also don’t control the social capital” a large part of what I’m saying is that prestige does not behave very much like money. It is prone to redistribution and revaluation. It’s in some ways like a thing that’s continually renewed rather than a store of value.

  • Agree! That’s kind of my whole point about finding it hard to judge people’s contributions in the context of narrow interfaces. Someone will do something bad off-site. And then I feel like the social capital balance can’t update unless everyone acknowledges they did something bad. So if I don’t evaluate it, or create social consensus around it, or something, then there is a lot of mounting pressure on me to have a take, and if I don’t, I do kind of break social enforcement mechanisms.

  • I guess I want to say (and this is more the cry-of-my-heart than my considered view): yeah, you have this rubbish pile of resources that people might direct around, and this has some bad kingmaker dynamics and stuff. But if you focus on LessWrong, somehow that won’t come to eat LessWrong or something? Or LessWrong can be made indigestible. I’m not sure how to think about that with [example we discussed in person]. I guess there I’m like “well LessWrong doesn’t have anything to give him other than a platform, and that feels like a much easier call to decouple and figure out about”.

  • Another example here might be LessWrong moderation. Like, I do feel like I have historically been much more hesitant to ban people because they made good contributions off-site. And that seems reasonable to me. But it does indicate that there is a net-value estimation going on in LessWrong membership and moderation decisions, which does require me to engage with the broader question. And sometimes people will get really angry at me for not kicking someone out, and then I have to defend that decision, and that will require a whole complicated analysis of everything they have done.

    And agree that it still seems easier to decouple and figure out about. And maybe we should just have some policies that commit us to full neutrality about some things, or at least create relatively simple abstractions for when things outside of the site feed back into our decisions.

  • It does feel complicated in a site about rationality. If someone has achieved valuable things or impressive things outside of the site, maybe they are more likely to be right in their kvetching about an apparent website-wide blindspot. Idk, maybe it’s not that complicated.

  • Ok, let me get back in touch with what I am trying to solve here. Like, idk, I just de-facto expect the kind of things we are trying to think about on LessWrong to not have good enough grounding that I can meaningfully create a status hierarchy or any kind of incentive system that doesn’t rely in substantial part on deferring to “the standard people”(™). And then like, I really need to model what they will say, and what will make people think something is endorsed by them.

    And when I imagine making things feel more grounded, because I don’t want to deal with the shit, I just end up creating a lot of mechanistic interpretability work and handing things over to the ML crowd, who will easily goodhart the shit out of any measurable metrics I set out.

    I feel like I have some hope here that I myself become a source of judgement and taste on the site, and that this can provide a sense of grounding, but I am pretty far away from that, though maybe not as far away as I think. It does seem quite doable to comment and give my overall take on many of the most popular posts on the site. And if I do it persistently, maybe I won’t just get downvoted whenever I say something that disagrees with “the standard people”(™).

    It does seem like this would take a lot of time.

  • It does seem quite doable to comment and give my overall take on many of the most popular posts on the site. And if I do it persistently, maybe I won’t just get downvoted whenever I say something that disagrees with “the standard people”(™).

    also I feel like getting downvoted doesn’t mean not getting traction, anyway

  • It seems worth trying. It does seem that if we find success we might end up having resources and then get sociopathed 🤷

  • Man, but I am only one man and I live at home. I don’t know how to evaluate most of this stuff and how to make my thinking legible to the internet. I don’t even know how to write top-level posts. And I don’t know how to build an anchor for a healthy epistemic culture.

    Does seem maybe something I can learn.