@gabe_cc
Gabriel Alfour
Glad to read this.
I am currently writing about it. So, if you have questions, remarks or sections that you’ve found particularly interesting and/or worth elaborating upon, I would benefit from you sharing them (whether it is here or in DM).
So if I don’t take myself too seriously in general, by holding most of my models lightly, and I then have OODA loops where I recursively reflect on whether I’m becoming the person I want to be and have set out to be in the past, is that not better than keeping my guard high?
I believe it is hard to accept, but you do get changed by what you spend your time on, regardless of your psychological stance.
You may be very detached. Regardless, if you see A then B a thousand times, you’ll expect B when you see A. If you witness a human-like entity feel bad at the mention of a concept a thousand times, it’s going to do something to your social emotions. If you interact with a cognitive entity (another person, a group, an LLM chatbot, or a dog) for a long time, you’ll naturally develop your own shared language.
--
To be clear, I think it’s good to try to ask questions in different ways and discover just enough of a different frame to be able to 80-20 it and use it with effort, without internalising it.
But Davidad is talking about “people who have >1000h LLM interaction experience.”
--
From my point of view, all the time, people get cognitively pwnd.
People get converted and deconverted, public intellectuals get captured by their audience, newbies try drugs and change their lives after finding meaning in them, academics waste their research on what’s trendy instead of what’s critical, nerds waste their whole careers on what’s elegant instead of what’s useful, adults get syphoned into games (not video games) in which they realise much later they have lost thousands of hours, thousands of EAs get tricked into supporting AI companies in the name of safety, citizens get memed both into avoiding political action and into feeling bad about politics.
--
I think getting pwnd is the default outcome.
From my point of view, it’s not that you must make a mistake to get pwnd. It’s that if you don’t take any precautions, it naturally happens.
Dialogue: Is there a Natural Abstraction of Good?
It has in fact been a while since I last had written conversations with strangers; I’m sorry that my tone came across as too abrasive for productive conversation.
> Next time I’m working on it, I’ll see if I can consolidate this claim from it and ping you?

I have shared my email in DM.
By the way, tone doesn’t come across well in writing. To be fair, even orally, I am often a bit abrasive.
So just to be clear: I’m thankful that you’re engaging with the conversation. Furthermore, I am assuming that you are doing so genuinely, so thanks for that too.
yes. I don’t think any of them suggest that LessWrong is supporting or enthusiastic about OpenAI
I think you may have misread what I wrote.
My statements were that the LessWrong community has supported DeepMind, OpenAI and Anthropic, and that it had friends in all three companies.
I did not state that it was enthusiastic about them, much less that it currently is. When I say “has supported”, I literally mean that it has supported them: Eliezer introducing Demis and Thiel, Paul Christiano doing RLHF at OpenAI and helping with ChatGPT, the whole cluster founding Anthropic, all the people safety-washing the companies, etc. I didn’t make a grand statement about its feelings, just a pragmatic one about some of its actions.
Nevertheless, as a reaction to my statements, you picked a thread where the top answer recommends people work at OpenAI, and where the second topmost answer expresses happiness at capabilities work (Paul’s RLHF).
How could he have known that Paul’s work would lead to capabilities 2 years before ChatGPT? By using enmity and keeping in mind that an organisation racing to AGI will leverage all of its internal research (including the research labelled “safety”) for capabilities.
I don’t know how you did footnotes in comments, but...
For instance, the context of Ben Pace’s response was one in which many people in the community at the time (plausibly himself too!) recommended people work on OpenAI’s safety teams.
He mentions in his comment that he is happy that Paul and Chris get more money at OpenAI than they would have had otherwise; the same reasoning would have applied to other researchers working with them.
From my point of view, this is pretty damning. You picked one post, and the topmost answers featured two examples of support. The type of support that you would naturally, and clearly should, avoid giving to enemies.
To be clear, the LessWrong community has supported DeepMind, OpenAI and Anthropic many times, and at the same time has harboured bad feelings about them too. This is quite a normal awkward situation in the absence of clear enmity.
This is not surprising. Enmity would have helped clarify this relationship and avoid this mistake.
Also, remember that I do not view enmity as a single-dimensional axis, and this is a major point of my thesis! My recommendation comes down to: be more proactive in deeming others enemies, and at the same time, remain cordial, polite and professional with them.
“if you write something that will predictably make people feel worse about [real person or org], you should stick to journalistic standards of citing sources and such”
This is a selective demand for rigour, which induces an extremely strong positivity bias when discussing other people. I would not willingly introduce such a strong bias.
I think other norms make sense, and do not lead to entire communities distorting their vision of the social world. Cordiality, politeness, courtesy and the like.
I think it’s very unlikely that having laxer standards for accusing others is a good thing.
I know you think so. And I disagree, especially on “~0% suffer from having too high standards” (my immediate reaction is that you are obviously rejecting the relevant evidence when you say this).
This is why I am thinking of writing an article specifically about this, tailored to LessWrong.
To varying degrees. People are probably less negative on Anthropic than OpenAI. We’re certainly not enthusiastic about OpenAI. In any case I don’t think it summarizes to “the Lesswrong community has supported” these orgs.
Have you read the most upvoted responses to your link?
The conclusion of the topmost one is “I think people who take safety seriously should consider working at OpenAI” (with a link to its job page!)
The conclusion of the second most upvoted one, from Ben Pace, is “Overall I don’t feel my opinion is very robust, and could easily change.”, and “And of course I’m very happy indeed about a bunch of the safety work they do and support. The org give lots of support and engineers to people like Paul Christiano, Chris Olah, etc”. For reference, Paul Christiano’s “safety work” included RLHF, which was instrumental to ChatGPT.
From my point of view, you are painfully wrong about this, and indeed, LessWrong should have had much more enmity toward OpenAI, instead of recommending that people work there in the name of safety.
Entities can be both enemies and allies at the same time, entities can be more or less enemies, etc.
From my point of view: accepting to see someone as an enemy only in extreme cases or in the most complete opposition is part of the mindset that I am denouncing.
As I said earlier, and as I’m willing to explain if needed, this is the canonical losing move in politics.
This seems very fake and idiosyncratic, and a much smaller problem (if it is one at all) than failing to organise against one’s enemies.
Nevertheless, if you have a write-up about it of ~10 pages, I’m interested in checking it out (I routinely get proven wrong, and I have found it good to extend this amount of interest to ~anyone who engages with me on a topic that I started).
I am genuinely interested in your point of view.
I see it as causally connected to why the LessWrong community has supported three orgs racing to AGI.
Out of the following, which would count as “talking badly about an org” and would warrant a norm of being more thorough before saying it?
“Greenpeace has tied its identity to anti-nuke, and if you’re pro-nuke you’ll be fighting them for as long as they exist”
“If you are for nuke and market solutions, you’ll find Greenpeace has taken consistently terrible stances”
“If you are for nuke and market solutions, every dollar Greenpeace gets is a loss for you”
“Greenpeace is an enemy, but specifically not stupid or evil”
“Strong supporters of Greenpeace will purposefully slow down nuclear energy, technological solutions and market mechanisms”
If the above passes your threshold for “need to be more thorough before saying it”, then that informs what a potential follow-up to my article, geared toward LessWrong, would have to be about.
Specifically, it should be about LessWrong having a bad culture. One that favours norms that make punishing enemies harder, up to the point of not being able to straightforwardly say “if you are pro-nuke, an org that has been anti-nuke for decades is your enemy”. Let alone dealing with AI corporations racing to AGI that have friends in the community.
If the above doesn’t pass your threshold and you think it’s fine, then I don’t think it makes sense for me to write a follow-up article for LessWrong. That was basically as far as my article went, IIRC, and so the problem lies deeper.
A bunch of wealthy libertarian-leaning Silicon Valley nerds who routinely dismiss the concern that wealthy countries could exploit poor countries
You are projecting.
I have written in the past specifically against tech-libertarianism, in the context of wealth concentration leading to abuses of power.
they’re offended when they’re asked to even address that concern
I’m not offended that I’m asked to address a concern. I merely find it irrelevant.
What offends me is the lack of thought behind assuming that I didn’t know that Greenpeace had arguments. I have seen better on LessWrong.
While Chess is clearly about adversariality, the central examples I care about (most notably politics and policy disagreements) are in fact about enmity.
It is in great part because enmity is more durable. A Chess game is bounded in time; you expect it to end soon.
Whereas Greenpeace is a durable enemy. Part of its identity is to be anti-nuke. Most likely, for as long as it exists, it will be anti-nuke.
Similarly, AGI corps will be enemies until they build ASI or get durably prevented from doing so.
I think it’s also partly that in practice, enmity, this type of durable adversariality, does involve hostility. When someone tries to durably undermine you and your endeavours, and when you try to durably undermine them, it’s hard to maintain a cordial relationship.
In “Beyond Enmity”, I gesture at ways to maintain a durable adversarial relationship, with less hostility. But I do explain that it is predicated on the ability to actually engage in enmity.
If a nerd thinks they are in a durable adversarial relationship without engaging in enmity, they are most likely coping, and they are losing: getting trounced by an opponent who is considering the full enmity of the situation.
This is unreasonably wrong and virulent. It reads as though you have not read the full article.
Of course I know that Greenpeace has arguments for its stances, and I am familiar with them. I mention a public letter it campaigned for, its budget, and its historical policy positions!
Like, when I was asked by PranavG whether he could link-post, I expected that the LessWrong community would not like this article. Of course, “how nice nerds often botch enmity” will not fare well in a community that managed to support three companies racing to AGI (DeepMind, OpenAI and Anthropic) while it was worried about the risks of extinction from AGI.
But… “I mean, do you guys, like, know why Greenpeace is against some of these market solutions?”, “Maybe in more than five minutes you could find other arguments too.”, and “What would happen? Huh?”
Disappointing.
Having spoken with you in the past, I literally do not know if you are making a joke or not.
--
Assuming you are making a joke...
It is a beautiful one!
It completely encapsulates how many in my target audience truly struggle with enmity, and would immediately react to “Greenpeace” and “enemy” before reading the essay. “Oh, I could find one argument on Google! Ergo, it is all mistake theory!”
I especially like the “do you guys, like” and the “What would happen? Huh?” parts, that make it feel very Reddit.
--
Assuming you are not… (or for the people who missed the joke)
You have missed the point of the essay. I recommend reading it again. Of course Greenpeace has arguments, and I have read quite a few of them.
Enemy does not mean “evil, stupid, and has no arguments that can be found on Google”.
Quite early in the essay, it is written:
This may come as strong language, but bear with me. When I say “Greenpeace is your enemy”, I do not mean “Greenpeace is evil.”
(I, for one, do not think of myself as the highest paragon of virtue, rationality and justice. Certainly not so high that anyone opposing me is automatically stupid or evil.)
What I mean by enmity is more prosaic.
“We and Greenpeace have lasting contradictory interests. Neither side expects reconciliation or a lasting compromise in the short-term. In the meantime, both sides are players of The Game. Thus, they should predictably work explicitly against each other.”
If you take “an organisation has arguments on Google” as strong evidence that they can’t be your enemy, your model of the world is broken, in a way that I hope internalising the essay would mend.
How to think about enemies: the example of Greenpeace
How middle powers may prevent the development of artificial superintelligence
From Anthony: Control Inversion
From Vitalik: Galaxy brain resistance
Modeling the geopolitics of AI development
Clearly relevant, thanks.
agreed
agreed, for similar reasons
I strongly disagree with this, and believe this advice is quite harmful.
“Uncompromisingly insisting on the importance of principles like honesty, and learning to detect increasingly subtle deceptions so that you can push back on them” is one of the stereotypical ways to get cognitively pwnd.
“I have stopped finding increasingly subtle deceptions” is much more evidence of “I can’t notice them anymore and have reached my limits” than of “There is no deception anymore.”
An intuition pump may be noticing the same phenomenon coming from a person, a company, or an ideological group. Of course, the moment you stop noticing their increasingly subtle lies after pushing back against them is the moment they have pwnd you!
The opposite would be “You push back on a couple of lies, and don’t get any more subtle ones as a result.” That would be evidence that your interlocutor grokked a Natural Abstraction of Lying and has stopped resorting to it.
But pushing back on “increasingly subtle deceptions” up until the point where you don’t see any is almost a canonical instance of The Most Forbidden Technique.