Running Lightcone Infrastructure, which runs LessWrong and Lighthaven.space. You can reach me at habryka@lesswrong.com.
(I have signed no contracts or agreements whose existence I cannot mention, which I am mentioning here as a canary)
Oh, I see. That makes sense, I agree I misunderstood this part to be about something else (though I disagree similarly strongly with the correct interpretation; still, it’s good to clear that up).
Specifically, a screenshot of a moderator warning him that advocating violence is grounds for a ban there. It would also be grounds for a ban on LW.
No, this isn’t true, and I am ultimately head-moderator. I think many people will encounter thoughts and ideas around whether violence is appropriate when they encounter the existential stakes of AI. Discussing whether those ideas are right or wrong is very much a thing I want LessWrong to be able to do.
I think they are almost universally wrong, but people are more likely to arrive at that conclusion through argument (and who knows, we do not live in a world where we can truly always rely on never needing to take up arms in some form or another, and there certainly are edge-cases here worthy of deliberation). I would much rather someone who is thinking of violence come here and be met with genuine and real arguments, instead of being driven into the shadows while feeling like people are censoring any discussion of this, leaving them with no choice but to make up their own mind, all on their own and without any help, on this extremely difficult and high-stakes decision.
Discussing or advocating violence is not banned on LessWrong (though I would be surprised if it isn’t met with very consistent opposition in practically all cases). This also doesn’t mean that all discussion of violence is permitted. If you are being a dick, or are causing discussions to go off the rails, all the usual moderation rules apply, on all sides of any discussions here.
Why do you think this? I’m skeptical this is true, especially if you’re including non-technical talent.
IDK, I counted them? I made some spreadsheets over the years, and ran this number by a bunch of other people, and my current guess is that it’s around 55%? When I list organizations with full-time employees working in safety I actually end up with substantially more than 50% of those people working at Anthropic, but I think that’s overcounting.
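(For what it’s worth, the “talent-weighted” tally is just a weighted fraction. Below is a minimal sketch of the spreadsheet-style calculation; the organizations, headcounts, and weights are placeholders for illustration, not my actual numbers.)

```typescript
// Illustrative only: the org names, headcounts, and weights below are
// placeholders, not the actual data behind the ~55% estimate.
interface SafetyOrg {
  name: string;
  headcount: number;    // full-time safety staff (hypothetical)
  talentWeight: number; // subjective per-person weight (hypothetical)
}

const orgs: SafetyOrg[] = [
  { name: "Anthropic (safety teams)", headcount: 100, talentWeight: 1.0 },
  { name: "Other org A", headcount: 40, talentWeight: 0.9 },
  { name: "Other org B", headcount: 60, talentWeight: 0.8 },
];

const weighted = (o: SafetyOrg) => o.headcount * o.talentWeight;
const total = orgs.reduce((sum, o) => sum + weighted(o), 0);
const anthropicShare = weighted(orgs[0]) / total;

console.log(`Anthropic share of talent-weighted safety people: ${(anthropicShare * 100).toFixed(0)}%`);
```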
My sense is that Anthropic leadership has very different views from most AI safety EAs.
I think there are differences and overlaps. I think Rob points to a thing that is shared across a cluster that spans both of them, and has historically had a lot of influence.
No, the thing that seems unlikely is someone hacking us and then broadcasting your DMs to the world. As Robert says in the OP, attacks where someone uses any credentials or crypto-wallet passwords or API keys you sent in your DMs seem more likely than that, but I don’t think attackers would try to hack LessWrong to publish all the DMs. It’s not that juicy, it’s still pretty legally risky, and I expect attackers to go for things that scale better than that.
I think some of the people who are best at thinking independently about stuff, and are pretty good at not getting swept up in the power-seeking stuff, work at Open Phil. I think Holden genuinely helped with some of the correct cultural pieces, and my current belief is that if he wasn’t under more pressure than just about anyone, he would probably have a relatively sane relationship to Anthropic as a result, though I am not as confident about that as I am that he put in place a bunch of quite good cultural pieces that help people be less naively power-seeking here.
I think you’re right, and also it seems misleading / like a bad clustering to lump “the EAs” in with “Anthropic’s leadership”. I think those groups have some memetic connections, but they’re not the same group!
More than 50% of the talent-weighted safety people in EA are literally employees of Anthropic! The ex-CEO of Open Phil now works at Anthropic, and is married to one of its founders. These groups have enormous overlap.
Like, there is such enormous overlap, and the overlap results in such an enormous amount of de-facto deference (being an employee of a company is approximately the strongest common deference relationship we have), that it makes sense to think of these as closely intertwined.
Yes, there are people who attach the EA label themselves who are different here, sometimes even quite substantial clusters. But it’s also IMO clear from Scott’s response that he himself is also majorly deferring and is majorly supportive of Anthropic as a representative of EA, so this clearly isn’t just a split between “everyone who works at Anthropic and everyone who doesn’t”.
Rob used “Open Phil” exactly two times. One time saying “a cluster of Dario and Open-Phil-ish people” and another time “EAs / Open Phil” in reference to the broader community that includes all of these things. These seem like totally reasonable ways of using these pointers and words. I don’t have anything better. It’s definitely not “just Anthropic” as I think Scott very unambiguously demonstrates, and it would be of course extremely confusing to refer to Scott as “Anthropic”.
My read of the social dynamics is that in places where people are inclined to defer to me or people like me, they might initially approve of the Scott thing for bad tribal reasons, but change their mind when they read criticism of it from me or someone like me
Well, I mean, that is a hard conditional to be false, since if people were to not change their mind, this would largely invalidate the premise that they are inclined to defer to you. Unfortunately, I both think the vast majority of places in EA do not defer to you or people like you, and furthermore, I also think you are pretty importantly wrong about your criticisms, so I don’t quite know how to feel about this.
I do think it helps and am marginally happy about your cultural influence here (though it’s tricky, I also think a bunch of your takes here are quite dumb). I think the vast majority of the cultural influence here is downstream of not quite anyone in-particular, but more Anthropic than anywhere else, and neither you nor I can change that very much.
I think that Scott’s post would not overall be received positively by those people.
Yeah, I expect it to be straightforwardly positively received. I think people will be like “some parts of this seem dumb, the Ilya thing in-particular, but yeah, fuck those rationalists and MIRI people, I am with Scott on that”.
To be clear, I am not expecting consensus here, I think this will be what 75% of people who have any opinion at all on anything adjacent to this believe, but I expect people would broadly think it’s a good contribution that properly establishes norms and reflects how they think about things.
I also think it’s plausible people would be like “wow, what an uncouth way that both of these people are interfacing with each other, please get away from each other, children”, but then actually if you talked to them afterwards, they would be like “yeah, I mean, that was a bit of a shitshow but I do think Scott was basically right here (minus 1-2 minor things)”.
I am not enormously confident on this, but it matches my experiences of the space.
what are these people even arguing about?
Among other things, the fact that one of the leading ASI labs is substantially downstream of us. Separately, a lot of real actual politics that tends to happen in the community around prestige and money and talent allocation and respect, which needs to get litigated somehow (and abuse of power and legitimacy is common, and if you can’t talk about it you can’t have norms about it).
It’s a really useful pointer towards a tactic that is relatively widespread and has no better word. I am personally happy to use other words, but I don’t have the sense that sentences like “I am so very very tired of the ambiguous but ultimately strategic-enough attempts at undermining my ability to orient in this situation by denying pretty clearly true parts of reality, combined with intense implicit threats of consequences if I indicate I believe the wrong thing, that might or might not be conscious optimizations happening in my interlocutors but have enough long-term coherence to be extremely unlikely to be the cause of random misunderstandings” would work that well.
Last we spoke you were talking about API or command line integration which would in principle allow a very wide range of editing/importing workflows, at least for power users.
That is now there! It’s what powers our LLM integrations:
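As a rough illustration of the kind of power-user workflow this opens up, here is a sketch of a command-line Markdown import script. To be clear, the GraphQL mutation name, field shapes, and auth header below are assumptions for illustration only, not the actual schema; check the real API before using anything like this.

```typescript
// Sketch of a command-line Markdown import via a GraphQL API.
// NOTE: the mutation name, fields, and auth scheme are assumptions,
// not the real LessWrong schema.
import { readFile } from "node:fs/promises";

async function importDraft(path: string, apiKey: string) {
  const markdown = await readFile(path, "utf8");

  // Hypothetical mutation and argument shape, for illustration only.
  const query = `
    mutation CreateDraft($title: String!, $contents: String!) {
      createPost(data: { title: $title, draft: true, contents: $contents }) { _id }
    }
  `;

  const res = await fetch("https://www.lesswrong.com/graphql", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // hypothetical auth mechanism
    },
    body: JSON.stringify({ query, variables: { title: path, contents: markdown } }),
  });

  console.log("Created draft:", await res.json());
}
```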
A lot of Rob’s complaints are about things that happened in the past, so I don’t think it’s crazy to interpret him as talking about people who worked at CG in the past.
Fair enough. I think that the people you list also used to believe things closer to what Rob is saying in the past, so at least we need to do a consistent comparison. Holden from 10 years ago seems to say a lot of the things that Rob is saying here, and Ajeya from a few years ago also said things more like this (more point 1 and 3, less point 2).
My guess is that it would be worth digging up quotes here, but it’s a lot of work, so I am not going to do it for now; if it turns out to be cruxy, I can.
(Again, I don’t think these are centrally the people Rob is talking about in either case. I think centrally he is talking about Anthropic, and then secondarily talking about how Open Phil people have related to Anthropic over the years, but I do still think his criticism is correct directionally for those people)
I don’t think Alexander thinks “AI is going to become vastly superhuman in the near future” (and fwiw I don’t think Dario thinks that either, he doesn’t seem to believe in returns to intelligence substantially above human-level).
I think Alexander abstractly believes that AI could very well become vastly superhuman in the near future, but yes, similar to Dario, he does not believe that speculating about such a thing in a non-scientific, non-empirical way is appropriate, and as such they do not have coherent beliefs about this. Indeed, it seems like really quite a central match to what Rob is saying.
But aren’t Alexander Berger’s views not very relevant to OpenPhil’s AI strategy decisions from many years ago, when their AI strategy and worldview—which I take to be very close to the things Rob was criticizing—were worked out and started shaping the views of EAs in OpenPhil’s orbit?
I think Holden at the time believed something closer to what Rob says here (though it’s still not an amazing fit), and more generally, I think “the beliefs of the successor CEO” are actually a better proxy for “the vibes of the broader ecosystem you are part of” than “the beliefs of the founder CEO”. I could go into more detail on my beliefs on this, though I think the argument is reasonably intuitive.
but I think of these as quite distinct things and I never assumed that the thing with Dustin M. had anything to do with OpenPhil’s AI strategy decisions from (say) five years ago
Yep, I think they are highly related. Indeed, I was predicting things like the Dustin thing without any knowledge of Dustin’s specific beliefs, and my predictions were primarily downstream of seeing how Anthropic’s position within the ecosystem was changing, and a broader belief-system that I think is shared by many people in leadership, not just Dustin.
I have since updated that more people who are a level below Alexander, Dustin and Dario have more reasonable beliefs, but also updated that those things end up mattering surprisingly little for what actually ends up being a strategic priority.
I just want to note that it’s not obvious how much these things are connected/caused by one “OpenPhil culture,” vs being about distinct things. (I think some of these are maybe directionally accurate as criticism, btw.)
I think the “OpenPhil culture” thing is a distraction. In my model of the world most of this is downstream of people being into power-seeking strategies mostly from a naive-consequentialist lens, which is not that unique to OpenPhil within EA (and if anything OpenPhil has some of the people with the best antibodies to this, though also a lot of people who think very centrally along these lines, more concentrated among current leadership).
That’s also what your brain is doing when you say you don’t want to work on this anymore. Scott doesn’t want you to quit! (Partially because he values Lightcone’s work, and partially because it would look bad for him if you can publicly blame your burnout on him.) Crucially, your brain knows this.
Man, I really wish this was the case, and it’s non-zero of what is going on, but the vast majority of what I am expressing with my (genuine) desire to quit is the stress and frustration associated with the gaslighting, which is one level more abstract than the issue you talk about.
Like yes, there is a threat here being like “for fuck’s sake, stop gaslighting or I am genuinely going to blow up my part of the pie”, but it’s not actually about the object level, and I don’t actually have much of any genuine hope of that working in the same way one might expect from a negotiation tactic.
I am just genuinely actually very tired, and Scott changing his mind on this and going “oh yeah, actually you are right” actually wouldn’t do much to make me want to not quit, because it wouldn’t address the continuous gaslighting where every time anyone tries to talk about any of the adversarial dynamics, they immediately get told this is all made up, and get told repeatedly things like “I haven’t seen EAs (other than SBF) do a lot of lying, equivocating, or even being particularly shy about their beliefs” and “everyone is being honest all the time and actually it’s just you who is lying right now and always”.
Yep, my current plan is to completely fade out the Markdown editor (it only historically existed because mobile editor support has been lacking).
And then I want to just have an “import markdown” /-command, which you can use to import Markdown wherever you like, plus a “copy as markdown” selection-menu item so you can copy any text as markdown.
I think that will just be the less error-prone system.
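To sketch what that could look like under the hood, here is a purely illustrative example assuming the widely used marked (Markdown to HTML) and turndown (HTML to Markdown) libraries; the function names and wiring are placeholders, not the actual editor code.

```typescript
// Illustrative sketch of the two planned affordances, not the actual editor code.
import { marked } from "marked";        // Markdown -> HTML
import TurndownService from "turndown"; // HTML -> Markdown

// "/import markdown": convert pasted Markdown into HTML for the rich-text editor.
export function importMarkdown(markdown: string): string {
  return marked.parse(markdown) as string;
}

// "Copy as markdown": convert the selected HTML back into Markdown.
const turndown = new TurndownService({ headingStyle: "atx" });
export function copyAsMarkdown(selectedHtml: string): string {
  return turndown.turndown(selectedHtml);
}
```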
I think that the people in the ecosystem you’re criticizing would not approve of Scott’s post.
I think this is false. I expect Scott’s post to be heavily upvoted, to have an enormously positive agree/disagree ratio if it were posted to the EA Forum, and in-general for people to believe something pretty close to it.
There are a few exceptions (somewhat ironically a good chunk of the cG AI-risk people), but they would be relatively sparse. I think this is roughly what someone who is smart, but doesn’t have a strong inside-view take about what they should do about AI-risk believes that they should act like if they want to be a good member of the EA community. My guess is it’s also pretty close to what leadership at cG, CEA and Anthropic believe, plus it would poll pretty well at a thing like SES.
He’s just a guy in it, and one who isn’t even that closely connected to Anthropic or Coefficient Giving people.
The issue is of course not that Scott is right or wrong about what Anthropic or cG people believe. The issue is that he seems to be taking a view where you should be super strategic in your communications, sneer at anyone who is open about things, and measure your success in how many of your friends are now at the levers of power.
I think this is not a good summary of what Coefficient Giving has done.
I think cG’s funding decisions were really very centrally about trying to punish people who weren’t being strategic in their communications in the way that Dustin wanted them to be strategic in their communications.
I think other “all kinds of complicated adversarial shit” has also happened, though it’s harder to point to. At a minimum I will point to the fact that invitation decisions to things like SES have followed similar adversarial “you aren’t cooperating with our strategic communications” principles.
In general, I’ll note that I don’t think Rob really knows many of the OP people; I suspect he has spent <40 hours talking to them about any of this possibly ever.
I think you are overfitting Rob’s post to be about the wrong people. I think it’s much closer to accurate, if you actually read what he says, which is:
Dario and a cluster of Open-Phil-ish people
I think the things Rob is saying still have some strawman-y nature to them, but I think they are reasonably accurate descriptors of Anthropic leadership, plus my best guesses of what Alexander (head of Coefficient Giving) and Zach (head of CEA) believe, which seems well-described by “Dario and a cluster of Open-Phil-ish people”, and furthermore also of course constitutes an enormous fraction of the authority over broader EA.
I feel like almost all of your comment is just running with that misunderstanding and hence mostly irrelevant.
As you say yourself, almost no one in your list works at cG, or is in any meaningful position of authority at cG, so this feels like a bit of an absurd interpretation (I think trying to apply the things he is saying to Holden is reasonable, given Holden’s historical role in cG, and I do think he in the distant past said things much closer to this, but seems to have changed tack sometime in the past few years).
Honestly, this is such a bad reply by Scott that I… don’t quite know whether I want to work on all of this anymore.
If this is how this ecosystem wants to treat people trying their hardest to communicate openly about the risks, and who are trying to somehow make sense of the real adversarial pressures they are facing, then I don’t think I want anything to do with it.
I have issues with Rob’s top-level tweet. I think it gets some things wrong, but it points at a real dynamic. It’s kind of strawman-y about things, and this makes some of Scott’s reaction more understandable, but his response overall seems enormously disproportionate.
Scott’s response is extremely emblematic of what I’ve experienced in the space. Simultaneous extreme insults and obviously bad faith arguments (“actually, it’s your fault that Deepmind was founded because you weren’t careful enough with your comms”), and then gaslighting that no one faces any censure for being open about these things (despite the very thing you are reading being extremely aggro about the lack of strategic communication), and actually we should be happy that Ilya started another ASI lab, and that Jan Leike has some compute budget.
The whole “no you are actually responsible for Deepmind” thing, in a tweet defending that it’s great that all of our resources are going into Anthropic, is just totally absurd. I don’t know what is going on with Scott here, but this is clearly not a high-quality response.
Copying my replies from Twitter, but I am also seriously considering making this my last day. It’s not the kind of decision to be made at 5AM, so who knows, but seriously, fuck this.
IMO this doesn’t seem like the kind of response you will endorse in a few days, especially the “You are responsible for Deepmind/OpenAI” part.
You were also talking about AI close to the same time, and you’ve historically been pretty principled about this kind of stance.
you could argue that you’re not against strategicness in general, just talking about this one issue of saying cleanly that AI is very dangerous.
Robby at least has been very consistent on this: he is against most forms of strategic communication in general.
I also think you are against many forms of strategic communication in general? Your writing explores many of the relevant considerations in a lot of depth, and you certainly have not shied away from sharing your opinion on controversial issues, even when it wasn’t super clear how that is going to help things.
I think you are just arguing the wrong side of this specific argument branch. My model of Eliezer, Nate and Robby all have been pretty consistent that being overly strategic in conversation usually backfires. Of course you shouldn’t have no strategy, and my model of Eliezer in-particular has been in the past too strategic for my tastes and so might disagree with this, but I am pretty confident Robby himself is just pretty solidly on the side of “it’s good to blurt out what you believe, *especially* if you don’t have any good confident inside-view model about how to make things better”.
In exchange, we would have won a couple more years of timeline, which would have been pointless, because timeline isn’t measured in distance from the year 1 AD, it’s measured in distance between some level of woken-up-ness and some point of danger, and the woken-up-ness would be pushed forward at the same rate the danger was.
I feel like we both know this is a strawman. The key thing at least in recent years that Rob, Eliezer and Nate have been arguing for is the political machinery necessary to actually control how fast you are building ASI, and the ability to stop for many years at a time, and to only proceed when risks actually seem handled.
If anything, Eliezer, Nate and Robby have been actively trying to move political will from “a pause right now” to “the machinery for a genuine stop”.
This makes this comparison just weird. Yes, according to everyone’s models the only time you might have the political will to stop will be in the future. I have never seen Nate or Eliezer or Robby say that they expect to get a stop tomorrow. But they of course also know that getting in a position to stop takes a long time, and the right time to get started on that work was yesterday.
So if they had their way (with their present selves teleported back in time), we would have more draft treaties and more negotiation between the U.S. and China. More materials ready to hand to congresspeople who are trying to grapple with all of this stuff. Essays and books and movies and videos explaining the AI existential risk case straightforwardly to every audience imaginable.
That is what you could do if you took the 200+ risk-concerned people who ended up instead going to work at Anthropic, or ended up trying to play various inside-game politics things at OpenAI.
And man, I don’t know, but that just seems like a much better world. Maybe you disagree, which is fine, but please don’t create a strawman where Robby or Nate or Eliezer were ever really centrally angling for a short-term pause that would have already passed by then.
And then even beyond that, I think that if you don’t know how to solve a problem, it is generally the virtuous thing to help other people get more surface area on solving it. Buying more time is the best way to do that, especially buying time now when the risks are pretty intuitive. I think you believe this too, and I don’t really know what’s going on with your reaction here.
But my impression is that the rest of the field is executing this portfolio plan admirably, but MIRI and a few other PauseAI people are trying to sabotage every other strategy in the portfolio in the hope of forcing people into theirs.
Come on man, a huge number of people we both respect have recently updated that the kind of direct advocacy that MIRI has been doing has been massively under-invested in. I do not think that “other people are executing this portfolio plan admirably”, and this is just such a huge mischaracterization of the dynamics of this situation that I don’t know where to start.
“If Anyone Builds It, Everyone Dies” is a straightforward book. It doesn’t try to sabotage every other strategy in the portfolio, and I have no idea how you could characterize really any of the media appearances of Nate this way.
This is of course in contrast to Open Phil defunding almost everyone who has been pursuing this strategy and making mine and tons of other people’s lives hell, and all kinds of complicated adversarial shit that I’ve been having to deal with for years, where absolutely there have been tons of attempts to sabotage people trying to pursue strategies like this.
Like man, we can maybe argue about the magnitude of the errors here, and the sabotage or whatever, but trying to characterize this as some kind of “Nate, Eliezer, Robby are defecting on other people trying to be purely cooperative” seems absurd to me. I am really confused what is going on here.
We wouldn’t have the head of the leading AI lab writing letters to policymakers begging them to “jolt awake”, we wouldn’t have a substantial fraction of world compute going to Jan Leike’s alignment efforts, we wouldn’t have Ilya sitting on $50 billion for some super-secret alignment project
I am sympathetic to the first of these (but disagree you are characterizing Dario here correctly).
But come on, clearly Ilya sitting on $50 billion for starting another ASI company is not good news for the world. I don’t think you believe that this is actually a real ray of hope.
(And then I also don’t think that Jan Leike having marginally more compute is going to help, but maybe there is a more real disagreement here)
Overall, I am so so so tired of the gaslighting here.
It’s a user-setting, not a thing in the editor itself:
I think the strongest argument here is that Anthropic themselves refer to the section of the RSP that says they have to do a risk report when they “publicly deploy” a model, when they talk about why they are releasing the current risk report:
And if we release a model that is “significantly more capable” than those discussed in the prior Risk Report, we must “publish a discussion (in our System Card or elsewhere) of how that model’s capabilities and propensities affect or change analysis in the Risk Report.”
“significantly more capable” is a quote from this paragraph:
When we publicly deploy a model that we determine is significantly more capable than any of the models covered in the most recent Risk Report, we will publish a discussion (in our System Card or elsewhere) of how that model’s capabilities and propensities affect or change analysis in the Risk Report.
It’s not like a perfectly airtight case, but it seems to me that Anthropic is saying in the first paragraph that they are considering the Mythos release to be the kind of thing that would trigger the second paragraph, which would be a “public deployment”.
I agree the common-sense reading of “public deployment” could reasonably not apply to the present situation (though it’s IMO a bit of a stretch), but I think given these paragraphs, it seems like Anthropic themselves think it met the relevant threshold.
Your published post version had the widget inside of a collapsible block. Unfortunately I think that was flying too close to the sun for our editor tooling and did indeed not actually render, so I moved the widget out of the collapsible block, thinking that you probably prefer that over not being able to access it at all.