Bernie Sanders has released a video on x-risk featuring a discussion with Eliezer, Nate, Daniel Kokotajlo, and Jeffrey Ladish. An excerpt from it appears to be blowing up on twitter.
EDIT: re ‘blowing up’: initial view velocity was really high (~200k/hour), but steeply dropped off and it doesn’t seem to have really broken containment
Depends on the reference class. As of the past year or so, 1m eyes on a piece of AI safety content isn’t crazy, especially for a video on Twitter, where my impression is the criteria for what counts as a ‘view’ are pretty liberal. Like, plausibly the video has been viewed in full (much) less than half that many times.
Separately, videos posted by that account seem to routinely get ~1m views—not outperforming other content from an external collaborator is a little disappointing from a raw metrics perspective! Naively you’d hope to get the combined weight of your respective audiences, which seems to have only somewhat happened here.
When I posted this I think I expected we’d get to 2m views in the first couple days (weakly outperforming other Sanders Twitter content). I think with a different video, that could have happened.
Separately, videos posted by that account seem to routinely get ~1m views—not outperforming other content from an external collaborator is a little disappointing from a raw metrics perspective! Naively you’d hope to get the combined weight of your respective audiences, which seems to have only somewhat happened here.
Huh? Eliezer’s main account tends to get a few tens of thousands of views. So under the combination-of-respective-audiences theory, it’s… mostly the audience of Bernie Sanders! Surely that breaks containment.
Also, looking at Bernie’s account suggests this is in the higher view-count class of his posts—some get 2M, but others get 100k, and the eyeballed average seems to be around 500k.
I was comparing to a broader activation of Eliezer’s audience vs any given tweet.
Outperforming the ‘average’ is the wrong standard for ‘blowing up’. ‘Blowing up’ would be ‘outperforming all the recent similar artifacts’, at least as I intended it in my original post.
Meta: it feels pretty strange to have used an underspecified colloquial term, to have walked back the applicability of that term as I intended to use it, and then to be told I’m wrong for walking it back. The point I cared about capturing in that edit is ‘this tweet didn’t do as well as I expected when it first dropped.’ That’s a claim about my own expectations.
The point I care about is whether it’s gotten to a substantial audience of ‘normal’ people. I had read you as making claims that that hadn’t really happened; yet it looks to me that, to the extent that twitter’s viewcount is reasonable (though I believe you that it probably isn’t), a million people who are mostly Bernie Sanders fans got exposed to the video. If you only meant that this is less than what you expected, then that’s my bad and we don’t have any disagreement.
New reacts available only to paid users of LessWrong Premium (not you freeloaders) facilitate frictionless, borderline-telepathic communication.
‘I will NEVER change my mind’: Use this react to assert that you’re content with exactly how wrong you are (which is not at all), and that the case is permanently closed on this matter, so far as you’re concerned.1
‘EY Stamp of Approval’: Use this react to assert that, on your personal authority, Eliezer Yudkowsky agrees with the contents of the comment, rendering it beyond reproach.
‘NOT EY Approved’: Use this react to assert that, on your personal authority, Eliezer Yudkowsky disagrees with the contents of the comment, rendering it immensely reproachful. Users who accrue too many ‘NOT EY Approved’ reacts will have their accounts suspended (although actual thresholds here have yet to be set).2
‘May as well be AI’: Use this react when you’re indifferent to whether or not a statement was generated by AI because shit, it may as well be. You’re ignoring it either way.
‘China Hawk’: Use this react to assert American supremacy and insist that its recipient is derelict in their duty to ensure the preeminence of the greatest country in the history of conscious life from here to the other side of the singularity.
’Toilet’3
‘Sure, buddy’: We all know that the optimal amount of mental masturbation is non-zero. But some reach far beyond the zone of optimality and into the depths of their pants to produce truly monstrous works of self-gratification. Previously, one was compelled to express such an opinion obliquely on LessWrong, or else shatter the decorum of the space and open oneself up to similar critiques. In Beta, we now have the power to quietly acknowledge the reality of the situation, without derailing the gratification itself.
———————————————————————————————
1 This has replaced the ‘I beseech thee’ emoji, which never worked anyway.
2 Note that both EY-invoking reacts are invisible to anyone logged into the @-EliezerYudkowsky LessWrong account, or from any IP address that has ever been logged into that account. No point in having the man himself weigh in when so many LessWrong users are so well practiced at speaking on his behalf!
3 Beta users haven’t settled on how this react ought to be deployed. I’ve seen utilizations ranging from ‘This post belongs in the toilet’ to ‘I enjoyed reading this on the toilet.’
Disappointed that they left these out—I really don’t like choosing between being technically correct and politically correct. Maybe they can go in a LessWrong Max tier.
I will not do this and nobody else should either, but I think in a world where the negative effects were magically not present, it would be quite funny to react to all EY comments with “not EY approved”
Edit: also, no way to pay LW has been forthcoming, making the thing I’m replying to effectively just an April Fools’ lie
Pete Hegseth just declared Anthropic a supply chain risk.
Under the most expansive naive interpretation of what this might mean, it seems plausible that this bottlenecks their compute access (e.g., through Amazon, who has government contracts).
Does anyone know more here / has thought more about this? I’d be pretty surprised if the most expansive naive interpretation held, but it’s also probably not the most narrow one (e.g., can’t work with the defense industry, or on government-specific contracts in larger companies).
That so many companies (the majority of Fortune 500s, for instance) contract with the government seems to make this dramatically impactful (e.g. some-double-digit-percentage of B2B revenue; but is it 10 percent or 90? How much does it impact other aspects of their operation? I just don’t know).
Edit: An important detail here, not obvious from the initial post, is that ‘Hegseth makes a tweet’ is kind of a ‘step 0’ to the actual process unfolding. We actually don’t know if he’s really placed the order, if the order will be followed through on, and with what level of diligence/brutality (it’s this last question that my original post was asking, but in some important sense ‘we’re just not there yet’).
Not to be overly political (although how do you avoid it with a topic like this?) but I think it’s just another part of “MAGA maoism” as it’s been called. The idea of the Republican party as a small-government, free-market, pro-liberty group died with Trump, who painted over that whole philosophy with his focus on culture war grievances, state control, and “owning the libs”.
Matt Yglesias had a good piece two years ago about how the tariff regime essentially makes a mini central planning department and we’ve seen Trump expressly use it as such, trying to bend the economy to attack companies and nations over personal grievances.
And Scott Lincicome of the Cato Institute has covered this new brand of “state corporatism,” as he calls it, quite well IMO—the nationalization of various companies and the various market distortions it creates.
“First they came” is a famous poem about the oppression of people, but it’s also a good general warning for all sorts of behavior. What someone will happily do to another is what they can happily do to you if you ever cross them. Whether that be the chick who cheated on her husband for you then cheating on you in your new relationship, stuffing people into camps, or using the full force of government to smash your company into bits if you aren’t compliant.
There’s likely to be more of this behavior coming in the future. They’re trying to unlock themselves a new power, and what hammer doesn’t seek out a nail?
“First they came” is a famous poem about the oppression of people
The versions now in use go something like
First they came for the Nazis,
and I cheered, because fuck Nazis.
And everything worked out great!
and
First they came for us,
and now we come for them.
Fuck Around and Find Out.
Neither of these are really in keeping with the spirit of the original poem you reference, but insofar as there is such a thing as the Moral Law, I’d say the second is much more in keeping with it. As the Bible says, Vengeance is Divine.
I work for $BIGTECH$ and they’ve spent quite a lot of effort and money (multiples of tens of millions by my own back-of-the-napkin estimation) on getting everyone, and I mean everyone, onto Claude Code. I rack up a $100 bill daily and I know guys who’ll do 10x that. We’re in the narrow sense a government/defense contractor, and I would imagine the guys at the top are going to be extremely angry when this whole initiative they’ve spent so much on gets stonewalled by a decision that could, generously, be construed as “arbitrary” or “short-sighted”. If the expansive interpretation holds, I would find myself surprised if Anthropic is alive in any meaningful capacity by Christmas, but I wouldn’t rule out a purposefully narrowed interpretation—there’s quite a lot of institutional money backing Anthropic, 4.6-opus is the current state of the art based solely on METR time horizons (admittedly no data yet for Gemini 3.1-pro), and stripping large tech companies of access to cutting-edge tools would quickly make enemies of a large number of very influential and wealthy industrialists—maybe not the wisest move politically.
As I understand it, the “supply chain risk” only restricts the target in one direction. Amazon can still sell to Anthropic and keep its government contracts.
I expect this is true in the simple case of ‘you can sell a service to them’; the reason I used the Amazon example is that Amazon owns shares in Anthropic ($60B worth, according to a quick Claude check), which it purchased (at least in part) with compute credits; this is a much more intimate entanglement than Anthropic being yet-another-AWS-customer.
Again, I genuinely don’t know how this pans out, but the crux for me is not ‘can you sell a product to a company that’s a supply chain risk and keep your contracts’; it’s ‘can you [do all the things Amazon is doing with Anthropic, which are largely mutually conditioned on one another as part of complex agreements] with a supply chain risk and keep your contracts.’
MIRI is potentially interested in supporting reading groups for If Anyone Builds It, Everyone Dies by offering study questions, facilitation, and / or copies of the book, at our discretion. If you lead a pre-existing reading group of some kind (or meetup group that occasionally reads things together), please fill out this form.
The deadline to submit is September 22, but sooner is much better.
In two days: MIRI’s hosting a virtual event for those who preorder Nate and Eliezer’s forthcoming book If Anyone Builds It, Everyone Dies. Sunday, August 10 at noon PT. It’s a chat between Nate Soares and Tim Urban (Wait But Why) followed by a Q&A.
This will be followed by another event in September, a full Q+A w/ Nate + Eliezer.
Meanings of political identities shift dramatically based on context, and you can’t manually confirm the beliefs of everyone present at your ‘gathering of people with x political identity’. To the extent that your political identity is based on Real Beliefs with Real Consequences, you should expect not to have much in common with many other people who declare the same identity when you move to a new place (or corner of the internet).
Example: In rural Southeast Texas, Confederate flags are a common sight, and my geometry teacher once told us about a cross burning he witnessed (which a few students murmured we really ought to bring back).
The majority of people genuinely hold at least one belief that, to many of my coastal-elite-descended friends, would seem comical. E.g., women should never have jobs and should rarely speak (especially in public), men with long hair are wanton or gay or trans or both, beating children (not like ‘spanking’ but like ‘anything short of broken bones’) is not only fine but your duty as a father, weed overdose not only can but will definitely kill you, megadoses of zinc can cure cancer, the covid vaccine is the mark of the beast from the Book of Revelation, high school football ought to be the most important thing in your life and, if it isn’t, you are not just odd but untrustworthy, and abortion doctors force-feed fetuses to geese to make foie gras for gay New Yorkers (of which the force feeding is the only ethical component).
Okay, I made up the last one, but the rest are actual positions I’ve heard espoused hundreds or thousands of times by people I met between the ages of 14 and 18.
Also many people talk like this, and everyone’s a ‘libertarian’.
My mom’s from a conservative California family with environmentalist sympathies, and we had something like 60 percent overlap in our views prior to the Texas move. However, I soon found that everyone around who wasn’t liable to drop one of these devastating truth bombs on me thought of themselves as somewhere to the left of Bernie Sanders and read 20th century Marxist writings in their free time. Often these people would think leftist voices at the national scale were somewhat silly or focused on the wrong things (e.g. identity politics), but they nonetheless considered themselves closer to those views than to the other views present in their environs. (There was a Democratic Party around, but it was very different from the national Democratic Party for reasons I won’t address here.)
I assume most readers are in an environment more similar to my current environment (Berkeley, CA) than to Lumberton, TX, and so won’t explicate the delta.
I think there’s a lot of mind-killing that happens as a result of relying on a presumed shared vocabulary for political identities that does not exist. When I say something left-coded, my Rationalist Libertarian Interlocutor often reproaches me, and then as we talk more about it, they often conclude that I’m a ‘boring centrist like everybody else’ who uses the language of the left owing to some biographical quirk.
I submit instead that everyone’s sense of the political map is hopelessly warped due to biographical quirks, and that assuming an adversarial posture on the basis of someone’s declared political identity is often, and maybe even ~always, a mistake.
I think there’s a lot of mind-killing that happens as a result of relying on a presumed shared vocabulary for political identities that does not exist. When I say something left-coded, my Rationalist Libertarian Interlocutor often reproaches me, and then as we talk more about it, they often conclude that I’m a ‘boring centrist like everybody else’ who uses the language of the left owing to some biographical quirk.
I submit that the problem you are encountering is operating too much on simulacra level 3.
I honestly don’t give a rat’s ass what your “political identity” is. I care about your beliefs about the effects of different policies, your current beliefs about the world, and most of all your reasons for believing those things. Insofar as your “reasons” are in fact you self-modeling as “leftist”, “centrist”, “libertarian”, or some other label like that, you are clearly operating on simulacra level 3, and I stop caring.
That is to say: insofar as the majority of people hold one or more of the items on that list of absurd beliefs you gave because they genuinely believe such things, they are dumb, and the only reason I’d care what they have to say is some morbid curiosity, or of course to argue against them (or both).
Insofar as the majority of people have one or more of the items on that list of absurd beliefs you gave because we simply lack a shared vocabulary for political identities, I again don’t care what they have to say. If they are actually trying to communicate to me that they think of themselves as a centrist, well, that claim has no impact on reality whatsoever, so who cares?
The same goes for people with stupid claims which read as leftist.
The thing is that we do have shared vocabulary for things in the real world (excepting many technical terms, but one can just use Wikipedia or Claude for those), and there are not active social wars over what those real world words should mean (that laypeople ought to care about).
If you stick to simply precisely stating your beliefs and values (and reasons for having them), not only will you avoid the upsetting result of having others believe you are The Dreaded Outgroup!!! but you will also avoid meaningless conversations about which Hogwarts house political identity you are.
You wrote this comment in an adversarial tone but I Just Agree With You.
Indeed, this is an alternate formulation of the thesis of my post, and even uses language I used when characterizing the post itself to someone in the office ~2 hours ago.
Hm, I think the thing I disagree with, and was trying to argue against, was that it is a good idea in the first place to even try to have a political map at all, never mind to try to unwarp it.
The post is meant to be somewhat agnostic on the question—conditional on one having a map, here’s a common failure mode. It’s also meant to point in the direction of ‘reconsider the value of your map’.
Separately, I think I ~endorse your first comment, but I also think there are cases in which you should definitely have a map (e.g. you are attempting to achieve political ends). So I think your second comment is somewhat overstated.
I submit instead that everyone’s sense of the political map is hopelessly warped due to biographical quirks, and that assuming an adversarial posture on the basis of someone’s declared political identity is often, and maybe even ~always, a mistake.
The rest of the comment made sense but this feels sorta non-sequitur-ial. Maybe this is right, but most of the things you said seemed like, on average, they would increase the amount I expected some kind of adversarial posture to make sense. (Except insofar as adversarial postures are basically not a good strategy, in which case it still doesn’t depend much on the rest of the contents of the quick take)
Just said to someone that I would by default read anything they wrote to me in a positive light, and that if they wanted to be mean to me in text, they should put ‘(mean)’ after the statement.
Then realized that, if I had to put ‘(mean)’ after everything I wrote on the internet that I wanted to read as slightly abrupt or confrontational, I would definitely be abrupt and confrontational on the internet less.
I am somewhat more confrontational than I endorse, and having to actually say to myself and the world that I was intending to be harsh, rather than simply demonstrating it, would lead to me being harsh somewhat less often.
I do basically know when I’m being confrontational in public, and am Doing It On Purpose, and also expect others to know it, and almost never say “I wasn’t being confrontational” when I was being confrontational, and somehow having to signal it more overtly than it is already (that is to say, very) signaled would encourage me to tone it down.
In person, I even sometimes say ‘I am being difficult in a way that I endorse/feels correct/feels important’ and just go on being difficult, and considering that I’m about to do that in advance doesn’t discourage me from being difficult.
Might start running the internal simulation ‘you have to put a ‘(mean)’ tag on something when you’re intending to be a little mean on the internet’ to see how it changes things.
(I am not going to defend the position ‘sometimes it is correct to be a little mean’; I’m more interested in the other phenomenon.)
Just said to someone that I would by default read anything they wrote to me in a positive light
What a mean thing to say. (If you commit to perceive positivity regardless of its presence, you thereby commit to ignore any real positivity the other person intended to convey.)
Putting ‘(mean)’ at the end of a statement is the exact mechanism of saying ‘no’ that is (jokingly!) offered in this scenario, to (again, jokingly) address the exact thing you are describing, with the exact post you have linked as the referent which appeared in my mind as I made the joke! The joke being ‘that is obviously not a sufficient mechanism to solve the problem of offering a real valence, thereby robbing my interlocutor of some agency/expressive power.’ You are making the same point I am making as though you are correcting me.
You are repeating my own joke back to me as if you originated it, and ignoring the substantive reflective point that I thought might be of value (which does not itself importantly hinge on the original Garrabrant-adjacent context).
I guess you didn’t intend lack of valence or unintended negativity as relevant distinct possibilities, so that lack of the “(mean)” tag would be synonymous with “(positive)” in the intended reading. Without this assumption, you are leaving the other person without a method for communicating positivity, the protocol only supports expressing either negativity or an undifferentiated mixture of positivity and lack of valence, preventing communication of the distinction between positivity and lack of valence.
So the workaround more directly makes the point about still needing to communicate negativity (or its lack), not positivity, and I think the latter is the more curious part of the implication. For a statement of committing to see certain things in a positive light, this implication of its literal meaning conveys the opposite of the way this kind of sentiment is usually intended.
There’s no sharp line between Doing It On Purpose and not.
Being explicit about it makes it more on purpose, and engages more System 2 careful sequential thought, that clarifies whether you want to do this or not.
I think the phenomenon you’re describing is downstream of this common error in thinking about people.
I think it’s a very impactful error, both in reasoning about ourselves and others.
Yes exactly; I’m curious about how many more opportunities for greater intentionality/reflective-endorsement might be lurking in other areas, where I just haven’t created the right test/handle (but may believe that I’ve created the right one).
I’m also mindful of the opposite failure mode, though, where attempting to surface something to yourself internally actually causes you to over-index on it, leading to paralysis, where the thing was only present in very small doses and your threshold was poorly calibrated.
Please stop appealing to compute overhang. In a world where AI progress has wildly accelerated chip manufacture, this already-tenuous argument has become ~indefensible.
I tried to make a similar argument here, and I’m not sure it landed. I think the argument has since demonstrated even more predictive validity with e.g. the various attempts to build and restart nuclear power plants, directly motivated by nearby datacenter buildouts, on top of the obvious effects on chip production.
I’ve just read this post and the comments. Thank you for writing that; some elements of the decomposition feel really good, and I don’t know that they’ve been done elsewhere.
I think discourse around this is somewhat confused, because you actually have to do some calculation on the margin, and need a concrete proposal to do that with any confidence.
The straw-Pause rhetoric is something like “Just stop until safety catches up!” The overhang argument is usually deployed (as it is in those comments) to the effect of ‘there is no stopping.’ And yeah, in this calculation, there are in fact marginal negative externalities to the implementation of some subset of actions one might call a pause. The straw-Pause advocate really doesn’t want to look at that, because it’s messy to entertain counter-evidence to your position, especially if you don’t have a concrete enough proposal on the table to assign weights in the right places.
Because it’s so successful against straw-Pausers, the anti-pause people bring in the overhang argument like an absolute knockdown, when it’s actually just a footnote to double check the numbers and make sure your pause proposal avoids slipping into some arcane failure mode that ‘arms’ overhang scenarios. That it’s received as a knockdown is reinforced by the gearsiness of actually having numbers (and most of these conversations about pauses are happening in the abstract, in the absence of, e.g., draft policy).
But… just because your interlocutor doesn’t have the numbers at hand, doesn’t mean you can’t have a real conversation about the situations in which compute overhang takes on sufficient weight to upend the viability of a given pause proposal.
You said all of this much more elegantly here:
Arguments that overhangs are so bad that they outweigh the effects of pausing or slowing down are basically arguing that a second-order effect is more salient than the first-order effect. This is sometimes true, but before you’ve screened this consideration off by examining the object-level, I think your prior should be against.
...which feels to me like the most important part. The burden is on folks introducing an argument from overhang risk to prove its relevance within a specific conversation, rather than just introducing the adversely-gearsy concept to justify safety-coded accelerationism and/or profiteering. Everyone’s prior should be against actions Waluigi-ing, by default (while remaining alert to the possibility!).
Folks using compute overhang to 4D chess their way into supporting actions that differentially benefit capabilities.
I’m often tempted to comment this in various threads, but it feels like a rabbit hole, it’s not an easy one to convince someone of (because it’s an argument they’ve accepted for years), and I’ve had relatively little success talking about this with people in person (there’s some change I should make in how I’m talking about it, I think).
More broadly, I’ve started using quick takes to catalog random thoughts, because sometimes when I’m meeting someone for the first time, they have heard of me, and are mistaken about my beliefs, but would like to argue against their straw version. Having a public record I can point to of things I’ve thought feels useful for combatting this.
While I’m not a general fan of compute overhang arguments, I do think they’re at least somewhat relevant in worlds where AI pauses come very close to the point when a system is able to automate at least the entire AI R&D process, if not the entire AI economy itself. I suspect realistic pauses imposed by governments will likely only come once a massive number of people lose their jobs, which can create incentives to turn to algorithmic progress, and even small algorithmic progress might immediately blow up a pause agreement crafted in the aftermath of many people losing their jobs.
Basically, my claim in short is that, conditional on an AI pause happening because of massive job losses from an AI that is barely unable to take over the world, even small savings in compute via better algorithms (due to algorithmic research not being banned) would incentivize more algorithmic research, which then lowers the compute requirement enough to make the AI pause untenable, and the AI takes over the world.
So for this argument to be worth bringing up in some general context where a pause is discussed, the person arguing it should probably believe:
We are far and away most likely to get a pause only as a response to unemployment.
An AI that precipitates pause-inducing levels of unemployment is inches from automating AI R&D.
The period between implementing the pause and massive algorithmic advancements is long enough that we’re able to increase compute stock...
...but short enough that we’re not able to make meaningful safety progress before algorithmic advancements make the pause ineffective (because, e.g., we regulated FLOPs and it just now takes 100x fewer FLOPs to build the dangerous thing).
I think the conjunct probability of all these things is low, and I think their likelihood is sensitive to the terms of the pause agreement itself. I agree that the design of a pause should consider a broad range of possibilities, and try to maximize its own odds of attaining its ends (Keep Everyone Alive).
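To make the ‘conjunct probability is low’ point concrete with purely illustrative numbers (my own, not estimates from anyone in this thread): even if each claim looks decently likely on its own, the conjunction shrinks fast. Under rough independence,

$$P(1 \wedge 2 \wedge 3 \wedge 4) \approx 0.6 \times 0.5 \times 0.4 \times 0.4 \approx 0.05.$$

Positive correlation between the claims can raise this, but never above the smallest single factor.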
I’m also not sure how this goes better in the no-pause world? Unless this person also has really high odds on multipolar going well and expects some Savior AI trained and aligned in the same length of time as the effective window of the theoretical pause to intervene? But that’s a rare position among people who care about safety ~at all; it’s kind of a George Hotz take or something...
(I don’t think we disagree; you did flag that this as ”...somewhat relevant in worlds where...” which is often code for “I really don’t expect this to happen, but Someone Somewhere should hold this possibility in mind.” Just want to make sure I’m actually following!)
I think 1 and 2 are actually pretty likely, but 3 and 4 are where I’m a lot less confident things will actually happen.
A big reason for this is that I suspect one of the reasons people aren’t reacting to AI progress is they assume it won’t take their job, so it will likely require massive job losses for humans to make a lot of people care about AI, and depending on how concentrated AI R&D is, there’s a real possibility that AI has fully automated AI R&D before massive job losses begin in a way that matters to regular people.
Cool! I think we’re in agreement at a high level. Thanks for taking the extra time to make sure you were understood.
In more detail, though:
I think I disagree with 1 being all that likely; there are just other things I could see happening that would make a pause or stop politically popular (e.g. warning shots, An Inconvenient Truth AI Edition, etc.), likely not worth getting into here. I also think ‘if we pause it will be for stupid reasons’ is a very sad take.
I think I disagree with 2 being likely, as well; probably yes, a lot of the bottleneck on development is ~make-work that goes away when you get a drop-in replacement for remote workers, and also yes, AI coding is already an accelerant // effectively doing gradient descent on gradient descent (RLing the RL’d researcher to RL the RL...) is intelligence-explosion fuel. But I think there’s a big gap between the capabilities you need for politically worrisome levels of unemployment, and the capabilities you need for an intelligence explosion, principally because >30 percent of human labor in developed nations could be automated with current tech if the economics align a bit (hiring 200+k/year ML engineers to replace your 30k/year call center employee is only just now starting to make sense economically). I think this has been true of current tech since ~GPT-4, and that we haven’t seen a concomitant massive acceleration in capabilities on the frontier (things are continuing to move fast, and the proliferation is scary, but it’s not an explosion).
I take “depending on how concentrated AI R&D is” to foreshadow that you’d reply to the above with something like: “This is about lab priorities; the labs with the most impressive models are the labs focusing the most on frontier model development, and they’re unlikely to set their sights on comprehensive automation of shit jobs when they can instead double-down on frontier models and put some RL in the RL to RL the RL that’s been RL’d by the...”
I think that’s right about lab priorities. However, I expect the automation wave to mostly come from middle-men, consultancies, what have you, who take all of the leftover ML researchers not eaten up by the labs and go around automating things away individually (yes, maybe the frontier moves too fast for this to be right, because the labs just end up with a drop-in remote worker ‘for free’ as long as they keep advancing down the tech tree, but I don’t quite think this is true, because human jobs are human-shaped, and buyers are going to want pretty rigorous role-specific guarantees from whoever’s selling this service, even if they’re basically unnecessary, and the one-size-fits-all solution is going to have fewer buyers than the thing marketed as ‘bespoke’).
In general, I don’t like collapsing the various checkpoints between here and superintelligence; there are all these intermediate states, and their exact features matter a lot, and we really don’t know what we’re going to get. ‘By the time we’ll have x, we’ll certainly have y’ is not a form of prediction that anyone has a particularly good track record making.
I think I disagree with 1 being all that likely; there are just other things I could see happening that would make a pause or stop politically popular (e.g. warning shots, An Inconvenient Truth AI Edition, etc.), likely not worth getting into here. I also think ‘if we pause it will be for stupid reasons’ is a very sad take.
I generally don’t think An Inconvenient Truth mattered that much for solving climate change, compared to technological solutions like renewable energy, and it made the issue a little more partisan (though environmentalism/climate change was already unusually partisan by then). I also think social movements to affect AI have so far had less impact on AI safety than technical work (in a broad sense) has had on reducing doom, and I expect this trend to continue.
I think warning shots could scare the public, but I worry that the level of warning shots necessary to clear AI is in a fairly narrow band, and I also expect AI control to have a reasonable probability of containing human-level scheming models that do work, so I wouldn’t pick this at all.
I agree it’s a sad take that “if we pause it will be for stupid reasons”, but I also think this is the very likely attractor, if AI does become a subject that is salient in politics, because people hate nuance, and nuance matters way more than the average person wants to deal with on AI (For example, I think the second species argument critically misses important differences that make the human-AI relationship more friendly than the human-gorilla relationship, and that’s without the subject being politicized).
To address this:
But I think there’s a big gap between the capabilities you need for politically worrisome levels of unemployment, and the capabilities you need for an intelligence explosion, principally because >30 percent of human labor in developed nations could be automated with current tech if the economics align a bit (hiring 200+k/year ML engineers to replace your 30k/year call center employee is only just now starting to make sense economically). I think this has been true of current tech since ~GPT-4, and that we haven’t seen a concomitant massive acceleration in capabilities on the frontier (things are continuing to move fast, and the proliferation is scary, but it’s not an explosion).
I think the key crux is I believe that the unreliability of GPT-4 would doom any attempt to automate 30% of jobs, and I think at most 0-1% of jobs could be automated, and while in principle you could improve reliability without improving capabilities too much, I also don’t think the incentives yet favor this option.
In general, I don’t like collapsing the various checkpoints between here and superintelligence; there are all these intermediate states, and their exact features matter a lot, and we really don’t know what we’re going to get. ‘By the time we’ll have x, we’ll certainly have y’ is not a form of prediction that anyone has a particularly good track record making.
I agree with this sort of argument, and in general I am not a fan of collapsing checkpoints between today’s AI and God AIs, which is a big mistake I think MIRI made, but my main claim is that the checkpoints will be illegible enough to the average citizen that they don’t notice the progress until it’s too late, and that the reliability improvements will in practice also be coupled with capabilities improvements that matter to the AI explosion but aren’t very visible to the average citizen, for the reason Garrison Lovely describes here:
There’s a vibe that AI progress has stalled out in the last ~year, but I think it’s more accurate to say that progress has become increasingly illegible. Since 6/23, perf. on PhD level science questions went from barely better than random guessing to matching domain experts.
I think I get what you’re saying… That the argument you dislike is, “we should rush to AGI sooner, so that there’s less compute overhang when we get there.”
I agree that that argument is a pretty bad one. I personally think that we are already so far into a compute overhang regime that that ship has sailed. We are using very inefficient learning algorithms, and will be able to run millions of inference instances of any model we produce.
I want to say yes, but I think this might be somewhat more narrow than I mean. It might be helpful if you could list a few other ways one might read my message, that seem similarly-plausible to this one.
Overhangs, overhangs everywhere. A thousand gleaming threads stretching backwards from the fog of the Future, forwards from the static Past, and ending in a single Gordian knot before us here and now.
That knot: understanding, learning, being, thinking. The key, the source, the remaining barrier between us and the infinite, the unknowable, the singularity.
When will it break? What holds it steady? Each thread we examine seems so inadequate. Could this be what is holding us back, saving us from ourselves, from our Mind Children? Not this one, nor that, yet some strange mix of many compensating factors.
Surely, if we had more compute, we’d be there already? Or better data? The right algorithms? Faster hardware? Neuromorphic chips? Clever scaffolding? Training on a regress of chains of thought, to better solutions, to better chains of thought, to even better solutions?
All of these, and none of these. The web strains at the breaking point. How long now? Days? Months?
If we had enough ways to utilize inference-time compute, couldn’t we just scale that to super-genius, and ask the genius for a more efficient solution? But it doesn’t seem like that has been done. Has it been tried? Who can say.
Will the first AGI out the gate be so expensive it is unmaintainable for more than a few hours? Will it quickly find efficiency improvements?
Or will we again be bound, hung up on novel algorithmic insights hanging just out of sight. Who knows?
Surely though, surely… surely rushing ahead into the danger cannot be the wisest course, the safest course? Can we not agree to take our time, to think through the puzzles that confront us, to enumerate possible consequences and proactively reduce risks?
I hope. I fear. I stare in awestruck wonder at our brilliance and stupidity so tightly intermingled. We place the barrel of the gun to our collective head, panting, desperate, asking ourselves if this is it. Will it be? Intelligence is dead, long live intelligence.
A group of researchers has released the Longitudinal Expert AI Panel, soliciting and collating forecasts regarding AI progress, adoption, and regulation from a large pool of both experts and non-experts.
It’s plausible to me that almost any public discussion of AI or AI safety that is not centrally about LLM consciousness should clarify that fact early and often.
Low-context audiences are really hung up on the consciousness topic, and are often reading entirely unrelated material as though it were trying to make a claim about consciousness, then generalizing to a judgement about the speaker that inoculates them against partitioning consciousness and capabilities.
Clarifying that you don’t mean to step in the consciousness discussion upfront may be a way to reduce instances of this (but could also backfire? I’m actually not confident in the solution; maybe the real thing is ‘if you’re discussing AI on the internet expect that someone with no context will show up and read it in this way, and do whatever makes sense to do to head that off’, but that seems much more costly and delicate a procedure than a simple disclaimer).
[inspired by recent twitter activity related to the Anthropic PSM paper]
Surely the most important distinction is that normal price discrimination is usually based on trying to infer a customer’s willingness to pay (based on how wealthy they are, how much they want the product, etc). Fare evasion, by contrast, is also heavily based in how willing someone is to lie / cheat / otherwise break the rules. So, tolerating fare evasion is a form of “price discrimination” that’s dramatically corrosive to societal trust and other values, much more so than any normal kind of price discrimination—effectively a disproportionate tax on honorable law-abiding people. See Kelsey Piper’s article https://www.theargumentmag.com/p/the-honesty-tax
“Dramatically corrosive to societal trust” feels like a wild overstatement. Fare evasion is and has been a norm (at least in the US) for approximately as long as public transit has existed, and it doesn’t seem like it’s meaningfully accelerated (except in periods of lax enforcement, which my guess is would reach an equilibrium somewhere) or meaningfully (let alone dramatically!) damaged social trust.
I buy parts of Kelsey’s argument, but I think there’s a bait and switch (at least between her opening example and the fare evasion example) where we start out talking about how people who can’t afford food are incentivized to lie and then end up talking about how poor people shouldn’t be allowed to use public transit. I claim the maxim should be sensitive to the context and would like to entertain the notion that services with near-zero marginal cost should tolerate some non-zero amount of fare evasion.
Non-zero, of course, in the same sense that credit card companies tolerate a non-zero amount of fraudulent transactions as part of doing business. And I agree that the right amount of resources to spend on fare enforcement will be different for each transit system, depending on all kinds of particular circumstances. But I would guess that for most transit systems, “let’s help the poor by cutting back on fare evasion” would overall be a much worse use of marginal resources than “let’s help the poor by offering free or discounted fare cards or giving them fare credits”, or perhaps “let’s help everyone by slightly expanding service frequency / coverage”.
E.g., grocery stores tolerate some amount of “shrinkage” (people stealing the food) according to whatever maximizes profits. If grocery stores were run by the government and were willing to run at a loss in order to promote greater overall social welfare, I wouldn’t support a policy of “let’s just roll back our shrinkage enforcement and let people steal more food”. I would instead support giving poorer people something like expanded EBT food stamp credits. (And, per Kelsey, signing up for such a program should be made simpler and more rational!)
Tolerating a high amount of fare evasion also means letting some very disorderly people into the system, which makes the experience worse for all the actual paying users of the system. So it is not always “near zero marginal cost” to let other people free-ride (even outside of busy, congested times).
For an example targeting rich people instead of the poor: I think we should lower income and corporate taxes (which discourage productive work) and replace the lost revenue with Pigovian taxes (e.g. a carbon tax, sin taxes, etc.) plus Georgist land taxes. And, for the purposes of this conversation, say that I support lowering taxes overall, in a way that would disproportionately benefit rich people and corporations. But I am totally against severely rolling back IRS enforcement against tax cheats (as is currently happening), because this runs into all kinds of adverse selection problems where you’re now disproportionately rewarding (probably less economically productive) cheats, while punishing (probably more productive) honest people.
I didn’t mean to imply the full ‘optimal amount of fraud is non-zero’ frame. I do mean an amount above that, and typed ‘non-zero’ hastily.
I wouldn’t support a policy of “let’s just roll back our shrinkage enforcement and let people steal more food”. I would instead support giving poorer people something like expanded EBT food stamp credits.
I support what gets the people fed. Supporting an option that isn’t really on the table (because serious proposals for doing welfare well aren’t being actively debated and implemented) doesn’t do anyone any good. I am not talking about a perfect world; I am looking at the world we are in and locating a tiny intervention/frame shift that might marginally improve things.
This paper shows that a firm can use the purchase price and the fine imposed on detected payment evaders to discriminate between unobservable consumer types.
In effect, payment evasion allows the firm to discriminate the prices of physically homogenous products: Regular consumers pay the regular price, whereas payment evaders face the expected fine. That is, payment evasion leads to a peculiar form of price discrimination where the regular price exceeds the expected fine (otherwise there would be no payment evasion).
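A minimal sketch of the condition that abstract is pointing at, in my own (assumed) notation rather than the paper’s: let $p$ be the regular price, $\pi$ the probability an evader is detected, and $F$ the fine, so an evader’s expected payment is $\pi F$. Evasion only occurs, and only works as a second, lower price tier, when

$$\pi F < p,$$

i.e. the regular price exceeds the expected fine. Regular consumers then effectively pay $p$ while evader types effectively pay $\pi F$, which is the sense in which the firm discriminates between unobservable consumer types.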
Reading so many reviews/responses to IABIED, I wish more people had registered how they expected to feel about the book, or how they think a book on x-risk ought to look, prior to the book’s release.
Finalizing any Real Actual Object requires making tradeoffs. I think it’s pretty easy to critique the book on a level of abstraction that respects what it is Trying To Be in only the broadest possible terms, rather than acknowledging various sub-goals (e.g. providing an updated version of Nate + Eliezer’s now very old ‘canonical’ arguments), modulations of the broader goal (e.g. avoiding making strong claims about timelines, knowing this might hamstring the urgency of the message), and constraints (e.g. going through an accelerated version of the traditional publishing timeline, which means the text predates Anthropic’s Agentic Misalignment and, I’m sure, various other important recent findings).
A lot of the takes I see seem to come from a place of defending the ideal version of such a text by the lights of the reviewer, but it’s actually unclear to me whether many of these reviewers would have made the opposite critiques if the book had made the opposite call on the various tradeoffs. I don’t mean to say I think these reviewers are acting in bad faith; I just think it’s easy to avoid confronting how your ideal version couldn’t possibly be realized, and make post-hoc adjustments to that ideal thing in service of some (genuine, worthwhile) critique of the Real Thing.
Previously, it annoyed me that people had pre-judged the book’s contents. Now, I’m grateful to folks who wrote about it, or talked to me about it, before they read it (Buck Shlegeris, Nina Panickssery, a few others), because I can judge the consistency of their rubric myself, rather than just feeling:
Yes, this came up during drafting, but there was a reasonable tradeoff. We won’t know if that was a good call until later. If I had more energy I’d go 20 comments deep with you, and you’d probably agree it was a reasonable call by the end, but still think it was incorrect, and we’d agree to let time tell.
Which is the feeling that’s overtaken me as I read the various reviews from folks throughout the community.
I should say: I’m grateful for all the conversation, including the dissent, because it’s all data, but it is worse data than it would have been if you’d taken it upon yourself to cause a fuss in one of the many LW posts made in the lead-up to the book (and in the future I will be less rude to people who do this, because actually, it’s a kindness!).
Haven’t found the new ‘following’ page especially useful since the change (whereas previously it was my default). Not sure why that is exactly, but one thing I think would help is if it consolidated threads with multiple posts from followed users, instead of duplicating the same thread many times within the feed. If two or more people I follow have a back and forth, it eats all the other updates (and I usually read the whole back and forth by clicking a link from the first child I see, so then my feed is majority stuff I’ve already seen).
The CCRU is under-discussed in this sphere as a direct influence on the thoughts and actions of key players in AI and beyond.
Land started a creative collective, alongside Mark Fisher, in the 90s. I learned this by accident, and it seems like a corner of intellectual history that’s at least as influential as, e.g., the extropians.
If anyone knows of explicit connections between the CCRU and contemporary phenomena (beyond Land/Fisher’s immediate influence via their later work), I’d love to hear about them.
I agree that there is more of an influence than seems to be talked about. Besides the obvious influence on e/acc and of Land’s AGI opinions (the diagonality thesis, Pythia), the CCRU came up with hyperstitions (https://www.lesswrong.com/tag/hyperstitions), which seems to be a popular concept with the cyborgs.
Maybe they also contributed to the popularity of “Cthulhu” and “Shoggoth” as symbols in some AI circles.
Sometimes people give a short description of their work. Sometimes they give a long one.
I have an imaginary friend whose work I’m excited about. I recently overheard them introduce and motivate their work to a crowd of young safety researchers, and I took notes. Here’s my best reconstruction of what they’re up to:
“I work on median-case out-with-a-whimper scenarios and automation forecasting, with special attention to the possibility of mass-disempowerment due to wealth disparity and/or centralization of labor power. I identify existing legal and technological bottlenecks to this hypothetical automation wave, including a list of relevant laws in industries likely to be affected and a suite of evals designed to detect exactly which kinds of tasks are likely to be automated and when.
“My guess is that there are economically valuable AI systems between us and AGI/TAI/ASI, and that executing on safety and alignment plans in the midst of a rapid automation wave is dizzyingly challenging. Thinking through those waves in advance feels like a natural extension of placing any weight at all on the structure of the organization that happens to develop the first Real Scary AI. If we think that the organizational structure and local incentives of a scaling lab matter, shouldn’t we also think that the societal conditions and broader class of incentives matter? Might they matter more? The state of the world just before The Thing comes on line, or as The Team that makes The Thing is considering making The Thing, has consequences for the nature of the socio-technical solutions that work in context.
“At minimum, my work aims to buy us some time and orienting-power as the stakes rise. I’d imagine my maximal impact is something like “develop automation timelines and rollout plans that you can peg AI development to, such that the state of the world and the state-of-the-art AI technology advance in step, minimizing the collateral damage and chaos of any great economic shift.”
“When I’ve brought these concerns up to folks at labs, they’ve said that these matters get discussed internally, and that there’s at least some agreement that my direction is important, but that they can’t possibly be expected to do everything to make the world ready for their tech. I, perhaps somewhat cynically, think they’re doing narrow work here on the most economically valuable parts, but that they’re uninterested in broader coordination with public and private entities, since it would be economically disadvantageous to them.
“When I’ve brought these concerns up to folks in policy, they’ve said that some work like this is happening, but that it’s typically done in secret, to avoid amorphous negative externalities. Indeed, the more important this work is, the less likely someone is to publish it. There’s some concern that a robust and publicly available framework of this type could become a roadmap for scaling labs that helps them focus their efforts for near-term investor returns, possibly creating more fluid investment feedback loops and lowering the odds that disappointed investors back out, indirectly accelerating progress.
“Publicly available work on the topic is ~abysmal, painting the best case scenario as the most economically explosive one (most work of this type is written for investors and other powerful people), rather than pricing in the heightened x-risk embedded in this kind of destabilization. There’s actually an IMF paper modeling automation from AI systems using math from the industrial revolution. Surely there’s a better way here, and I hope to find it.”
I think it’s a somewhat productive direction explored in these 3 posts, but it’s not very object-level, more about the epistemics of it all. I think you can look up how LLM states overlap with / predict / correspond to brain scans of people who engage in some tasks? I think there were a couple of papers on that.
Thinking about qualia, trying to avoid getting trapped in the hard problem of consciousness along the way.
Tempted to model qualia as a region with the capacity to populate itself with coarse heuristics for difficult-to-compute features of nodes in a search process, which happens to ship with a bunch of computational inconveniences (that are most of what we mean to refer to when we reference qualia).
This aids in generality, but trades off against locally optimal processes, as a kind of ‘tax’ on all cognition.
This is a literal shower thought and I’ve read nothing on this anywhere; most of the literature appears to be trapped in parts of the hard problem that I don’t find compelling, and which don’t seem necessary to grapple with for this angle of inquiry (I’m thinking about something closer to the ‘easy problems’).
Any reading recommendations on this or related topics?
Following up to say that the thing that maps most closely to what I was thinking about (or satisfied my curiosity) is GWT.
GWT is usually intended to approach the hard problem, but the principal critique of it is that it isn’t doing that at all (I ~agree). Unfortunately, I had dozens of frustrating conversations with people telling me ‘don’t spend any time thinking about consciousness; it’s a dead end; you’re talking about the hard problem; that triggers me; STOP’ before someone actually pointed me in the right direction here, or seemed open to the question at all.
Are qualia (their existence or not, how and why they happen) not the exact thing the hard problem is about? If you’re ignoring or dismissing the hard problem, you’re also doubting the existence of qualia.
I guess I should have said ‘without getting caught in the nearby attractors associated with most conversations about the hard problem of consciousness’. There’s obviously a lot there, and my guess is >95 percent of it would feel to me like it has little meaningful surface area with what I’m curious about.
[errant thought pointing a direction, low-confidence musing, likely retreading old ground]
There’s a disagreement that crops up in conversations about changing people’s minds. Sides are roughly:
You should explain things by walking someone through your entire thought process, as it actually unfolded. Changing minds is best done by offering an account of how your own mind was changed.
You should explain things by back-chaining the most viable (valid) argument, from your conclusions, with respect to your specific audience.
This first strategy invites framing your argument around the question “How did I come to change my mind?”, and this second invites framing your argument around the question “How might I change my audience’s mind?”. I am sometimes characterized as advocating for approach 2, and have never actually taken that to be my position.

I think there’s a third approach here, which will look to advocates of approach 1 as if it were approach 2, and look to advocates of approach 2 as if it were approach 1. That is, you should frame the strategy around the question “How might my audience come to change their mind?”, and then not even try to change it yourself.
This third strategy is about giving people handles and mechanisms that empower them to update based on evidence they will encounter in the natural course of their lives, rather than trying to do all of the work upfront. Don’t frame your own position as some competing argument in the marketplace of ideas; hand your interlocutor a tool, tell them what they might expect, and let their experience confirm your predictions.
I think this approach has a few major differences from the other two, from the perspective of its impact:
It requires much less authority. (Strength!)
It can be executed in a targeted, light-weight fashion. (Strength!)
It’s less likely to slip into deception than option 2, and less confrontational than option 1. (Strength!)
Even if it works, they won’t end up thinking exactly what you think (Weakness?)
….but they’ll be better equipped to make sense of new evidence (Strength!)
Plausibly more memetically fit than option 1 or 2 (a failure mode of 1 is that your interlocutor won’t be empowered to stand up to criticism while spreading the ideas, even to people very much like themselves; for option 2, it’s that they will ONLY be successful in spreading the idea to people who are like themselves, since they only know the argument that works on them).
I think Eliezer has talked about some version of this in the past, and this is part of why people like predictions in general, but I think pasting a prediction at the end of an argument built around strategy 1 or 2 isn’t actually Doing The Thing I mean here.
Friends report Logan’s writing strongly has this property.
Do you think of rationality as a similar sort of ‘object’ or ‘discipline’ to philosophy? If not, what kind of object do you think of it as being?
(I am no great advocate for academic philosophy; I left that shit way behind ~a decade ago after going quite a ways down the path. I just want to better understand whether folks consider Rationality as a replacement for philosophy, a replacement for some of philosophy, a subset of philosophical commitments, a series of cognitive practices, or something else entirely. I can model it, internally, as aiming to be any of these things, without other parts of my understanding changing very much, but they all have ‘gaps’, where there are things that I associate with Rationality that don’t actually naturally fall out of the core concepts as construed as any of these types of category [I suppose this is the ‘being a subculture’ x-factor]).
Sometimes people express concern that AIs may replace them in the workplace. This is (mostly) silly. Not that it won’t happen, but you’ve gotta break some eggs to make an industrial revolution. This is just ‘how economies work’ (whether or not they can / should work this way is a different question altogether).
The intrinsic fear of joblessness-resulting-from-automation is tantamount to worrying that curing infectious diseases would put gravediggers out of business.
There is a special case here, though: double digit unemployment (and youth unemployment, in particular) is a major destabilizing force in politics. You definitely don’t want an automation wave so rapid that the jobless and nihilistic youth mount a civil war, sharply curtailing your ability to govern the dangerous technologies which took everyone’s jobs in the first place.
As AI systems become more expensive, and more powerful, and pressure to deploy them profitably increases, I’m fairly concerned that we’ll see a massive hollowing out of many white collar professions, resulting in substantial civil unrest, violence, chaos. I’m not confident that we’ll get (i.e.) a UBI (or that it would meaningfully change the situation even if we did), and I’m not confident that there’s enough inertia in existing economic structures to soften the blow.
The IMF estimates that current tech (~GPT-4 at launch) can automate ~30% of human labor performed in the US. That’s a big, scary number. About half of these tasks, they imagine, are the kinds of things you always want more of anyway, so this complementarity will just drive production in that 15% of cases. The other 15%, though, probably just stop existing as jobs altogether (for various reasons, I think a 9:1 replacement rate is more likely than full automation with current tech).
This mostly isn’t happening yet because you need an ML engineer to commit Serious Time to automating away That Job In Particular. ML engineers are expensive, and usually not specced for the kind of client-facing work that this would require (e.g. breaking down tasks that are part of a job, knowing what parts can be automated, and via what mechanisms, be that purpose-built models, fine-tuning, a prompt library for a human operator, some specialized scaffolding...). There’s just a lot of friction and lay-speak necessary to accomplish this, and it’s probably not economically worth it for some subset of necessary parties (ML engineers can make more elsewhere than small business owners can afford to pay them to automate things away, for instance).
So we’ve got a bottleneck, and on the other side of it, this speculative 15% leap in unemployment. That 15% potential leap, though, is climbing as capabilities increase (this is tautologically true; “drop-in replacement for a remote worker” is one major criterion used in discussions about AI progress).
I don’t expect 15% unemployment to destabilize the government (Great Depression peak was 25%, which is a decent lower bound on ‘potentially dangerous’ levels of unemployment in the US). But I do expect that 15% powder keg to grow in size, and potentially cross into dangerous territory before it’s lit.
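To make the powder-keg arithmetic explicit, here’s a minimal back-of-envelope sketch in Python. The labor force size, baseline unemployment rate, and the 15%-displaced share are rough assumptions for illustration, not figures lifted from the IMF report:

```python
# Back-of-envelope sketch of the "powder keg" framing above.
# All inputs are illustrative assumptions, not figures from the IMF report.
# IMF-style framing: ~30% of labor exposed, half complemented, half displaced.

labor_force = 168e6           # assumed US labor force, persons
baseline_unemployment = 0.04  # assumed pre-automation unemployment rate

share_displaced = 0.15        # half of the exposed share: jobs stop existing as such
replacement_rate = 0.9        # 9:1 replacement rather than full automation

displaced_workers = labor_force * share_displaced * replacement_rate
implied_unemployment = baseline_unemployment + displaced_workers / labor_force

print(f"Displaced workers: {displaced_workers / 1e6:.1f}M")
print(f"Implied unemployment rate: {implied_unemployment:.1%}")
# ~17.5% under these assumptions: below the ~25% Great Depression peak,
# but the displaced share grows as capabilities improve.
```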
Previously, I’d actually arrived at that 30% number myself (almost exactly one year ago), but I had initially expected:
Labs would devote substantial resources to this automation, and it would happen more quickly than it has so far.
All of these jobs were just on the chopping block (frankly, I’m not sure how much I buy the complementarity argument, but I am An Internet Crank, and they are the International Monetary Fund, so I’ll defer to them).
These two beliefs made the situation look much more dire than I now believe it to be, but it’s still, I claim, worth entertaining as A Way This Whole Thing Could Go, especially if we’re hitting a capabilities plateau, and especially if we’re doubling down on government intervention as our key lever in obviating x-risk.
[I’m not advocating for a centrally planned automation schema, to be clear; I think these things have basically never worked, but would like to hear counterexamples. Maybe just like… a tax on automation to help staunch the flow of resources into the labs and their surrogates, a restructuring of unemployment benefits and retraining programs and, before any of that, a more robust effort to model the economic consequences of current and future systems than the IMF report that just duplicates the findings of some idiot (me) frantically reviewing BLS statistics in the winter of 2023.]
I (and maybe you) have historically underrated the density of people with religious backgrounds in secular hubs. Most of these people don’t ‘think differently’, in a structural sense, from their forebears; they just don’t believe in that God anymore.
The hallmark here is a kind of naive enlightenment approach that ignores ~200 years of intellectual history (and a great many thinkers from before that period, including canonical philosophers they might claim to love/respect/understand). This type of thing.
They’re no less tribal or dogmatic, or more critical, than the place they came from. They just vote the other way and can maybe talk about one or two levels of abstraction beyond the stereotype they identify against (although they can’t really think about those levels).
You should still be nice to them, and honest with them, but you should understand what you’re getting into.
The mere biographical detail of having a religious background or being religious isn’t a strong mark against someone’s thinking on other topics, but it is a sign you may be talking to a member of a certain meta-intellectual culture, and need to modulate your style. I have definitely had valuable conversations with people that firmly belong in this category, and would not categorically discourage engagement. Just don’t be so surprised when the usual jutsu falls flat!
I don’t think I really understood what it meant for establishment politics to be divisive until this past election.
As good as it feels to sit on the left and say “they want you to hate immigrants” or “they want you to hate queer people”, it seems similarly (although probably not equally?) true that the center left also has people they want you to hate (the religious, the rich, the slightly-more-successful-than-you, the ideologically-impure-who-once-said-a-bad-thing-on-the-internet).
But there’s also a deeper, structural sense in which it’s true.
Working on AIS, I’ve long hoped that we could form a coalition with all of the other people worried about AI, because a good deal of them just... share (some version of) our concerns, and our most ambitious policy solutions (e.g. stopping development, mandating more robust interpretability and evals) could also solve a bunch of problems highlighted by the FATE community, the automation-concerned, etc etc.
Their positions also have the benefit of conforming to widely-held anxieties (‘I am worried AI will just be another tool of empire’, ‘I am worried I will lose my job for banal normie reasons that have nothing to do with civilizational robustness’, ‘I am worried AI’s will cheaply replace human labor and do a worse job, enshittifying everything in the developed world’). We could generally curry popular support and favor, without being dishonest, by looking at the Venn diagram of things we want and things they want (which would also help keep AI policy from sliding into partisanship, if such a thing is still possible, given the largely right-leaning associations of the AIS community*).
For the next four years, at the very least, I am forced to lay this hope aside. That the EO contained language in service of the FATE community was, in hindsight, very bad, and probably foreseeably so, given that even moderate Republicans like to score easy points on culture war bullshit. Probably it will be revoked, because language about bias made it an easy thing for Vance to call “far left”.
“This is ok because it will just be replaced.”
Given the current state of the game board, I don’t want to be losing any turns. We’ve already lost too many turns; setbacks are unacceptable.
“What if it gets replaced by something better?”
I envy your optimism. I’m also concerned about the same dynamic playing out in reverse; what if the new EO (or piece of legislation via whatever mechanism), like the old EO, contains some language that is (to us) beside the point, but nonetheless signals partisanship, and is retributively revoked or repealed by the next administration? This is why you don’t want AIS to be partisan; partisanship is dialectics without teleology.
Ok, so structurally divisive: establishment politics has made it ~impossible to form meaningful coalitions around issues other than absolute lightning rods (e.g. abortion, immigration; the ‘levers’ available to partisan hacks looking to gin up donations). It’s not just that they make you hate your neighbors, it’s that they make you behave as though you hate your neighbors, lest your policy proposals get painted with the broad red brush and summarily dismissed.
I think this is the kind of observation that leads many experienced people interested in AIS to work on things outside of AIS, but with an eye toward implications for AI (e.g. Critch, A Ray). You just have these lucid flashes of how stacked the deck really is, and set about digging the channel that is, compared to the existing channels, marginally more robust to reactionary dynamics (‘aligning the current of history with your aims’ is maybe a good image).
Hopefully undemocratic regulatory processes serve their function as a backdoor for the sensible, but it’s unclear how penetrating the partisanship will be over the next four years (and, of course, those at the top are promising that it will be Very Penetrating).
*I am somewhat ambivalent about how right-leaning AIS really is. Right-leaning compared to middle class Americans living in major metros? Probably. Tolerant of people with pretty far-right views? Sure, to a point. Right of the American center as defined in electoral politics (e.g. ‘Republican-voting’)? Usually not.
Does anyone have examples of concrete actions taken by Open Phil that point toward their AIS plan being anything other than ‘help Anthropic win the race’?
Many MATS scholars go to Anthropic (source: I work there).
Redwood I’m really not sure about, but that could be right.
Sam now works at Anthropic.
Palisade: I’ve done some work for them, I love them, I don’t know that their projects so far inhibit Anthropic (BadLlama, which I’m decently confident was part of the cause for funding them, was pretty squarely targeted at Meta, and is their most impactful work to date by several OOM). In fact, the softer versions of Palisade’s proposal (highlighting misuse risk, their core mission) likely empower Anthropic, as seemingly the most transparent lab re misuse risks.
I take the thrust of your comment to be “OP funds safety, do your research”. I work in safety; I know they fund safety.
I also know most safety projects differentially benefit Anthropic (this fact is independent of whether you think differentially benefiting Anthropic is good or bad).
If you can make a stronger case for any of the other dozens of orgs on your list than exists for the few above, I’d love to hear it. I’ve thought about most of them and don’t see it, hence the question.
Further: the goalpost is not ‘net positive with respect to TAI x-risk.’ It is ‘not plausibly a component of a meta-strategy targeting the development of TAI at Anthropic before other labs.’
Edit: use of the soldier mindset flag above is pretty uncharitable here; I am asking for counter-examples to a hypothesis I’m entertaining. This is the actual opposite of soldier mindset.
Sam Kriss (author of the ‘Laurentius Clung’ piece) has posted a critique. I don’t think it’s good, but I do think it’s representative of a view that I do encounter in the wild but haven’t really seen written up.
What view? This seemed largely substanceless, often arbitrary/wrong, and a waste of my time. It had no overarching view except to state his own negative interpretation of various bits of the rationalist scene as fact. (Downvoted.)
Example of substanceless:
Rationalism is the notion that the universe is a collection of true facts… but my own perspective is different. I think the universe is not a collection of true facts; I think a good forty to fifty percent of it consists of lies, myths, ambiguities, ghosts, and chasms of meaning that are not ours to plumb.
Example of wrong:
The main character of HPMOR is called Harry Potter, but—and the author has been very open about this—in fact he’s a stand-in for Eliezer Yudkowsky.
Fine by me that this doesn’t meet your bar for what counts as a view. It is, however, very literally a ‘way he sees you and your friends’, and being seen a way has consequences, no matter how incoherent or inaccurate the way you’re being seen is.
FWIW I don’t agree that being seen a way reliably has any consequences that matter. I should probably write a post sometime on my view about the importance of not respecting or paying much attention to people who are paying attention to you; this point of disagreement has come up a lot.
Bernie Sanders has released a video on x-risk featuring a discussion with Eliezer, Nate, Daniel Kokotajlo, and Jeffrey Ladish. An excerpt from it appears to be blowing up on twitter.
EDIT: re ‘blowing up’: initial view velocity was really high (~200k/hour), but steeply dropped off and it doesn’t seem to have really broken containment
some additional recent AI x-risk things by Bernie Sanders:
a presentation on AI on the Senate floor
a press conference introducing the AI Data Center Moratorium Act (with Alexandria Ocasio-Cortez)
the text of the act
...it has 1.2M views. Surely it’s “blown up” at this point.
Depends on the reference class. As of the past year or so, 1m eyes on a piece of AI safety content isn’t crazy, especially for a video on Twitter, where my impression is the criteria for what counts as a ‘view’ are pretty liberal. Like, plausibly the video has been viewed in full (much) less than half that many times.
Separately, videos posted by that account seem to routinely get ~1m views—not outperforming other content from an external collaborator is a little disappointing from a raw metrics perspective! Naively you’d hope to get the combined weight of your respective audiences, which seems to have only somewhat happened here.
When I posted this I think I expected we’d get to 2m views in the first couple days (weakly outperforming other Sanders Twitter content). I think with a different video, that could have happened.
Still an exciting crossover episode.
Huh? Eliezer’s main account tends to get a few tens of thousands of views. So under the combination-of-respective-audiences theory, it’s… mostly the audience of Bernie Sanders! Surely that breaks containment.
Also, looking at Bernie’s account suggests that it’s in the higher view-count class of his posts: some get 2M but others get 100K, and if you go by an eyeballed average it seems to be ~500k.
I was comparing to other video posts by Sanders.
I was comparing to a broader activation of Eliezer’s audience vs any given tweet.
Outperforming the ‘average’ is the wrong standard for ‘blowing up’. ‘Blowing up’ would be ‘outperforming all the recent similar artifacts’, at least as I intended it in my original post.
Meta: it feels pretty strange to have used an underspecified colloquial term, to have walked back the applicability of that term as I intended to use it, and then to be told I’m wrong for walking it back. The point I cared about capturing in that edit is ‘this tweet didn’t do as well as I expected when it first dropped.’ That’s a claim about my own expectations.
The point I care about is whether it’s gotten to a substantial audience of ‘normal’ people. I had read you as making claims that that hadn’t really happened; yet it looks to me that, to the extent that twitter’s viewcount is reasonable (though I believe you that it probably isn’t), a million people who are mostly Bernie Sanders fans got exposed to the video. If you only meant that this is less than what you expected, then that’s my bad and we don’t have any disagreement.
New reacts available only to paid users of LessWrong Premium (not you freeloaders) facilitate frictionless, borderline-telepathic communication.
‘I will NEVER change my mind’: Use this react to assert that you’re content with exactly how wrong you are (which is not at all), and that the case is permanently closed on this matter, so far as you’re concerned.1
‘EY Stamp of Approval’: Use this react to assert that, on your personal authority, Eliezer Yudkowsky agrees with the contents of the comment, rendering it beyond reproach.
‘NOT EY Approved’: Use this react to assert that, on your personal authority, Eliezer Yudkowsky disagrees with the contents of the comment, rendering it immensely reproachful. Users who accrue too many ‘NOT EY Approved’ reacts will have their accounts suspended (although actual thresholds here have yet to be set).2
‘May as well be AI’: Use this react when you’re indifferent to whether or not a statement was generated by AI because shit, it may as well be. You’re ignoring it either way.
‘Have you even read plane crash?’: Use this react when your interlocutor’s unfamiliarity with prior literature is clearly on display.
‘China Hawk’: Use this react to assert American supremacy and insist that its recipient is derelict in their duty to ensure the preeminence of the greatest country in the history of conscious life from here to the other side of the singularity.
’Toilet’3
‘Sure, buddy’: We all know that the optimal amount of mental masturbation is non-zero. But some reach far beyond the zone of optimality and into the depths of their pants to produce truly monstrous works of self-gratification. Previously, one was compelled to express such an opinion obliquely on LessWrong, or else shatter the decorum of the space and open oneself up to similar critiques. In Beta, we now have the power to quietly acknowledge the reality of the situation, without derailing the gratification itself.
———————————————————————————————
1This has replaced the ‘I beseech thee’ emoji, which never worked anyway.
2 Note that both EY-invoking reacts are invisible to anyone logged into the @-EliezerYudkowsky LessWrong account, or from any IP address that has ever been logged into that account. No point in having the man himself weigh in when so many LessWrong users are so well practiced at speaking on his behalf!
3Beta users haven’t settled on how this react ought to be deployed. I’ve seen utilizations ranging from ‘This post belongs in the toilet’ to ‘I enjoyed reading this on the toilet.’
Disappointed that they left out these—I really don’t like choosing between being technically correct and politically correct. Maybe they can go in a LessWrong Max tier.
I will not do this and nobody else should either, but I think in a world where the negative effects were magically not present, it would be quite funny to react to all EY comments with “not EY approved”
Edit: also, no way to pay LW has been forthcoming, making the thing I’m replying to effectively just an April Fools’ lie.
Perhaps both. The composer and musician Max Reger once responded to a disagreeable review thus:
Ich sitze im kleinsten Raum des Hauses. Ich habe Ihre Kritik vor mir. Bald werde ich sie hinter mir haben. (“I am sitting in the smallest room of my house. I have your review in front of me. Soon it will be behind me.”)
Pete Hegseth just declared Anthropic a supply chain risk.
Under the most expansive naive interpretation of what this might mean, it seems plausible that this bottlenecks their compute access (e.g., through Amazon, who has government contracts).
Does anyone know more here / has thought more about this? I’d be pretty surprised if the most expansive naive interpretation held, but it’s also probably not the most narrow one (e.g., can’t work with the defense industry, or on government-specific contracts in larger companies).
That so many companies (the majority of Fortune 500s, for instance) contract with the government seems to make this dramatically impactful (e.g. some-double-digit-percentage of B2B revenue; but is it 10 percent or 90? How much does it impact other aspects of their operation? I just don’t know).
Edit: An important detail here, not obvious from the initial post, is that ‘Hegseth makes a tweet’ is kind of a ‘step 0’ to the actual process unfolding. We actually don’t know if he’s really placed the order, if the order will be followed through on, and with what level of diligence/brutality (it’s this last question that my original post was asking about, but in some important sense ‘we’re just not there yet’).
Not to be overly political (although how do you avoid it with a topic like this?), but I think it’s just another part of “MAGA Maoism”, as it’s been called. The idea of the Republican Party as a small-government, free-market, pro-liberty group died with Trump, who painted over that whole philosophy with his focus on culture-war grievances, state control, and “owning the libs”.
Matt Yglesias had a good piece two years ago about how the tariff regime essentially makes a mini central planning department and we’ve seen Trump expressly use it as such, trying to bend the economy to attack companies and nations over personal grievances.
And Scott Lincicome of the Cato Institute has covered this new brand of “state corporatism”, as he calls it, quite well IMO: the nationalization of various companies and the market distortions it creates.
“First they came” is a famous poem about the oppression of people, but it’s also a good general warning about all sorts of behavior. What someone will happily do to another is what they can happily do to you if you ever cross them. Whether that’s the chick who cheated on her husband with you then cheating in your new relationship, stuffing people into camps, or using the full force of government to smash your company to bits if you aren’t compliant.
There’s likely to be more of this behavior coming in the future. They’re trying to unlock themselves a new power, and what hammer doesn’t seek out a nail?
The versions now in use go something like
and
Neither of these are really in keeping with the spirit of the original poem you reference, but insofar as there is such a thing as the Moral Law, I’d say the second is much more in keeping with it. As the Bible says, Vengeance is Divine.
I work for $BIGTECH$ and they’ve spent quite a lot of effort and money (multiples of tens of millions, by my own back-of-the-napkin estimate) on getting everyone, and I mean everyone, onto Claude Code. I rack up a $100 bill daily, and I know guys who’ll do 10x that. We’re in the narrow sense a government/defense contractor, and I would imagine the guys at the top are going to be extremely angry when this whole initiative they’ve spent so much on gets stonewalled by a decision that could be generously construed as “arbitrary” or “short-sighted”.
If the expansive interpretation holds, I would be surprised if Anthropic is alive in any meaningful capacity by Christmas. But I wouldn’t rule out a purposefully narrowed interpretation: there’s quite a lot of institutional money backing Anthropic, 4.6-opus is the current state of the art based solely on METR time horizons (admittedly no data yet for Gemini 3.1-pro), and stripping large tech companies of access to cutting-edge tools would quickly make enemies of a large number of very influential and wealthy industrialists. Maybe not the wisest move politically.
As I understand it, the “supply chain risk” only restricts the target in one direction. Amazon can still sell to Anthropic and keep its government contracts.
I expect this is true in the simple case of ‘you can sell a service to them’; the reason I used the Amazon example is that Amazon owns shares in Anthropic ($60B worth, according to a quick Claude check), which it purchased (at least in part) with compute credits; this is a much more intimate entanglement than Anthropic being yet-another-AWS-customer.
Again, I genuinely don’t know how this pans out, but the crux for me is not ‘can you sell a product to a company that’s a supply chain risk and keep your contracts’; it’s ‘can you [do all the things Amazon is doing with Anthropic, which are largely mutually conditioned on one another as part of complex agreements] with a supply chain risk and keep your contracts.’
MIRI is potentially interested in supporting reading groups for If Anyone Builds It, Everyone Dies by offering study questions, facilitation, and / or copies of the book, at our discretion. If you lead a pre-existing reading group of some kind (or meetup group that occasionally reads things together), please fill out this form.
The deadline to submit is September 22, but sooner is much better.
In two days: MIRI’s hosting a virtual event for those who preorder Nate and Eliezer’s forthcoming book If Anyone Builds It, Everyone Dies. Sunday, August 10 at noon PT. It’s a chat between Nate Soares and Tim Urban (Wait But Why) followed by a Q&A.
This will be followed by another event in September, a full Q+A w/ Nate + Eliezer.
ifanyonebuildsit.com/events
Will there be any recording?
We will likely share a recording with those who preorder and submit the form; it depends on how the talk itself goes, and a few other variables.
I wasn’t able to make it to the Q&A. Did it manage to get recorded in a shareable way?
Meanings of political identities shift dramatically based on context, and you can’t manually confirm the beliefs of everyone present at your ‘gathering of people with x political identity’. To the extent that your political identity is based on Real Beliefs with Real Consequences, you should expect not to have much in common with many other people who declare the same identity when you move to a new place (or corner of the internet).
Example: In rural Southeast Texas, Confederate flags are a common sight, and my geometry teacher once told us about a cross burning he witnessed (which a few students murmured we really ought to bring back).
The majority of people genuinely hold at least one belief that, to many of my coastal-elite-descended friends, would seem comical. E.g., women should never have jobs and should rarely speak (especially in public), men with long hair are wanton or gay or trans or both, beating children (not like ‘spanking’ but like ‘anything short of broken bones’) is not only fine but your duty as a father, weed overdose not only can but will definitely kill you, megadoses of zinc can cure cancer, the covid vaccine is the mark of the beast from the Book of Revelation, high school football ought to be the most important thing in your life and, if it isn’t, you are not just odd but untrustworthy, and abortion doctors force-feed fetuses to geese to make foie gras for gay New Yorkers (of which the force feeding is the only ethical component).
Okay, I made up the last one, but the rest are actual positions I’ve heard espoused hundreds or thousands of times by people I met between the ages of 14 and 18.
Also many people talk like this, and everyone’s a ‘libertarian’.
My mom’s from a conservative California family with environmentalist sympathies, and we had something like 60 percent overlap in our views prior to the Texas move. However, I soon found that everyone around who wasn’t liable to drop one of these devastating truth bombs on me thought of themselves as somewhere to the left of Bernie Sanders and read 20th century Marxist writings in their free time. Often these people would think leftist voices at the national scale were somewhat silly or focused on the wrong things (e.g. identity politics), but they nonetheless considered themselves closer to those views than to the other views present in their environs. (There was a democratic party around, but it was very different from the national democratic party for reasons I won’t address here.)
I assume most readers are in an environment more similar to my current environment (Berkeley, CA) than to Lumberton, TX, and so won’t explicate the delta.
I think there’s a lot of mind-killing that happens as a result of relying on a presumed shared vocabulary for political identities that does not exist. When I say something left-coded, my Rationalist Libertarian Interlocutor often reproaches me, and then as we talk more about it, they often conclude that I’m a ‘boring centrist like everybody else’ who uses the language of the left owing to some biographical quirk.
I submit instead that everyone’s sense of the political map is hopelessly warped due to biographical quirks, and that assuming an adversarial posture on the basis of someone’s declared political identity is often, and maybe even ~always, a mistake.
I submit that the problem you are encountering is operating too much on simulacra level 3.
I honestly don’t give a rat’s ass what your “political identity” is. I care about your beliefs about the effects of different policies, your current beliefs about the world, and most of all your reasons for believing those things. Insofar as your “reasons” are in fact you self-modeling as “leftist”, “centrist”, “libertarian”, or some other label like that, you are clearly operating on simulacra level 3, and I stop caring.
That is to say that insofar as the majority of people have one or more of the items on that list of absurd beliefs you gave because they genuinely believe such things, they are dumb, and the only reason I’d care what they have to say is some morbid curiosity, or of course to argue against them (or both).
Insofar as the majority of people have one or more of the items on that list of absurd beliefs you gave because we simply lack a shared vocabulary for political identities, I again don’t care what they have to say. If they are actually trying to communicate to me that they think of themselves as a centrist, well, that claim has no impact on reality whatsoever, so who cares?
The same goes for people with stupid claims which read leftist.
The thing is that we do have shared vocabulary for things in the real world (excepting many technical terms, but one can just use Wikipedia or Claude for those), and there are not active social wars over what those real world words should mean (that laypeople ought to care about).
If you stick to simply and precisely stating your beliefs and values (and your reasons for holding them), not only will you avoid the upsetting result of having others believe you are The Dreaded Outgroup!!!, but you will also avoid meaningless conversations about which Hogwarts house (read: political identity) you are.
You wrote this comment in an adversarial tone, but I Just Agree With You.
Indeed, this is an alternate formulation of the thesis of my post, and even uses language I used when characterizing the post itself to someone in the office ~2 hours ago.
Hm, I think the thing I disagree with, and was trying to argue against, was that it’s a good idea in the first place to even try to have a political map at all, never mind to try to unwarp it.
The post is meant to be somewhat agnostic on the question: conditional on having a map, here’s a common failure mode. It’s also meant to point in the direction of ‘reconsider the value of your map’.
Separately, I think I ~endorse your first comment, but I also think there are cases in which you should definitely have a map (eg you are attempting to achieve political ends). So I think your second comment is somewhat overstated.
Thanks so much for this comment, it crystallized something that’s been in my subconscious for like two years.
The rest of the comment made sense, but this feels sorta non-sequitur-ial. Maybe this is right, but most of the things you said seemed, on average, like they would increase how much I expected some kind of adversarial posture to make sense. (Except insofar as adversarial postures are basically not a good strategy, in which case it still doesn’t depend much on the rest of the contents of the quick take.)
I don’t understand this. Can you say more?
Just said to someone that I would by default read anything they wrote to me in a positive light, and that if they wanted to be mean to me in text, they should put ‘(mean)’ after the statement.
Then realized that, if I had to put ‘(mean)’ after everything I wrote on the internet that I wanted to read as slightly abrupt or confrontational, I would definitely be abrupt and confrontational on the internet less.
I am somewhat more confrontational than I endorse, and having to actually say to myself and the world that I was intending to be harsh, rather than simply demonstrating it, would lead to me being harsh somewhat less often.
I do basically know when I’m being confrontational in public, and am Doing It On Purpose, and also expect others to know it, and almost never say “I wasn’t being confrontational” when I was being confrontational, and somehow having to signal it more overtly than it is already (that is to say, very) signaled would encourage me to tone it down.
In person, I even sometimes say ‘I am being difficult in a way that I endorse/feels correct/feels important’ and just go on being difficult, and considering that I’m about to do that in advance doesn’t discourage me from being difficult.
Might start running the internal simulation ‘you have to put a ‘(mean)’ tag on something when you’re intending to be a little mean on the internet’ to see how it changes things.
(I am not going to defend the position ‘sometimes it is correct to be a little mean’; I’m more interested in the other phenomenon.)
What a mean thing to say. (If you commit to perceive positivity regardless of its presence, you thereby commit to ignore any real positivity the other person intended to convey.)
Putting ‘(mean)’ at the end of a statement is the exact mechanism of saying ‘no’ that is (jokingly!) offered in this scenario, to (again, jokingly) address the exact thing you are describing, with the exact post you have linked as the referent which appeared in my mind as I made the joke! The joke being ‘that is obviously not a sufficient mechanism to solve the problem of offering a real valence, thereby robbing my interlocutor of some agency/expressive power.’ You are making the same point I am making as though you are correcting me.
You are repeating my own joke back to me as if you originated it, and ignoring the substantive reflective point that I thought might be of value (which does not itself importantly hinge on the original Garrabrant-adjacent context).
I guess you didn’t intend lack of valence or unintended negativity as relevant distinct possibilities, so that lack of the “(mean)” tag would be synonymous with “(positive)” in the intended reading. Without this assumption, you are leaving the other person without a method for communicating positivity, the protocol only supports expressing either negativity or an undifferentiated mixture of positivity and lack of valence, preventing communication of the distinction between positivity and lack of valence.
So the workaround more directly makes the point about still needing to communicate negativity (or its lack), not positivity, and I think the latter is the more curious part of the implication. For a statement of committing to see certain things in a positive light, this implication of its literal meaning conveys the opposite of the way this kind of sentiment is usually intended.
There’s no sharp line between Doing It On Purpose and not.
Being explicit about it makes it more on purpose, and engages more System 2 careful sequential thought, that clarifies whether you want to do this or not.
I think the phenomenon you’re describing is downstream of this common error in thinking about people.
I think it’s a very impactful error, both in reasoning about ourselves and others.
Yes exactly; I’m curious about how many more opportunities for greater intentionality/reflective-endorsement might be lurking in other areas, where I just haven’t created the right test/handle (but may believe that I’ve created the right one).
I’m also mindful of the opposite failure mode, though, where attempting to surface something to yourself internally actually causes you to over-index on it, leading to paralysis, where the thing was only present in very small doses and your threshold was poorly calibrated.
Please stop appealing to compute overhang. In a world where AI progress has wildly accelerated chip manufacture, this already-tenuous argument has become ~indefensible.
I tried to make a similar argument here, and I’m not sure it landed. I think the argument has since demonstrated even more predictive validity with e.g. the various attempts to build and restart nuclear power plants, directly motivated by nearby datacenter buildouts, on top of the obvious effects on chip production.
I’ve just read this post and the comments. Thank you for writing that; some elements of the decomposition feel really good, and I don’t know that they’ve been done elsewhere.
I think discourse around this is somewhat confused, because you actually have to do some calculation on the margin, and need a concrete proposal to do that with any confidence.
The straw-Pause rhetoric is something like “Just stop until safety catches up!” The overhang argument is usually deployed (as it is in those comments) to the effect of ‘there is no stopping.’ And yeah, in this calculation, there are in fact marginal negative externalities to the implementation of some subset of actions one might call a pause. The straw-Pause advocate really doesn’t want to look at that, because it’s messy to entertain counter-evidence to your position, especially if you don’t have a concrete enough proposal on the table to assign weights in the right places.
Because it’s so successful against straw-Pausers, the anti-pause people bring in the overhang argument like an absolute knockdown, when it’s actually just a footnote to double check the numbers and make sure your pause proposal avoids slipping into some arcane failure mode that ‘arms’ overhang scenarios. That it’s received as a knockdown is reinforced by the gearsiness of actually having numbers (and most of these conversations about pauses are happening in the abstract, in the absence of, e.g., draft policy).
But… just because your interlocutor doesn’t have the numbers at hand, doesn’t mean you can’t have a real conversation about the situations in which compute overhang takes on sufficient weight to upend the viability of a given pause proposal.
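To gesture at what ‘doing the calculation on the margin’ could look like, here’s a deliberately crude toy model; every parameter is invented for illustration, and the point is only that the size of the overhang hinges on assumptions (like whether chip buildout continues independently of AI demand) that a concrete proposal would have to spell out:

```python
# Toy model of the overhang argument, not a forecast; every parameter is an
# illustrative assumption.

pause_years = 5
algo_growth = 1.3  # assumed annual algorithmic-efficiency gain that continues during a pause

# Two assumptions about compute-stock growth during the pause:
compute_growth_decoupled = 2.0  # chip buildout continues independently of AI demand
compute_growth_coupled = 1.2    # buildout is mostly AI-demand-driven, so a pause slows it

def jump_at_pause_end(compute_growth):
    """Capability jump available when the pause lifts, relative to when it began:
    accumulated compute stock times accumulated algorithmic efficiency."""
    return (compute_growth ** pause_years) * (algo_growth ** pause_years)

print(f"Jump, decoupled chip supply:  {jump_at_pause_end(compute_growth_decoupled):.0f}x")
print(f"Jump, demand-coupled supply:  {jump_at_pause_end(compute_growth_coupled):.0f}x")
# ~119x vs ~9x under these made-up numbers: how scary the overhang is depends
# almost entirely on parameters a concrete pause proposal would have to pin down.
```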
You said all of this much more elegantly here:
...which feels to me like the most important part. The burden is on folks introducing an argument from overhang risk to prove its relevance within a specific conversation, rather than just introducing the adversely-gearsy concept to justify safety-coded accelerationism and/or profiteering. Everyone’s prior should be against actions Waluigi-ing, by default (while remaining alert to the possibility!).
To whom are you talking?
Folks using compute overhang to 4D chess their way into supporting actions that differentially benefit capabilities.
I’m often tempted to comment this in various threads, but it feels like a rabbit hole, it’s not an easy one to convince someone of (because it’s an argument they’ve accepted for years), and I’ve had relatively little success talking about this with people in person (there’s some change I should make in how I’m talking about it, I think).
More broadly, I’ve started using quick takes to catalog random thoughts, because sometimes when I’m meeting someone for the first time, they have heard of me, and are mistaken about my beliefs, but would like to argue against their straw version. Having a public record I can point to of things I’ve thought feels useful for combatting this.
While I’m not a general fan of compute overhang, I do think that it’s at least somewhat relevant in worlds where AI pauses are very close to when a system is able to automate at least the entire AI R&D process, if not the entire AI economy itself, and I do suspect realistic pauses imposed by governments will likely only come once a massive amount of people lose their jobs, which can create incentives to go to algorithmic progress, and even small algorithmic progress might immediately blow up the pause agreement crafted in the aftermath of many people losing their jobs.
I think it would be very helpful to me if you broke that sentence up a bit more. I took a stab at it but didn’t get very far.
Sorry for my failure to parse!
Basically, my statement, in short, is that conditional on an AI pause happening because of massive job losses from an AI that is barely unable to take over the world, even small savings in compute via better algorithms (since algorithmic research isn’t banned) would incentivize more algorithmic research, which then lowers the compute required enough to make the AI pause untenable, and the AI takes over the world.
So for this argument to be worth bringing up in some general context where a pause is discussed, the person arguing it should probably believe:
We are far and away most likely to get a pause only as a response to unemployment.
An AI that precipitates pause-inducing levels of unemployment is inches from automating AI R+D.
The period between implementing the pause and massive algorithmic advancements is long enough that we’re able to increase compute stock...
....but short enough that we’re not able to make meaningful safety progress before algorithmic advancements make the pause ineffective (because, e.g., we regulated FLOPS and now it takes 100x fewer FLOPS to build the dangerous thing).
I think the conjunct probability of all these things is low, and I think their likelihood is sensitive to the terms of the pause agreement itself. I agree that the design of a pause should consider a broad range of possibilities, and try to maximize its own odds of attaining its ends (Keep Everyone Alive).
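For concreteness, a minimal sketch of that conjunction with placeholder probabilities (these are not anyone’s stated credences, and the conditions aren’t independent, so this is only suggestive):

```python
# Illustrative only: placeholder probabilities, not anyone's stated credences,
# and the four conditions aren't independent, so straight multiplication is
# itself only suggestive.

p_pause_only_via_unemployment = 0.4  # (1) pause happens only as a response to job losses
p_near_full_rnd_automation = 0.4     # (2) that same AI is inches from automating AI R&D
p_window_grows_compute_stock = 0.5   # (3) window long enough to grow the compute stock...
p_window_too_short_for_safety = 0.5  # (4) ...but too short for meaningful safety progress

conjunct = (p_pause_only_via_unemployment
            * p_near_full_rnd_automation
            * p_window_grows_compute_stock
            * p_window_too_short_for_safety)

print(f"Conjunct probability under these placeholders: {conjunct:.1%}")  # 4.0%
```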
I’m also not sure how this goes better in the no-pause world? Unless this person also has really high odds on multipolar going well and expects some Savior AI trained and aligned in the same length of time as the effective window of the theoretical pause to intervene? But that’s a rare position among people who care about safety ~at all; it’s kind of a George Hotz take or something...
(I don’t think we disagree; you did flag that this as ”...somewhat relevant in worlds where...” which is often code for “I really don’t expect this to happen, but Someone Somewhere should hold this possibility in mind.” Just want to make sure I’m actually following!)
I think 1 and 2 are actually pretty likely, but 3 and 4 are where I’m a lot less confident.
A big reason for this is that I suspect one of the reasons people aren’t reacting to AI progress is they assume it won’t take their job, so it will likely require massive job losses for humans to make a lot of people care about AI, and depending on how concentrated AI R&D is, there’s a real possibility that AI has fully automated AI R&D before massive job losses begin in a way that matters to regular people.
Cool! I think we’re in agreement at a high level. Thanks for taking the extra time to make sure you were understood.
In more detail, though:
I think I disagree with 1 being all that likely; there are just other things I could see happening that would make a pause or stop politically popular (e.g. warning shots, An Inconvenient Truth AI Edition, etc.), likely not worth getting into here. I also think ‘if we pause it will be for stupid reasons’ is a very sad take.
I think I disagree with 2 being likely, as well; probably yes, a lot of the bottleneck on development is ~make-work that goes away when you get a drop-in replacement for remote workers, and also yes, AI coding is already an accelerant // effectively doing gradient descent on gradient descent (RLing the RL’d researcher to RL the RL...) is intelligence-explosion fuel. But I think there’s a big gap between the capabilities you need for politically worrisome levels of unemployment, and the capabilities you need for an intelligence explosion, principally because >30 percent of human labor in developed nations could be automated with current tech if the economics align a bit (hiring 200+k/year ML engineers to replace your 30k/year call center employee is only just now starting to make sense economically). I think this has been true of current tech since ~GPT-4, and that we haven’t seen a concomitant massive acceleration in capabilities on the frontier (things are continuing to move fast, and the proliferation is scary, but it’s not an explosion).
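A rough breakeven sketch of that 200k-vs-30k claim; all of the inputs here are assumptions I’m making up for illustration:

```python
# Breakeven sketch for automating one low-wage role; every input is an assumption.

engineer_cost = 250_000   # assumed fully loaded annual cost of an ML engineer
role_cost = 40_000        # assumed fully loaded annual cost of the role being automated
build_months = 4          # assumed engineer-time to automate one such role
inference_cost = 5_000    # assumed annual model/serving cost per automated role

build_cost = engineer_cost * build_months / 12
annual_saving = role_cost - inference_cost
payback_years = build_cost / annual_saving

print(f"One-time build cost: ${build_cost:,.0f}")
print(f"Annual saving:       ${annual_saving:,.0f}")
print(f"Payback period:      {payback_years:.1f} years")
# ~2.4 years under these assumptions; the economics flip quickly as the
# engineer-time per role or the per-role serving cost falls.
```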
I take “depending on how concentrated AI R&D is” to foreshadow that you’d reply to the above with something like: “This is about lab priorities; the labs with the most impressive models are the labs focusing the most on frontier model development, and they’re unlikely to set their sights on comprehensive automation of shit jobs when they can instead double-down on frontier models and put some RL in the RL to RL the RL that’s been RL’d by the...”
I think that’s right about lab priorities. However, I expect the automation wave to mostly come from middle-men, consultancies, what have you, who take all of the leftover ML researchers not eaten up by the labs and go around automating things away individually (yes, maybe the frontier moves too fast for this to be right, because the labs just end up with a drop-in remote worker ‘for free’ as long as they keep advancing down the tech tree, but I don’t quite think this is true, because human jobs are human-shaped, and buyers are going to want pretty rigorous role-specific guarantees from whoever’s selling this service, even if they’re basically unnecessary, and the one-size-fits-all solution is going to have fewer buyers than the thing marketed as ‘bespoke’).
In general, I don’t like collapsing the various checkpoints between here and superintelligence; there are all these intermediate states, and their exact features matter a lot, and we really don’t know what we’re going to get. ‘By the time we’ll have x, we’ll certainly have y’ is not a form of prediction that anyone has a particularly good track record making.
I generally don’t think the An Inconvenient Truth movie mattered that much for solving climate change, compared to technological solutions like renewable energy, and it made the issue a little more partisan (though environmentalism/climate change was already unusually partisan by then). I also think social movements aimed at affecting AI have so far had less impact on AI safety than technical work (in a broad sense) has had on reducing doom, and I expect this trend to continue.
I think warning shots could scare the public, but I worry that the level of warning shots necessary to clear AI is in a fairly narrow band, and I also expect AI control to have a reasonable probability of containing human-level scheming models that do work, so I wouldn’t pick this at all.
I agree it’s a sad take that “if we pause it will be for stupid reasons”, but I also think this is the very likely attractor, if AI does become a subject that is salient in politics, because people hate nuance, and nuance matters way more than the average person wants to deal with on AI (For example, I think the second species argument critically misses important differences that make the human-AI relationship more friendly than the human-gorilla relationship, and that’s without the subject being politicized).
To address this:
I think the key crux is that I believe the unreliability of GPT-4 would doom any attempt to automate 30% of jobs; I think at most 0-1% of jobs could be automated with it. And while in principle you could improve reliability without improving capabilities too much, I also don’t think the incentives yet favor that option.
I agree with this sort of argument, and in general I am not a fan of collapsing checkpoints between today’s AI and God AIs (a big mistake I think MIRI made), but my main claim is that the checkpoints will be illegible enough to the average citizen that they don’t notice the progress until it’s too late, and that the reliability improvements will in practice be coupled with capability improvements that matter to the AI explosion but aren’t very visible to the average citizen, for the reason Garrison Lovely describes here:
https://x.com/GarrisonLovely/status/1866945509975638493
I think I get what you’re saying… That the argument you dislike is, “we should rush to AGI sooner, so that there’s less compute overhang when we get there.”
I agree that that argument is a pretty bad one. I personally think that we are already so far into a compute overhang regime that that ship has sailed. We are using very inefficient learning algorithms, and will be able to run millions of inference instances of any model we produce.
Does this correspond with what you are thinking?
I want to say yes, but I think this might be somewhat more narrow than I mean. It might be helpful if you could list a few other ways one might read my message that seem similarly plausible to this one.
Overhangs, overhangs everywhere. A thousand gleaming threads stretching backwards from the fog of the Future, forwards from the static Past, and ending in a single Gordian knot before us here and now.
That knot: understanding, learning, being, thinking. The key, the source, the remaining barrier between us and the infinite, the unknowable, the singularity.
When will it break? What holds it steady? Each thread we examine seems so inadequate. Could this be what is holding us back, saving us from ourselves, from our Mind Children? Not this one, nor that, yet some strange mix of many compensating factors.
Surely, if we had more compute, we’d be there already? Or better data? The right algorithms? Faster hardware? Neuromorphic chips? Clever scaffolding? Training on a regress of chains of thought, to better solutions, to better chains of thought, to even better solutions?
All of these, and none of these. The web strains at the breaking point. How long now? Days? Months?
If we had enough ways to utilize inference-time compute, couldn’t we just scale that to super-genius, and ask the genius for a more efficient solution? But it doesn’t seem like that has been done. Has it been tried? Who can say.
Will the first AGI out the gate be so expensive it is unmaintainable for more than a few hours? Will it quickly find efficiency improvements?
Or will we again be bound, hung up on novel algorithmic insights hanging just out of sight. Who knows?
Surely though, surely… surely rushing ahead into the danger cannot be the wisest course, the safest course? Can we not agree to take our time, to think through the puzzles that confront us, to enumerate possible consequences and proactively reduce risks?
I hope. I fear. I stare in awestruck wonder at our brilliance and stupidity so tightly intermingled. We place the barrel of the gun to our collective head, panting, desperate, asking ourselves if this is it. Will it be? Intelligence is dead, long live intelligence.
This world?
Yes this world.
A group of researchers has released the Longitudinal Expert AI Panel, soliciting and collating forecasts regarding AI progress, adoption, and regulation from a large pool of both experts and non-experts.
It’s plausible to me that almost any public discussion of AI or AI safety that is not centrally about LLM consciousness should clarify that fact early and often.
Low-context audiences are really hung up on the consciousness topic, and are often reading entirely unrelated material as though it were trying to make a claim about consciousness, then generalizing to a judgement about the speaker that inoculates them against partitioning consciousness and capabilities.
Clarifying upfront that you don’t mean to step into the consciousness discussion may be a way to reduce instances of this (but it could also backfire? I’m actually not confident in the solution; maybe the real thing is ‘if you’re discussing AI on the internet, expect that someone with no context will show up and read it in this way, and do whatever makes sense to head that off’, but that seems a much more costly and delicate procedure than a simple disclaimer).
[inspired by recent twitter activity related to the Anthropic PSM paper]
Is fare evasion just price discrimination?
No. If you replace “just” with “partially model-able as”, then yes.
Surely the most important distinction is that normal price discrimination is usually based on trying to infer a customer’s willingness to pay (based on how wealthy they are, how much they want the product, etc). Fare evasion, by contrast, is also heavily based on how willing someone is to lie / cheat / otherwise break the rules. So tolerating fare evasion is a form of “price discrimination” that’s dramatically corrosive to societal trust and other values, much more so than any normal kind of price discrimination: effectively a disproportionate tax on honorable, law-abiding people. See Kelsey Piper’s article https://www.theargumentmag.com/p/the-honesty-tax
“Dramatically corrosive to societal trust” feels like a wild overstatement. Fare evasion is and has been a norm (at least in the US) for approximately as long as public transit has existed, and it doesn’t seem like it’s meaningfully accelerated (except in periods of lax enforcement, which I’d guess would reach an equilibrium somewhere) or meaningfully (let alone dramatically!) damaged social trust.
I buy parts of Kelsey’s argument, but I think there’s a bait and switch (at least between her opening example and the fare evasion example) where we start out talking about how people who can’t afford food are incentivized to lie and then end up talking about how poor people shouldn’t be allowed to use public transit. I claim the maxim should be sensitive to the context and would like to entertain the notion that services with near-zero marginal cost should tolerate some non-zero amount of fare evasion.
Non-zero, of course, in the same sense that credit card companies tolerate a non-zero amount of fraudulent transactions as part of doing business. And I agree that the right amount of resources to spend on fare enforcement will be different for each transit system, depending on all kinds of particular circumstances. But I would guess that for most transit systems, “let’s help the poor by cutting back on fare evasion” would overall be a much worse use of marginal resources than “let’s help the poor by offering free or discounted fare cards or giving them fare credits”, or perhaps “let’s help everyone by slightly expanding service frequency / coverage”.
E.g., grocery stores tolerate some amount of “shrinkage” (people stealing the food) according to whatever maximizes profits. If grocery stores were run by the government and were willing to run at a loss in order to promote greater overall social welfare, I wouldn’t support a policy of “let’s just roll back our shrinkage enforcement and let people steal more food”. I would instead support giving poorer people something like expanded EBT food stamp credits. (And, per Kelsey, signing up for such a program should be made simpler and more rational!)
Tolerating a high amount of fare evasion also means letting some very disorderly people into the system, which makes the experience worse for all the actual paying users of the system. So it is not always “near zero marginal cost” to let other people free-ride (even outside of busy, congested times).
For an example targeting rich people instead of the poor: I think we should lower income and corporate taxes (which discourage productive work) and replace the lost revenue with Pigouvian taxes (e.g. a carbon tax, sin taxes, etc.) plus Georgist land taxes. And, for the purposes of this conversation, say that I support lowering taxes overall, in a way that would disproportionately benefit rich people and corporations. But I am totally against severely rolling back IRS enforcement against tax cheats (as is currently happening), because this runs into all kinds of adverse selection problems where you’re now disproportionately rewarding (probably less economically productive) cheats, while punishing (probably more productive) honest people.
I didn’t mean to imply the full ‘optimal amount of fraud is non-zero’ frame. I do mean an amount above that, and typed ‘non-zero’ hastily.
I support what gets the people fed. Supporting an option that isn’t really on the table (because serious proposals for doing welfare well aren’t being actively debated and implemented) doesn’t do anyone any good. I am not talking about a perfect world; I am looking at the world we are in and locating a tiny intervention/frame shift that might marginally improve things.
Idk, but there seem to be papers on this.
Payment Evasion (Buehler 2017)
https://ux-tauri.unisg.ch/RePEc/usg/econwp/EWP-1435.pdf
Reading so many reviews/responses to IABIED, I wish more people had registered how they expected to feel about the book, or how they think a book on x-risk ought to look, prior to the book’s release.
Finalizing any Real Actual Object requires making tradeoffs. I think it’s pretty easy to critique the book on a level of abstraction that respects what it is Trying To Be in only the broadest possible terms, rather than acknowledging various sub-goals (e.g. providing an updated version of Nate + Eliezer’s now very old ‘canonical’ arguments), modulations of the broader goal (e.g. avoiding making strong claims about timelines, knowing this might hamstring the urgency of the message), and constraints (e.g. going through an accelerated version of the traditional publishing timeline, which means the text predates Anthropic’s Agentic Misalignment and, I’m sure, various other important recent findings).
A lot of the takes I see seem to come from a place of defending the ideal version of such a text by the lights of the reviewer, but it’s actually unclear to me whether many of these reviewers would have made the opposite critiques if the book had made the opposite call on the various tradeoffs. I don’t mean to say I think these reviewers are acting in bad faith; I just think it’s easy to avoid confronting how your ideal version couldn’t possibly be realized, and make post-hoc adjustments to that ideal thing in service of some (genuine, worthwhile) critique of the Real Thing.
Previously, it annoyed me that people had pre-judged the book’s contents. Now, I’m grateful to folks who wrote about it, or talked to me about it, before they read it (Buck Shlegeris, Nina Panickssery, a few others), because I can judge the consistency of their rubric myself, rather than just feeling:
Which is the feeling that’s overtaken me as I read the various reviews from folks throughout the community.
I should say: I’m grateful for all the conversation, including the dissent, because it’s all data, but it is worse data than it would have been if you’d taken it upon yourself to cause a fuss in one of the many LW posts made in the lead-up to the book (and in the future I will be less rude to people who do this, because actually, it’s a kindness!).
Haven’t found the new ‘following’ page especially useful since the change (whereas previously it was my default). Not sure why that is exactly, but one thing I think would help is if it consolidated threads with multiple posts from followed users, instead of duplicating the same thread many times within the feed. If two or more people I follow have a back and forth, it eats all the other updates (and I usually read the whole back and forth by clicking a link from the first child I see, so then my feed is majority stuff I’ve already seen).
The CCRU is under-discussed in this sphere as a direct influence on the thoughts and actions of key players in AI and beyond.
Land was a central figure in a creative collective, alongside Mark Fisher, in the 90s. I learned this by accident, and it seems like a corner of intellectual history that’s at least as influential as, e.g., the extropians.
If anyone knows of explicit connections between the CCRU and contemporary phenomena (beyond Land/Fisher’s immediate influence via their later work), I’d love to hear about them.
Yarvin was not part of the CCRU. I think Land and Yarvin only became associates post-CCRU.
updated, thanks!
I agree that there is more of an influence than seems to be talked about. Besides the obvious influence on e/acc and of Land’s AGI opinions (the diagonality thesis, Pythia), the CCRU came up with hyperstitions (https://www.lesswrong.com/tag/hyperstitions), which seems to be a popular concept with the cyborgs.
Maybe they also contributed to the popularity of “Cthulhu” and “Shoggoth” as symbols in some AI circles.
Sometimes people give a short description of their work. Sometimes they give a long one.
I have an imaginary friend whose work I’m excited about. I recently overheard them introduce and motivate their work to a crowd of young safety researchers, and I took notes. Here’s my best reconstruction of what they’re up to:
“I work on median-case out-with-a-whimper scenarios and automation forecasting, with special attention to the possibility of mass-disempowerment due to wealth disparity and/or centralization of labor power. I identify existing legal and technological bottlenecks to this hypothetical automation wave, including a list of relevant laws in industries likely to be affected and a suite of evals designed to detect exactly which kinds of tasks are likely to be automated and when.
“My guess is that there are economically valuable AI systems between us and AGI/TAI/ASI, and that executing on safety and alignment plans in the midst of a rapid automation wave is dizzyingly challenging. Thinking through those waves in advance feels like a natural extension of placing any weight at all on the structure of the organization that happens to develop the first Real Scary AI. If we think that the organizational structure and local incentives of a scaling lab matter, shouldn’t we also think that the societal conditions and broader class of incentives matter? Might they matter more? The state of the world just before The Thing comes online, or as The Team that makes The Thing is considering making The Thing, has consequences for the nature of the socio-technical solutions that work in context.
“At minimum, my work aims to buy us some time and orienting-power as the stakes rise. I’d imagine my maximal impact is something like “develop automation timelines and rollout plans that you can peg AI development to, such that the state of the world and the state-of-the-art AI technology advance in step, minimizing the collateral damage and chaos of any great economic shift.”
“When I’ve brought these concerns up to folks at labs, they’ve said that these matters get discussed internally, and that there’s at least some agreement that my direction is important, but that they can’t possibly be expected to do everything to make the world ready for their tech. I, perhaps somewhat cynically, think they’re doing narrow work here on the most economically valuable parts, but that they’re disinterested in broader coordination with public and private entities, since it would be economically disadvantageous to them.
“When I’ve brought these concerns up to folks in policy, they’ve said that some work like this is happening, but that it’s typically done in secret, to avoid amorphous negative externalities. Indeed, the more important this work is, the less likely someone is to publish it. There’s some concern that a robust and publicly available framework of this type could become a roadmap for scaling labs that helps them focus their efforts for near-term investor returns, possibly creating more fluid investment feedback loops and lowering the odds that disappointed investors back out, indirectly accelerating progress.
“Publicly available work on the topic is ~abysmal, painting the best case scenario as the most economically explosive one (most work of this type is written for investors and other powerful people), rather than pricing in the heightened x-risk embedded in this kind of destabilization. There’s actually an IMF paper modeling automation from AI systems using math from the industrial revolution. Surely there’s a better way here, and I hope to find it.”
What texts analogizing LLMs to human brains have you found most compelling?
Shameless self promotion: this one https://www.lesswrong.com/posts/ASmcQYbhcyu5TuXz6/llms-could-be-as-conscious-as-human-emulations-potentially
It circumvents the object-level question and instead looks at the epistemic one.
This one is about the broader question of how the things that have happened change people’s attitudes and opinions:
https://www.astralcodexten.com/p/sakana-strawberry-and-scary-ai
This one too, about consciousness in particular
https://dynomight.net/consciousness/
I think the direction explored in these three posts is somewhat productive, but it’s not very object-level; it’s more about the epistemics of it all. You can also look up how LLM internal states overlap with / predict / correspond to brain scans of people engaged in particular tasks; I think there were a couple of papers on that.
E.g. here https://www.neuroai.science/p/brain-scores-dont-mean-what-we-think
Thinking about qualia, trying to avoid getting trapped in the hard problem of consciousness along the way.
Tempted to model qualia as a region with the capacity to populate itself with coarse heuristics for difficult-to-compute features of nodes in a search process, which happens to ship with a bunch of computational inconveniences (that are most of what we mean to refer to when we reference qualia).
This aids in generality, but trades off against locally optimal processes, as a kind of ‘tax’ on all cognition.
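A toy sketch of the shape I have in mind (in Python, with everything invented purely for illustration, not a claim about how minds actually work): node evaluation in a search can either pay for an exact, difficult-to-compute feature or consult a cheap, lossy heuristic layer, and either way incurs a small fixed overhead, the ‘tax’ on all cognition.

```python
import heapq
from functools import lru_cache

# Toy illustration only: search guided by coarse cached heuristics rather than
# exact features, with a fixed per-evaluation overhead as the 'tax' on cognition.

TAX = 0.1  # fixed overhead paid on every evaluation, coarse or exact

def exact_feature(node: int) -> float:
    """Stand-in for a difficult-to-compute property of a node in the search."""
    return abs(node - 42) + 0.001 * node ** 2

@lru_cache(maxsize=None)
def coarse_heuristic(node: int) -> float:
    """The 'qualia-like' layer: a cheap, lossy summary populated on demand."""
    return float(round(exact_feature(node)))  # lossy compression of the exact value

def best_first(start: int, goal: int, use_coarse: bool = True, max_steps: int = 10_000):
    """Best-first search; coarse guidance generalizes cheaply across nodes,
    but can miss locally optimal expansions that exact evaluation would catch."""
    overhead = 0.0
    frontier = [(0.0, start)]
    seen = set()
    for _ in range(max_steps):
        if not frontier:
            break
        _, node = heapq.heappop(frontier)
        if node == goal:
            return node, overhead
        if node in seen:
            continue
        seen.add(node)
        for child in (node - 1, node + 1):
            score = coarse_heuristic(child) if use_coarse else exact_feature(child)
            overhead += TAX  # the tax: paid whether or not the evaluation was exact
            heapq.heappush(frontier, (score, child))
    return None, overhead

# e.g. best_first(0, 50) vs best_first(0, 50, use_coarse=False): same goal,
# different guidance fidelity, same per-evaluation tax.
```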
This is a literal shower thought and I’ve read nothing on this anywhere; most of the literature appears to be trapped in parts of the hard problem that I don’t find compelling, and which don’t seem necessary to grapple with for this angle of inquiry (I’m thinking about something closer to the ‘easy problems’).
Any reading recommendations on this or related topics?
Following up to say that the thing that maps most closely to what I was thinking about (or satisfied my curiosity) is GWT (Global Workspace Theory).
GWT is usually intended to approach the hard problem, but the principal critique of it is that it isn’t doing that at all (I ~agree). Unfortunately, I had dozens of frustrating conversations with people telling me ‘don’t spend any time thinking about consciousness; it’s a dead end; you’re talking about the hard problem; that triggers me; STOP’ before someone actually pointed me in the right direction here, or seemed open to the question at all.
Are qualia (their existence or not, how and why they happen) not the exact thing the hard problem is about? If you ignore or dismiss the hard problem, you also doubt the existence of qualia.
I guess I should have said ‘without getting caught in the nearby attractors associated with most conversations about the hard problem of consciousness’. There’s obviously a lot there, and my guess is >95 percent of it would feel to me like it has little meaningful surface area with what I’m curious about.
[errant thought pointing a direction, low-confidence musing, likely retreading old ground]
There’s a disagreement that crops up in conversations about changing people’s minds. Sides are roughly:
You should explain things by walking someone through your entire thought process, as it actually unfolded. Changing minds is best done by offering an account of how your own mind was changed.
You should explain things by back-chaining the most viable (valid) argument, from your conclusions, with respect to your specific audience.
This first strategy invites framing your argument around the question “How did I come to change my mind?”, and the second invites framing your argument around the question “How might I change my audience’s mind?”. I am sometimes characterized as advocating for approach 2, and have never actually taken that to be my position. I think there’s a third approach here, which will look to advocates of approach 1 as if it were approach 2, and to advocates of approach 2 as if it were approach 1. That is, you should frame the strategy around the question “How might my audience come to change their mind?”, and then not even try to change it yourself.
This third strategy is about giving people handles and mechanisms that empower them to update based on evidence they will encounter in the natural course of their lives, rather than trying to do all of the work upfront. Don’t frame your own position as some competing argument in the marketplace of ideas; hand your interlocutor a tool, tell them what they might expect, and let their experience confirm your predictions. I think this approach has a few major differences from the other two, from the perspective of its impact:
It requires much less authority. (Strength!)
It can be executed in a targeted, light-weight fashion. (Strength!)
It’s less likely to slip into deception than option 2, and less confrontational than option 1. (Strength!)
Even if it works, they won’t end up thinking exactly what you think (Weakness?)
….but they’ll be better equipped to make sense of new evidence (Strength!)
Plausibly more memetically fit than option 1 or 2 (a failure mode of 1 is that your interlocutor won’t be empowered to stand up to criticism while spreading the ideas, even to people very much like themselves, and for option 2 it’s that they will ONLY be successful in spreading the idea to people who are like themselves, since they only know the argument that works on them).
I think Eliezer has talked about some version of this in the past, and this is part of why people like predictions in general, but I think pasting a prediction at the end of an argument built around strategy 1 or 2 isn’t actually Doing The Thing I mean here.
Friends report Logan’s writing strongly has this property.
Do you think of rationality as a similar sort of ‘object’ or ‘discipline’ to philosophy? If not, what kind of object do you think of it as being?
(I am no great advocate for academic philosophy; I left that shit way behind ~a decade ago after going quite a ways down the path. I just want to better understand whether folks consider Rationality as a replacement for philosophy, a replacement for some of philosophy, a subset of philosophical commitments, a series of cognitive practices, or something else entirely. I can model it, internally, as aiming to be any of these things, without other parts of my understanding changing very much, but they all have ‘gaps’, where there are things that I associate with Rationality that don’t naturally fall out of the core concepts when construed as any of these categories [I suppose this is the ‘being a subculture’ x-factor]).
Sometimes people express concern that AIs may replace them in the workplace. This is (mostly) silly. Not that it won’t happen, but you’ve gotta break some eggs to make an industrial revolution. This is just ‘how economies work’ (whether or not they can / should work this way is a different question altogether).
The intrinsic fear of joblessness-resulting-from-automation is tantamount to worrying that curing infectious diseases would put gravediggers out of business.
There is a special case here, though: double digit unemployment (and youth unemployment, in particular) is a major destabilizing force in politics. You definitely don’t want an automation wave so rapid that the jobless and nihilistic youth mount a civil war, sharply curtailing your ability to govern the dangerous technologies which took everyone’s jobs in the first place.
As AI systems become more expensive and more powerful, and pressure to deploy them profitably increases, I’m fairly concerned that we’ll see a massive hollowing out of many white-collar professions, resulting in substantial civil unrest, violence, and chaos. I’m not confident that we’ll get (e.g.) a UBI (or that it would meaningfully change the situation even if we did), and I’m not confident that there’s enough inertia in existing economic structures to soften the blow.
The IMF estimates that current tech (~GPT-4 at launch) can automate ~30% of human labor performed in the US. That’s a big, scary number. About half of that, they imagine, covers the kinds of work you always want more of anyway, so complementarity will just drive production in that 15% of cases. The other 15%, though, probably just stops existing as jobs altogether (for various reasons, I think a 9:1 replacement rate is more likely than full automation, with current tech).
This mostly isn’t happening yet because you need an ML engineer to commit Serious Time to automating away That Job In Particular. ML engineers are expensive, and usually not specced for the kind of client-facing work that this would require (i.e. breaking down tasks that are part of a job, knowing what parts can be automated, and via what mechanisms, be that purpose-built models, fine-tuning, a prompt library for a human operator, some specialized scaffolding...). There’s just a lot of friction and lay-speak necessary to accomplish this, and it’s probably not economically worth it for some subset of necessary parties (ML engineers can make more elsewhere than small business owners can afford to pay them to automate things away, for instance).
So we’ve got a bottleneck, and on the other side of it, this speculative 15% leap in unemployment. That 15% potential leap, though, is climbing as capabilities increase (this is tautologically true; “drop-in replacement for a remote worker” is one major criterion used in discussions about AI progress).
I don’t expect 15% unemployment to destabilize the government (Great Depression peak was 25%, which is a decent lower bound on ‘potentially dangerous’ levels of unemployment in the US). But I do expect that 15% powder keg to grow in size, and potentially cross into dangerous territory before it’s lit.
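To make the arithmetic explicit, here’s a minimal back-of-the-envelope sketch in Python. Every number is a rough, illustrative figure from the paragraphs above, and I’m reading ‘9:1’ as nine of every ten workers in an affected role being displaced; none of this is an output of the IMF’s own model.

```python
# Back-of-the-envelope sketch of the automation arithmetic above.
# All inputs are rough, illustrative figures, not outputs of the IMF model.

automatable_share = 0.30   # ~share of US labor current tech could touch (IMF estimate)
complement_share = 0.50    # of that, the portion where AI complements rather than replaces
replacement_rate = 0.90    # "9:1": nine of every ten workers in affected roles displaced

exposed = automatable_share * (1 - complement_share)  # jobs facing outright replacement
displaced = exposed * replacement_rate                # implied displacement under 9:1

print(f"Exposed to outright replacement: {exposed:.1%}")    # 15.0%
print(f"Displaced under 9:1 replacement: {displaced:.1%}")  # 13.5%
print("Great Depression peak unemployment, for reference: 25.0%")
```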
Previously, I’d actually arrived at that 30% number myself (almost exactly one year ago), but I had initially expected:
Labs would devote substantial resources to this automation, and it would happen more quickly than it has so far.
All of these jobs were just on the chopping block (frankly, I’m not sure how much I buy the complementarity argument, but I am An Internet Crank, and they are the International Monetary Fund, so I’ll defer to them).
These two beliefs made the situation look much more dire than I now believe it to be, but it’s still, I claim, worth entertaining as A Way This Whole Thing Could Go, especially if we’re hitting a capabilities plateau, and especially if we’re doubling down on government intervention as our key lever in obviating x-risk.
[I’m not advocating for a centrally planned automation schema, to be clear; I think these things have basically never worked, but would like to hear counterexamples. Maybe just like… a tax on automation to help staunch the flow of resources into the labs and their surrogates, a restructuring of unemployment benefits and retraining programs and, before any of that, a more robust effort to model the economic consequences of current and future systems than the IMF report that just duplicates the findings of some idiot (me) frantically reviewing BLS statistics in the winter of 2023.]
I (and maybe you) have historically underrated the density of people with religious backgrounds in secular hubs. Most of these people don’t ‘think differently’, in a structural sense, from their forebears; they just don’t believe in that God anymore.
The hallmark here is a kind of naive enlightenment approach that ignores ~200 years of intellectual history (and a great many thinkers from before that period, including canonical philosophers they might claim to love/respect/understand). This type of thing.
They’re no less tribal or dogmatic, or more critical, than the place they came from. They just vote the other way and can maybe talk about one or two levels of abstraction beyond the stereotype they identify against (although they can’t really think about those levels).
You should still be nice to them, and honest with them, but you should understand what you’re getting into.
The mere biographical detail of having a religious background or being religious isn’t a strong mark against someone’s thinking on other topics, but it is a sign you may be talking to a member of a certain meta-intellectual culture, and need to modulate your style. I have definitely had valuable conversations with people that firmly belong in this category, and would not categorically discourage engagement. Just don’t be so surprised when the usual jutsu falls flat!
I don’t think I really understood what it meant for establishment politics to be divisive until this past election.
As good as it feels to sit on the left and say “they want you to hate immigrants” or “they want you to hate queer people”, it seems similarly (although probably not equally?) true that the center left also has people they want you to hate (the religious, the rich, the slightly-more-successful-than-you, the ideologically-impure-who-once-said-a-bad-thing-on-the-internet).
But there’s also a deeper, structural sense in which it’s true.
Working on AIS, I’ve long hoped that we could form a coalition with all of the other people worried about AI, because a good deal of them just... share (some version of) our concerns, and our most ambitious policy solutions (e.g. stopping development, mandating more robust interpretability and evals) could also solve a bunch of problems highlighted by the FATE community, the automation-concerned, etc etc.
Their positions also have the benefit of conforming to widely-held anxieties (‘I am worried AI will just be another tool of empire’, ‘I am worried I will lose my job for banal normie reasons that have nothing to do with civilizational robustness’, ‘I am worried AIs will cheaply replace human labor and do a worse job, enshittifying everything in the developed world’). We could generally curry popular support and favor, without being dishonest, by looking at the Venn diagram of things we want and things they want (which would also help keep AI policy from sliding into partisanship, if such a thing is still possible, given the largely right-leaning associations of the AIS community*).
For the next four years, at the very least, I am forced to lay this hope aside. That the EO contained language in service of the FATE community was, in hindsight, very bad, and probably foreseeably so, given that even moderate Republicans like to score easy points on culture war bullshit. Probably it will be revoked, because language about bias made it an easy thing for Vance to call “far left”.
“This is ok because it will just be replaced.”
Given the current state of the game board, I don’t want to be losing any turns. We’ve already lost too many turns; setbacks are unacceptable.
“What if it gets replaced by something better?”
I envy your optimism. I’m also concerned about the same dynamic playing out in reverse; what if the new EO (or piece of legislation via whatever mechanism), like the old EO, contains some language that is (to us) beside the point, but nonetheless signals partisanship, and is retributively revoked or repealed by the next administration? This is why you don’t want AIS to be partisan; partisanship is dialectics without teleology.
Ok, so structurally divisive: establishment politics has made it ~impossible to form meaningful coalitions around issues other than absolute lightning rods (e.g. abortion, immigration; the ‘levers’ available to partisan hacks looking to gin up donations). It’s not just that they make you hate your neighbors, it’s that they make you behave as though you hate your neighbors, lest your policy proposals get painted with the broad red brush and summarily dismissed.
I think this is the kind of observation that leads many experienced people interested in AIS to work on things outside of AIS, but with an eye toward implications for AI (e.g. Critch, A Ray). You just have these lucid flashes of how stacked the deck really is, and set about digging the channel that is, compared to the existing channels, marginally more robust to reactionary dynamics (‘aligning the current of history with your aims’ is maybe a good image).
Hopefully undemocratic regulatory processes serve their function as a backdoor for the sensible, but it’s unclear how penetrating the partisanship will be over the next four years (and, of course, those at the top are promising that it will be Very Penetrating).
*I am somewhat ambivalent about how right-leaning AIS really is. Right-leaning compared to middle class Americans living in major metros? Probably. Tolerant of people with pretty far-right views? Sure, to a point. Right of the American center as defined in electoral politics (e.g. ‘Republican-voting’)? Usually not.
Does anyone have examples of concrete actions taken by Open Phil that point toward their AIS plan being anything other than ‘help Anthropic win the race’?
Grants to Redwood Research, SERI MATS, NYU alignment group under Sam Bowman for scalable supervision, Palisade research, and many dozens more, most of which seem net positive wrt TAI risk.
Many MATS scholars go to Anthropic (source: I work there).
Redwood I’m really not sure, but that could be right.
Sam now works at Anthropic.
Palisade: I’ve done some work for them, I love them, I don’t know that their projects so far inhibit Anthropic (BadLlama, which I’m decently confident was part of the cause for funding them, was pretty squarely targeted at Meta, and is their most impactful work to date by several OOM). In fact, the softer versions of Palisade’s proposal (highlighting misuse risk, their core mission), likely empower Anthropic as seemingly the most transparent lab re misuse risks.
I take the thrust of your comment to be “OP funds safety, do your research”. I work in safety; I know they fund safety.
I also know most safety projects differentially benefit Anthropic (this fact is independent of whether you think differentially benefiting Anthropic is good or bad).
If you can make a stronger case for any of the other dozens of orgs on your list than exists for the few above, I’d love to hear it. I’ve thought about most of them and don’t see it, hence why I asked the question.
Further: the goalpost is not ‘net positive with respect to TAI x-risk.’ It is ‘not plausibly a component of a meta-strategy targeting the development of TAI at Anthropic before other labs.’
Edit: use of the soldier mindset flag above is pretty uncharitable here; I am asking for counter-examples to a hypothesis I’m entertaining. This is the actual opposite of soldier mindset.
Apologies for the soldier mindset react, I pattern-matched to some more hostile comment. Communication is hard.
Makes sense. Pretty sure you can remove it (and would appreciate that).
Sam Kriss (author of the ‘Laurentius Clung’ piece) has posted a critique. I don’t think it’s good, but I do think it’s representative of a view that I do encounter in the wild but haven’t really seen written up.
https://samkriss.substack.com/p/against-truth
What view? This seemed largely substanceless, often arbitrary/wrong, and a waste of my time. It had no overarching view except to state his own negative interpretation of various bits of the rationalist scene as fact. (Downvoted.)
Example of substanceless:
Example of wrong:
Fine by me that this doesn’t meet your bar for what counts as a view. It is, however, very literally a ‘way he sees you and your friends’, and being seen a way has consequences, no matter how incoherent or inaccurate the way you’re being seen is.
I see! Thx for clarifying.
FWIW I don’t agree that being seen a way reliably has any consequences that matter. I should probably write a post sometime on my view about the importance of not respecting or paying much attention to people who are paying attention to you; this point of disagreement has come up a lot.
I think we only disagree on the margin — about ‘how reliably’