I’m confident enough in this take to write it as a PSA: playing music at medium-size-or-larger gatherings is a Chesterton’s Fence situation.
It serves the very important function of reducing average conversation size: the louder the music, the more groups naturally split into smaller groups, as people on the far end develop a (usually unconscious) common knowledge that it’s too much effort to keep participating in the big one and they can start a new conversation without being unduly disruptive.
If you’ve ever been at a party with no music where people gravitate towards a single group (or a handful of groups) of 8+ people, you’ve experienced the failure mode that this solves: these conversations are then actually conversations of 2-3 people with 5-6 observers, which is usually unpleasant for the observers and does not facilitate close interactions that easily lead to getting to know people.
By making it hard to have bigger conversations, the music naturally produces smaller ones; you can modulate the volume to have the desired effect on a typical discussion size. Quiet music (e.g. at many dinner parties) makes it hard to have conversations bigger than ~4-5, which is already a big improvement. Medium-volume music (think many bars) facilitates easy conversations of 2-3. The extreme end of this is dance clubs, where very loud music (not coincidentally!) makes it impossible to maintain conversations bigger than 2.
I suspect that high-decoupler hosts are just not in the habit of thinking “it’s a party, therefore I should put music on,” or even actively think “music makes it harder to talk and hear each other, and after all isn’t that the point of a party?” But it’s a very well-established cultural practice to play music at large gatherings, so, per Chesterton’s Fence, you need to understand what function it plays. The function it plays is to stop the party-destroying phenomenon of big group conversations.
As someone who has been to Lighthaven, does this still feel marginally worth it at Lighthaven, where we mostly tried to make it architecturally difficult to have larger conversations? I can see the case for music here, but like, I do think music makes it harder to talk to people (especially on the louder end) and that does seem like a substantial cost to me.
Talking 1-1 with music is so difficult to me that I don’t enjoy a place if there’s music. I expect many people on/towards the spectrum could be similar.
Having been at two LH parties, one with music and one without, I definitely ended up in the “large conversation with 2 people talking and 5 people listening”-situation much more in the party without music.
That said, I did find it much easier to meet new people at the party without music, as this also makes it much easier to join conversations that sound interesting when you walk past (being able to actually overhear them).
This might be one of the reasons why people tend to progressively increase the volume of the music during parties. First give people a chance to meet interesting people and easily join conversations. Then increase the volume to facilitate smaller conversations.
Yeah, when there’s loud music it’s much easier for me to understand people I know than people I don’t because I’m already used to their speaking patterns, and can more easily infer what they said even when I don’t hear it perfectly. And also because any misunderstanding or difficulty that arises out of not hearing each other well is less awkward with someone I already know than someone I don’t.
As someone who’s spent meaningful amounts of time at LH during parties, absolutely yes. You successfully made it architecturally awkward to have large conversations, but that’s often cashed out as “there’s a giant conversation group in and totally blocking [the Entry Hallway Room of Aumann]/[the lawn between A&B]/[one or another firepit and its surrounding walkways]; that conversation group is suffering from the obvious described failure modes, but no one in it is sufficiently confident or agentic or charismatic to successfully break out into a subgroup/subconversation.”
I’d recommend quiet music during parties? Or maybe even just a soundtrack of natural noises (birdsong and wind? rain and thunder?) to serve the purpose instead.
@habryka Forgot to comment on the changes you implemented for soundscape at LH during the mixer—possibly you may want to put a speaker in the Bayes window overlooking the courtyard firepit. People started congregating/pooling there (and notably not at the other firepit next to it!) because it was the locally-quietest location, and then the usual failure modes of an attempted 12-person conversation ensued.
Seems cheap to get the info value, especially for quieter music? Can be expensive to set up a multi-room sound system, but it’s probably most valuable in the room that is largest/most prone to large group formation, so maybe worth experimenting with a speaker playing some instrumental jazz or something. I do think the architecture does a fair bit of work already.
I’m being slightly off-topic here, but how does one “make it architecturally difficult to have larger conversations”? More broadly, the topic of designing spaces where people can think better/do cooler stuff/etc. is fascinating, but I don’t know where to learn more than the very basics of it. Do you know good books, articles, etc. on these questions, by any chance?
Thanks! I knew of Alexander, but you reminded me that I’ve been procrastinating on tackling the 1,200+ pages of A Pattern Language for a few months, and I’ve now started reading it :-)
Was one giant cluster last two times I was there. In the outside area. Not sure why the physical space arrangement wasn’t working. I guess walking into a cubby feels risky/imposing, and leaving feels rude. I would have liked it to work.
I’m not sure how you could improve it. I was trying to think of something last time I was there. “Damn all these nice cubbies are empty.” I could not think of anything.
I agree music has this effect, but I think the Fence is mostly because it also hugely influences the mood of the gathering, i.e. of the type and correlatedness of people’s emotional states.
(Music also has some costs, although I think most of these aren’t actually due to the music itself and can be avoided with proper acoustical treatment. E.g. people sometimes perceive music as too loud because the emitted volume is literally too high, but ime people often say this when the noise is actually overwhelming for other reasons, like echo (insofar as walls/floor/ceiling are near/hard/parallel), or bass traps/standing waves (such that the peak amplitude of the perceived wave is above the painfully loud limit, even though the average amplitude is fine; in the worst cases, this can result in barely being able to hear the music while simultaneously perceiving it as painfully loud!))
1. Gatherings with generous alcohol drinking tend to have louder music because alcohol relaxes the inner ear muscles, resulting in less vibration being conveyed, resulting in sound dampening. So anyone drinking alcohol experiences lower sound volumes. This means that a comfortable volume for a drunk person is quite a bit higher than for a sober person. Which is a fact that can be quite unpleasant if you are the designated driver! I always try to remember to bring earplugs if I’m going to be a designated driver for a group going out drinking.
If you are drinking less than the average amount of alcohol at a social gathering, chances are your opinion of the music will be that it is too loud.
2. The intent of the social gathering in some cases is to facilitate good conversations. In such a case the person managing the music (host or DJ) should be thoughtful of this, and aim for a ‘coffee shop’ vibe with quiet background music and places to go in the venue where the music dwindles away.
In the alternate case, where the intent of the party is to facilitate social connection and/or flirtation and/or fun dancing… then the host / DJ may be actively pushing the music loud to discourage any but the most minimal conversation, trying to get people to drink alcohol and dance rather than talk, and at most have brief simple 1-1 conversations. A dance club is an example of a place deliberately aiming for this end of the spectrum.
So, in designing a social gathering, these factors are definitely something to keep in mind. What are the goals of the gathering? How much, if any, alcohol will the guests be drinking? If you have put someone in charge of controlling the music, are they on the same page about this? Or are they someone who is used to controlling music in a way appropriate to dance hall style scenarios and will default to that?
In regards to intellectual discussion focused gatherings, I do actually think that there can be a place for gatherings of people in which only a small subset of people talk… but I agree this shouldn’t be the default. The scenario where I think this makes sense is something more like a debate club or mini lecture with people taking turns to ask questions or challenge assumptions of the lecturer. This is less a social gathering and more an educational type experience, but can certainly be something on the borderlands between coffeeshop-style small group conversation and formal academic setting. Rousing debates and speeches or mini lectures around topics that the group finds interesting, relevant, and important can be both an educational experience and a fun social experience to perform or watch. I think this is something that needs more planning and structure to go well, and which people should be aware is intended and what rules the audience will be expected to follow in regards to interruptions, etc.
get people to drink alcohol and dance rather than talk
Also important to notice that restaurants and bars are not fully aligned with your goals. On one hand, if you feel good there, you are likely to come again, and thus generate more profit for them—this part is win/win. On the other hand, it is better for them if you spend less time talking (even if that’s what you like), and instead eat and drink more, and then leave, so that other paying customers can come—that part is win/lose.
(Could restaurants become better aligned if instead of food we paid them for time? I suspect this would result in other kind of frustrating actions, such as them taking too much time to bring the food in very small portions.)
So while it is true that the music serves a socially useful purpose, it also serves a profit-increasing purpose, so I suspect that the usual volume of music we are used to is much higher than would be socially optimal.
Could restaurants become better aligned if instead of food we paid them for time?
The “anti-café” concept is like this. I’ve never been to one myself, but I’ve seen descriptions on the Web of a few of them existing. They don’t provide anything like restaurant-style service that I’ve heard; instead, there are often cheap or free snacks along the lines of what an office break room might carry, along with other amenities, and you pay for the amount of time you spend there.
I think a restaurant where you paid for time, if the food was nothing special, would quickly turn into a coworking space. Maybe it would be more open-office and more amenable to creative, conversational, interpersonal work rather than laptop work. You probably want it to be a cafe—or at least look like a cafe from the outside in signage / branding; you may want architectural sound dampening like a Denny’s booth. You could sell pre-packaged food and sodas—it isn’t what they’re here for. Or you could even sell or rent activities like coloring books, simple social tabletop games, small toys, lockpicking practice locks, tiny marshmallow candle s’more sets, and so on.
Unfortunately different people have different levels of hearing ability, so you’re not setting the conversation size at the same level for all participants. If you set the volume too high, you may well be excluding some people from the space entirely.
I think that people mostly put music on in these settings as a way to avoid awkward silences and to create the impression that the room is more active than it is, whilst people are arriving. If this is true, then it serves no great purpose once people have arrived and are engaged in conversation.
Another important consideration is sound-damping. I’ve been in venues where there’s no music playing and the conversations are happening between 3-5 people but everyone is shouting to be heard above the crowd, and it’s incredibly difficult for someone with hearing damage to participate at all. This is primarily a result of hard, echoey walls and very few soft furnishings.
I think there’s something to be said for having different areas with different noise levels, allowing people to choose what they’re comfortable with, and observing where they go.
It seems to me that this claim has a lot to overcome, given that the observers could walk away at any time.
does not facilitate close interactions that easily lead to getting to know people.
Is that a goal? I’ve never been much of a partygoer, but if I want to have a one-on-one conversation with somebody and get to know them, a party is about the last place I’d think about going. Too many annoying interruptions.
The function it plays is to stop the party-destroying phenomenon of big group conversations.
It may do that, but that doesn’t necessarily mean that that’s the function. You could equally well guess that its function was to exclude people who don’t like loud music, since it also does that.
this is an incredible insight! from this I think we can design better nightclublike social spaces for people who don’t like loud sounds (such as people in this community with signal processing issues due to autism).
One idea I have is to do it in the digital. like, VR chat silent nightclub where the sound falloff is super high. (perhaps this exists?) Or a 2D top down equivalent. I will note that Gather Town is backwards—the sound radius is so large that there is still lots of lemurs, but at the same time you can’t read people’s body language from across the room—and instead the emotive radius from webcam / face-tracking needs to be larger than the sound radius. Or you can have a trad UI with “rooms” of very small size that you have to join to talk. tricky to get that kind of app right though since irl there’s a fluid boundary between in and out of a convo and a binary demarcation would be subtly unpleasant.
Another idea is to find alternative ways to sound isolate in meatspace. Other people have talked about architectural approaches like in Lighthaven. Or imagine a party where everyone had to wear earplugs. Sound falls off with the square of distance and you can calculate out how many decibels you need to deafen everyone by to get the group sizes you want. Or a party with a rule that you have to plug your ears when you aren’t actively in a conversation. Or you could lay out some hula hoops with space between them and the rule is you can only talk within the hula hoop with other people in it, and you can’t listen in on someone else’s hula hoop convo. You’d have to plug your ears as you walk around. Better get real comfortable with your friends! Maybe secretly you can move the hoops around to combine into bigger groups if you are really motivated. Or with way more effort, you could similarly do a bed fort building competition. These are very cheap experiments!
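For anyone who wants to try the earplug calculation, here is a rough sketch in Python. It is a toy model, not an acoustics reference: it assumes a talker is a point source in a free field (so level drops by 20*log10 of the distance ratio, which is the inverse-square law in decibels), that earplugs subtract a flat number of decibels, and that speech stops being usable below some fixed level at the ear. The 60 dB speech level at 1 m and the 30 dB usability floor are illustrative assumptions, and the function names are made up for this example. It also ignores background noise and reverberation, which matter a lot in real rooms.

```python
import math


def level_at_distance(level_at_1m_db: float, distance_m: float) -> float:
    """Level of a point source in a free field, relative to its level at 1 m.

    Inverse-square law in decibels: each doubling of distance costs about 6 dB.
    """
    if distance_m <= 0:
        raise ValueError("distance_m must be positive")
    return level_at_1m_db - 20.0 * math.log10(distance_m)


def required_attenuation_db(
    level_at_1m_db: float, min_usable_db: float, max_distance_m: float
) -> float:
    """Flat earplug attenuation that puts speech exactly at the usability
    floor when the talker is max_distance_m away (toy model)."""
    return level_at_distance(level_at_1m_db, max_distance_m) - min_usable_db


def max_audible_distance_m(
    level_at_1m_db: float, min_usable_db: float, attenuation_db: float
) -> float:
    """Inverse of required_attenuation_db: how far away a talker can be
    before their speech drops below the usability floor."""
    return 10 ** ((level_at_1m_db - attenuation_db - min_usable_db) / 20.0)


if __name__ == "__main__":
    SPEECH_AT_1M_DB = 60.0  # assumed level of conversational speech at 1 m
    MIN_USABLE_DB = 30.0    # assumed floor below which speech is unusable
    for radius_m in (0.75, 1.0, 1.5, 3.0):
        plugs = required_attenuation_db(SPEECH_AT_1M_DB, MIN_USABLE_DB, radius_m)
        print(f"conversation radius {radius_m} m -> about {plugs:.1f} dB of earplug")
```

Turning a conversation radius into a group size is a separate guess (how many people fit in a circle that everyone can hear across), so the sketch stops at the radius.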
I think some of the AI safety policy community has over-indexed on the visual model of the “Overton Window” and under-indexed on alternatives like the “ratchet effect,” “poisoning the well,” “clown attacks,” and other models where proposing radical changes can make you, your allies, and your ideas look unreasonable (edit to add: whereas successfully proposing minor changes achieves hard-to-reverse progress, making ideal policy look more reasonable).
I’m not familiar with a lot of systematic empirical evidence on either side, but it seems to me like the more effective actors in the DC establishment overall are much more in the habit of looking for small wins that are both good in themselves and shrink the size of the ask for their ideal policy than of pushing for their ideal vision and then making concessions. Possibly an ideal ecosystem has both strategies, but it seems possible that at least some versions of “Overton Window-moving” strategies executed in practice have larger negative effects via associating their “side” with unreasonable-sounding ideas in the minds of very bandwidth-constrained policymakers, who strongly lean on signals of credibility and consensus when quickly evaluating policy options, than the positive effects of increasing the odds of ideal policy and improving the framing for non-ideal but pretty good policies.
In theory, the Overton Window model is just a description of what ideas are taken seriously, so it can indeed accommodate backfire effects where you argue for an idea “outside the window” and this actually makes the window narrower. But I think the visual imagery of “windows” actually struggles to accommodate this (when was the last time you tried to open a window and accidentally closed it instead?), and as a result, people who rely on this model are more likely to underrate these kinds of consequences.
Would be interested in empirical evidence on this question (ideally actual studies from psych, political science, sociology, econ, etc literatures, rather than specific case studies due to reference class tennis type issues).
These are plausible concerns, but I don’t think they match what I see as a longtime DC person.
We know that the legislative branch is less productive in the US than it has been in any modern period, and fewer bills get passed (many different metrics for this, but one is https://www.reuters.com/graphics/USA-CONGRESS/PRODUCTIVITY/egpbabmkwvq/). Those bills that do get passed tend to be bigger swings as a result—either a) transformative legislation (e.g., Obamacare, Trump tax cuts and COVID super-relief, Biden Inflation Reduction Act and CHIPS) or b) big omnibus “must-pass” bills like FAA reauthorization, into which many small proposals get added.
I also disagree with the claim that policymakers focus on credibility and consensus generally, except perhaps in the executive branch to some degree. (You want many executive actions to be noncontroversial “faithfully executing the laws” stuff, but I don’t see that as “policymaking” in the sense you describe it.)
In either of those, it seems like the current legislative “meta” favors bigger policy asks, not small wins, and I’m having trouble thinking of anyone I know who’s impactful in DC who has adopted the opposite strategy. What are examples of the small wins that you’re thinking of as being the current meta?
Agree with lots of this– a few misc thoughts [hastily written]:
I think the Overton Window frame ends up getting people to focus too much on the dimension “how radical is my ask”– in practice, things are usually much more complicated than this. In my opinion, a preferable frame is something like “who is my target audience and what might they find helpful.” If you’re talking to someone who makes it clear that they will not support X, it’s silly to keep on talking about X. But I think the “target audience first” approach ends up helping people reason in a more sophisticated way about what kinds of ideas are worth bringing up. As an example, in my experience so far, many policymakers are curious to learn more about intelligence explosion scenarios and misalignment scenarios (the more “radical” and “speculative” threat models).
I don’t think it’s clear that the more effective actors in DC tend to be those who look for small wins. Outside of the AIS community, there sure do seem to be a lot of successful organizations that take hard-line positions and (presumably) get a lot of their power/influence from the ideological purity that they possess & communicate. Whether or not these organizations end up having more or less influence than the more “centrist” groups is, in my view, not a settled question & probably varies a lot by domain. In AI safety in particular, I think my main claim is something like “pretty much no group– whether radical or centrist– has had tangible wins. When I look at the small set of tangible wins, it seems like the groups involved were across the spectrum of ‘reasonableness.’”
The more I interact with policymakers, the more I’m updating toward something like “poisoning the well doesn’t come from having radical beliefs– poisoning the well comes from lamer things like being dumb or uninformed, wasting people’s time, not understanding how the political process works, not having tangible things you want someone to do, explaining ideas poorly, being rude or disrespectful, etc.” I’ve asked ~20-40 policymakers (outside of the AIS bubble) things like “what sorts of things annoy you about meetings” or “what tends to make meetings feel like a waste of your time”, and no one ever says “people come in with ideas that are too radical.” The closest thing I’ve heard is people saying that they dislike it when groups fail to understand why things aren’t able to happen (like, someone comes in thinking their idea is great, but then they fail to understand that their idea needs approval from committee A and appropriations person B and then they’re upset about why things are moving slowly). It seems to me like many policy folks (especially staffers and exec branch subject experts) are genuinely interested in learning more about the beliefs and worldviews that have been prematurely labeled as “radical” or “unreasonable” (or perhaps such labels were appropriate before ChatGPT but no longer are).
A reminder that those who are opposed to regulation have strong incentives to make it seem like basically-any-regulation is radical/unreasonable. An extremely common tactic is for industry and its allies to make common-sense regulation seem radical/crazy/authoritarian & argue that actually the people proposing strong policies are just making everyone look bad & argue that actually we should all rally behind [insert thing that isn’t a real policy.] (I admit this argument is a bit general, and indeed I’ve made it before, so I won’t harp on it here. Also I don’t think this is what Trevor is doing– it is indeed possible to raise serious discussions about “poisoning the well” even if one believes that the cultural and economic incentives disproportionately elevate such points).
In the context of AI safety, it seems to me like the most high-influence Overton Window moves have been positive– and in fact I would go as far as to say strongly positive. Examples that come to mind include the CAIS statement, FLI pause letter, Hinton leaving Google, Bengio’s writings/speeches about rogue AI & loss of control, Ian Hogarth’s piece about the race to god-like AI, and even Yudkowsky’s TIME article.
I think some of our judgments here depend on underlying threat models and an underlying sense of optimism vs. pessimism. If one thinks that labs making voluntary agreements/promises and NIST contributing to the development of voluntary standards are quite excellent ways to reduce AI risk, then the groups that have helped make this happen deserve a lot of credit. If one thinks that much more is needed to meaningfully reduce xrisk, then the groups that are raising awareness about the nature of the problem, making high-quality arguments about threat models, and advocating for stronger policies deserve a lot of credit.
I agree that more research on this could be useful. But I think it would be most valuable to focus less on “is X in the Overton Window” and more on “is X written/explained well and does it seem to have clear implications for the target stakeholders?”
Re: over-emphasis on “how radical is my ask” vs “what my target audience might find helpful,” and more generally the importance of making your case well regardless of how radical it is: that makes sense. Though notably the more radical your proposal is (or more unfamiliar your threat models are), the higher the bar for explaining it well, so these do seem related.
Re: more effective actors looking for small wins, I agree that it’s not clear, but yeah, seems like we are likely to get into some reference class tennis here. “A lot of successful organizations that take hard-line positions and (presumably) get a lot of their power/influence from the ideological purity that they possess & communicate”? Maybe, but I think of like, the agriculture lobby, who just sort of quietly make friends with everybody and keep getting 11-figure subsidies every year, in a way that (I think) resulted more from gradual ratcheting than making a huge ask. “Pretty much no group– whether radical or centrist– has had tangible wins” seems wrong in light of the EU AI Act (where I think both a “radical” FLI and a bunch of non-radical orgs were probably important) and the US executive order (I’m not sure which strategy is best credited there, but I think most people would have counted the policies contained within it as “minor asks” relative to licensing, pausing, etc). But yeah I agree that there are groups along the whole spectrum that probably deserve credit.
Re: poisoning the well, again, radical-ness and being dumb/uninformed are of course separable but the bar rises the more radical you get, in part because more radical policy asks strongly correlate with more complicated procedural asks; tweaking ECRA is both non-radical and procedurally simple, creating a new agency to license training runs is both outside the DC Overton Window and very procedurally complicated.
Re: incentives, I agree that this is a good thing to track, but like, “people who oppose X are incentivized to downplay the reasons to do X” is just a fully general counterargument. Unless you’re talking about financial conflicts of interest, but there are also financial incentives for orgs pursuing a “radical” strategy to downplay boring real-world constraints, as well as social incentives (e.g. on LessWrong IMO) to downplay these boring constraints and cognitive biases against thinking your preferred strategy has big downsides.
I agree that the CAIS statement, Hinton leaving Google, and Bengio and Hogarth’s writing have been great. I think that these are all in a highly distinct category from proposing specific actors take specific radical actions (unless I’m misremembering the Hogarth piece). Yudkowsky’s TIME article, on the other hand, definitely counts as an Overton Window move, and I’m surprised that you think this has had net positive effects. I regularly hear “bombing datacenters” as an example of a clearly extreme policy idea, sometimes in a context that sounds like it maybe made the less-radical idea seem more reasonable, but sometimes as evidence that the “doomers” want to do crazy things and we shouldn’t listen to them, and often as evidence that they are at least socially clumsy, don’t understand how politics works, etc, which is related to the things you list as the stuff that actually poisons the well. (I’m confused about the sign of the FLI letter as we’ve discussed.)
I’m not sure optimism vs pessimism is a crux, except in very short, like, 3-year timelines. It’s true that optimists are more likely to value small wins, so I guess narrowly I agree that a ratchet strategy looks strictly better for optimists, but if you think big radical changes are needed, the question remains of whether you’re more likely to get there via asking for the radical change now or looking for smaller wins to build on over time. If there simply isn’t time to build on these wins, then yes, better to take a 2% shot at the policy that you actually think will work; but even in 5-year timelines I think you’re better positioned to get what you ultimately want by 2029 if you get a little bit of what you want in 2024 and 2026 (ideally while other groups also make clear cases for the threat models and develop the policy asks, etc.). Another piece this overlooks is the information and infrastructure built by the minor policy changes. A big part of the argument for the reporting requirements in the EO was that there is now going to be an office in the US government that is in the business of collecting critical information about frontier AI models and figuring out how to synthesize it to the rest of government, that has the legal authority to do this, and both the office and the legal authority can now be expanded rather than created, and there will now be lots of individuals who are experienced in dealing with this information in the government context, and it will seem natural that the government should know this information. I think if we had only been developing and advocating for ideal policy, this would not have happened (though I imagine that this is not in fact what you’re suggesting the community do!).
Unless you’re talking about financial conflicts of interest, but there are also financial incentives for orgs pursuing a “radical” strategy to downplay boring real-world constraints, as well as social incentives (e.g. on LessWrong IMO) to downplay these boring constraints and cognitive biases against thinking your preferred strategy has big downsides.
It’s not just that problem though, they will likely be biased to think that their policy is helpful for safety of AI at all, and this is a point that sometimes gets forgotten.
But correct on the fact that Akash’s argument is fully general.
Ingroup losing status? Few things are more prone to distorted perception than that.
And I think this makes sense (e.g. Simler’s Social Status: Down the Rabbit Hole which you’ve probably read), if you define “AI Safety” as “people who think that superintelligence is serious business or will be some day”.
The psych dynamic that I find helpful to point out here is Yud’s Is That Your True Rejection post from ~16 years ago. A person who hears about superintelligence for the first time will often respond to their double-take at the concept by spamming random justifications for why that’s not a problem (which, notably, feels like legitimate reasoning to that person, even though it’s not). An AI-safety-minded person becomes wary of being effectively attacked by high-status people immediately turning into what is basically a weaponized justification machine, and develops a deep drive wanting that not to happen. Then justifications ensue for wanting that to happen less frequently in the world, because deep down humans really don’t want their social status to be put at risk (via denunciation) on a regular basis like that. These sorts of deep drives are pretty opaque to us humans but their real world consequences are very strong.
Something that seems more helpful than playing whack-a-mole whenever this issue comes up is having more people in AI policy putting more time into improving perspective. I don’t see shorter paths to increasing the number of people-prepared-to-handle-unexpected-complexity than giving people a broader and more general thinking capacity for thoughtfully reacting to the sorts of complex curveballs that you get in the real world. Rationalist fiction like HPMOR is great for this, as well as others e.g. Three Worlds Collide, Unsong, Worth the Candle, Worm (list of top rated ones here). With the caveat, of course, that doing well in the real world is less like the bite-sized easy-to-understand events in ratfic, and more like spotting errors in the methodology section of a study or making money playing poker.
I think, given the circumstances, it’s plausibly very valuable e.g. for people already spending much of their free time on social media or watching stuff like The Office, Garfield reruns, WWI and Cold War documentaries, etc, to only spend ~90% as much time doing that and refocusing ~10% to ratfic instead, and maybe see if they can find it in themselves to want to shift more of their leisure time to that sort of passive/ambient/automatic self-improvement productivity.
I’m not a decel, but the way this stuff often is resolved is that there are crazy people that aren’t taken seriously by the managerial class but that are very loud and make obnoxious asks. Think the evangelicals against abortion or the Columbia protestors.
Then there is some elite, part of the managerial class, that makes reasonable policy claims. For abortion, this is Mitch McConnell, being disciplined over a long period of time in choosing the correct judges. For Palestine, this is Blinken and his State Department bureaucracy.
The problem with decels is that theoretically they are part of the managerial class themselves. Or at least, they act like they are. They call themselves rationalists, read Eliezer and Scott Alexander, and what not. But the problem is that it’s very hard for an uninterested third party to take seriously these bogus Overton Window claims from people that were supposed to be measured, part of the managerial class.
You need to split. There are the crazy ones that people don’t take seriously, but will move the managerial class. And there are the serious people that EA Money will send to D.C. to work at Blumenthal’s office. This person needs to make small policy requests that will sabotage AI, without looking like it. And slowly, you get policy wins and you can sabotage the whole effort.
[reposting from Twitter, lightly edited/reformatted] Sometimes I think the whole policy framework for reducing catastrophic risks from AI boils down to two core requirements—transparency and security—for models capable of dramatically accelerating R&D.
If you have a model that could lead to general capabilities much stronger than human-level within, say, 12 months, by significantly improving subsequent training runs, the public and scientific community have a right to know this exists and to see at least a redacted safety case; and external researchers need to have some degree of red-teaming access. Probably various other forms of transparency would be useful too. It feels like this is a category of ask that should unite the “safety,” “ethics,” and “accelerationist” communities?
And the flip side is that it’s very important that a model capable of kicking off that kind of dash to superhuman capabilities not get stolen/exfiltrated, such that you don’t wind up with multiple actors facing enormous competitive tradeoffs to rush through this process.
These have some tradeoffs, especially as you approach AGI—e.g. if you develop a system that can do 99% of foundation model training tasks and your security is terrible you do have some good reasons not to immediately announce it—but not if we make progress on either of these before then, IMO. What the Pareto Frontier of transparency and security looks like, and where we should land on that curve, seems like a very important research agenda.
And the flip side is that it’s very important that a model capable of kicking off that kind of dash to superhuman capabilities not get stolen/exfiltrated, such that you don’t wind up with multiple actors facing enormous competitive tradeoffs to rush through this process.
Is it? My sense is the race dynamics get worse if you are worried that your competitor has access to a potentially pivotal model but you can’t verify that because you can’t steal it. My guess is the best equilibrium is major nations being able to access competing models.
Also, at least given present compute requirements, a smaller actor stealing a model is not that dangerous, since you need to invest hundreds of millions into compute to use the model for dangerous actions, which is hard to do secretly (though to what degree dangerous inference will cost a lot is something I am quite confused about).
In general I am not super confident here, but I at least really don’t know what the sign of hardening models against exfiltration with regards to race dynamics is.
My sense is the race dynamics get worse if you are worried that your competitor has access to a potentially pivotal model but you can’t verify that because you can’t steal it. My guess is the best equilibrium is major nations being able to access competing models.
What about limited API access to all actors for verification (aka transparency) while still having security?
It’s really hard to know that your other party is giving you API access to their most powerful model. If you could somehow verify that the API you are accessing is indeed directly hooked up to their most powerful model, and that the capabilities of that model aren’t being intentionally hobbled to deceive you, then I do think this gets you a lot of the same benefit.
Some of the benefit is still missing though. I think lack of moats is a strong disincentive to develop technology, and so in a race scenario you might be a lot less tempted to make a mad sprint towards AGI if you think your opponents can catch up almost immediately, and so you might end up with substantial timeline-accelerating effects by enabling better moats.
I do think the lack-of-moat benefit is smaller than the verification benefit.
I think it should be possible to get a good enough verification regime in practice with considerable effort. It’s possible that sufficiently good verification occurs by default due to spies.
I agree there will potentially be a lot of issues downstream of verification issues by default.
I think lack of moats is a strong disincentive to develop technology, and so in a race scenario you might be a lot less tempted to make a mad sprint towards AGI if you think your opponents can catch up almost immediately
Hmm, this isn’t really how I model the situation with respect to racing. From my perspective, the question isn’t “security or no security”, but is instead “when will you have extreme security”.
(My response might overlap with tlevin’s, I’m not super sure.)
Here’s an example way things could go:
An AI lab develops a model that begins to accelerate AI R&D substantially (say 10x) while having weak security. This model was developed primarily for commercial reasons and the possibility of it being stolen isn’t a substantial disincentive in practice.
This model is immediately stolen by China.
Shortly after this, USG secures the AI lab.
Now, further AIs will be secure, but to stay ahead of China which has substantially accelerated AI R&D and other AI work, USG races to AIs which are much smarter than humans.
In this scenario, if you had extreme security ready to go earlier, then the US would potentially have a larger lead and better negotiating position. I think this probably gets you longer delays prior to qualitatively wildly superhuman AIs in practice.
There is a case that if you don’t work on extreme security in advance, then there will naturally be a pause to implement this. I’m a bit skeptical of this in practice, especially in short timelines. I also think that the timing of this pause might not be ideal—you’d like to pause when you already have transformative AI rather than before.
Separately, if you imagine that USG is rational and at least somewhat aligned, then I think security looks quite good, though I can understand why you wouldn’t buy this.
Hmm, this isn’t really how I model the situation with respect to racing. From my perspective, the question isn’t “security or no security”
Interesting, I guess my model is that the default outcome (in the absence of heroic efforts to the contrary) is indeed “no security for nation state attackers”, which as far as I can tell is currently the default for practically everything that is developed using modern computing systems. Getting to a point where you can protect something like the weights of an AI model from nation state actors would be extraordinarily difficult and an unprecedented achievement in computer security, which is why I don’t expect it to happen (even as many actors would really want it to happen).
My model of cybersecurity is extremely offense-dominated for anything that requires internet access or requires thousands of people to have access (both of which I think are quite likely for deployed weights).
The “how do we know if this is the most powerful model” issue is one reason I’m excited by OpenMined, who I think are working on this among other features of external access tools
Interesting. I would have to think harder about whether this is a tractable problem. My gut says it’s pretty hard to build confidence here without leaking information, but I might be wrong.
If probability of misalignment is low, probability of human+AI coups (including e.g. countries invading each other) is high, and/or there aren’t huge offense-dominant advantages to being somewhat ahead, you probably want more AGI projects, not fewer. And if you need a ton of compute to go from an AI that can do 99% of AI R&D tasks to an AI that can cause global catastrophe, then model theft is less of a factor. But the thing I’m worried about re: model theft is a scenario like this, which doesn’t seem that crazy:
Company/country X has an AI agent that can do 99% [edit: let’s say “automate 90%”] of AI R&D tasks, call it Agent-GPT-7, and enough of a compute stock to have that train a significantly better Agent-GPT-8 in 4 months at full speed ahead, which can then train a basically superintelligent Agent-GPT-9 in another 4 months at full speed ahead. (Company/country X doesn’t know the exact numbers, but their 80% CI is something like 2-8 months for each step; company/country Y has less info, so their 80% CI is more like 1-16 months for each step.)
The weights for Agent-GPT-7 are available (legally or illegally) to company/country Y, which is known to company/country X.
Y has, say, a fifth of the compute. So each of those steps will take 20 months. Symmetrically, company/country Y thinks it’ll take 10-40 months and company/country X thinks it’s 5-80.
Once superintelligence is in sight like this, both company/country X and Y become very scared of the other getting it first—in the country case, they are worried it will undermine nuclear deterrence, upend their political system, basically lead to getting taken over by the other. The relevant decisionmakers think this outcome is better than extinction, but maybe not by that much, whereas getting superintelligence before the other side is way better. In the company case, it’s a lot less intense, but they still would much rather get superintelligence than their arch-rival CEO.
So, X thinks they have anywhere from 5-80 months before Y has superintelligence, and Y thinks they have 1-16 months. So X and Y both think it’s easily possible, well within their 80% CI, that Y beats X.
X and Y have no reliable means of verifying a commitment like “we will spend half our compute on safety testing and alignment research.”
If these weights were not available, Y would have a similarly good system in 18 months, 80% CI 12-24.
So, had the weights not been available to Y, X would be confident that it had 12 + 5 months to manage a capabilities explosion that would have happened in 8 months at full speed; it can spend >half of its compute on alignment/safety/etc, and it has 17 rather than 5 months of serial time to negotiate with Y, possibly develop some verification methods and credible mechanisms for benefit/power-sharing, etc. If various transparency reforms have been implemented, such that the world is notified in ~real-time that this is happening, there would be enormous pressure to do so; I hope and think it will seem super illegitimate to pursue this kind of power without these kinds of commitments. I am much more worried about X not doing this and instead just trying to grab enormous amounts of power if they’re doing it all in secret.
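The arithmetic in this scenario can be checked with a short sketch. The numbers are the ones stated above; the variable names are illustrative, and the sketch assumes (as the scenario implicitly does) that step time scales inversely with compute.

```python
# Toy arithmetic for the Agent-GPT-7 scenario. All names are illustrative;
# the numbers are the ones stated in the scenario above.

STEP_MONTHS_X = 4        # X's point estimate per step at full speed
INFORMED_CI = (2, 8)     # 80% CI per step held by the better-informed party
OUTSIDER_CI = (1, 16)    # 80% CI per step held by the less-informed party
COMPUTE_RATIO = 5        # Y has a fifth of X's compute
CATCH_UP_CI = (12, 24)   # months for Y to match Agent-GPT-7 without the weights


def scale(ci, factor):
    """Stretch an interval by a constant factor.

    Assumption: time per step scales inversely with available compute.
    """
    lo, hi = ci
    return (lo * factor, hi * factor)


# With the weights available, Y's steps are 5x slower than X's.
y_step_months = STEP_MONTHS_X * COMPUTE_RATIO    # 20
y_view_of_y = scale(INFORMED_CI, COMPUTE_RATIO)  # (10, 40)
x_view_of_y = scale(OUTSIDER_CI, COMPUTE_RATIO)  # (5, 80)

# X's own two-step capabilities explosion at full speed.
x_full_speed_months = 2 * STEP_MONTHS_X          # 8

# Serial time X can be confident it has: the low end of its estimate for Y,
# plus (if the weights are unavailable) the low end of Y's catch-up time.
x_floor_with_weights = x_view_of_y[0]                      # 5
x_floor_without_weights = CATCH_UP_CI[0] + x_view_of_y[0]  # 17

print("Y step estimate (months):", y_step_months)
print("Y's own CI:", y_view_of_y, "| X's CI for Y:", x_view_of_y)
print("X floor with weights:", x_floor_with_weights)
print("X floor without weights:", x_floor_without_weights)
```

The point of the comparison is the last two numbers: with the weights available X can only count on 5 months, which is shorter than the 8 months its own explosion takes at full speed, whereas without them it can count on 17.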
[Also: I just accidentally went back a page by command-open bracket in an attempt to get my text out of bullet format and briefly thought I lost this comment; thank you in your LW dev capacity for autosave draft text, but also it is weirdly hard to get out of bullets]
I expect that having a nearly-AGI-level AI, something capable of mostly automating further ML research, means the ability to rapidly find algorithmic improvements that result in:
1. drastic reductions in training cost for an equivalently strong AI.
- Making it seem highly likely that a new AI trained using this new architecture/method and a similar amount of compute as the current AI would be substantially more powerful. (thus giving an estimate of time-to-AGI)
- Making it possible to train a much smaller cheaper model than the current AI with the same capabilities.
2. speed-ups and compute-efficiency for inference on current AI, and for the future cheaper versions
3. ability to create and deploy more capable narrow tool-AIs which seem likely to substantially shift military power when deployed to existing military hardware (e.g. better drone piloting models)
4. ability to create and deploy more capable narrow tool-AIs which seem likely to substantially increase economic productivity of the receiving factories.
5. ability to rapidly innovate in non-ML technology, and thereby achieve military and economic benefits.
6. ability to create and destroy self-replicating weapons which would kill most of humanity (e.g. bioweapons), and also to create targeted ones which would wipe out just the population of a specific country.
If I were the government of a country in which such a tech were being developed, I would really not want other countries to be able to steal this tech. It would not seem like a worthwhile trade-off that the thieves would then have a more accurate estimate of how far from AGI my country’s company was.
Company/country X has an AI agent that can do 99% [edit: let’s say “automate 90%”] of AI R&D tasks, call it Agent-GPT-7, and enough of a compute stock to have that train a significantly better Agent-GPT-8 in 4 months at full speed ahead, which can then train a basically superintelligent Agent-GPT-9 in another 4 months at full speed ahead. (Company/country X doesn’t know the exact numbers, but their 80% CI is something like 2-8 months for each step; company/country Y has less info, so their 80% CI is more like 1-16 months for each step.)
I strongly disagree, habryka, on the basis that I believe LLMs are already providing some uplift for highly harmful offense-dominant technology (e.g. bioweapons). I think this effect worsens the closer you get to full AGI. The inference cost to do this, even with a large model, is trivial. You just need to extract the recipe.
This gives a weak state-actor (or wealthy non-state-actor) that has high willingness to undertake provocative actions the ability to gain great power from even temporary access to a small amount of inference from a powerful model. Once they have the weapon recipe, they no longer need the model.
I’m also not sure about tlevin’s argument about ‘right to know’. I think the State has a responsibility to protect its citizens. So I certainly agree the State should be monitoring closely all the AI companies within its purview. On the other hand, making details of the progress of the AI publicly known may lead to increased international tensions or risk of theft or terrorism. I suspect it’s better that the State have inspectors and security personnel permanently posted in the AI labs, but that the exact status of the AI progress be classified.
I think the costs of biorisks are vastly smaller than AGI-extinction risk, and so they don’t really factor into my calculations here. Having intermediate harms before AGI seems somewhat good, since it seems more likely to cause rallying around stopping AGI development, though I feel pretty confused about the secondary effects here (but am pretty confident the primary effects are relatively unimportant).
I think that doesn’t really make sense, since the lowest hanging fruit for disempowering humanity routes through self-replicating weapons. Bio weapons are the currently available technology which is in the category of self-replicating weapons. I think that would be the most likely attack vector for a rogue AGI seeking rapid coercive disempowerment.
Plus, having bad actors (human or AGI) have access to a tech for which we currently have no practical defense, which could wipe out nearly all of humanity for under $100k… seems bad? Just a really unstable situation to be in?
I do agree that it seems unlikely that some terrorist org is going to launch a civilization-ending bioweapon attack within the remaining 36 months or so until AGI (or maybe even ASI). But I do think that manipulating a terrorist org into doing this, and giving them the recipe and supplies to do so, would be a potentially tempting tactic for a hostile AGI.
I think if AI kills us all it would be because the AI wants to kill us all. It is (in my model of the world) very unlikely to happen because someone misuses AI systems.
I agree that bioweapons might be part of that, but the difficult part of actually killing everyone via bioweapons requires extensive planning and deployment strategies, which humans won’t want to execute (since they don’t want to die), and so if bioweapons are involved in all of us dying it will very likely be the result of an AI seeing using them as an opportunity to take over, which I think is unlikely to happen because someone runs some leaked weights on some small amount of compute (or like, that would happen years after the same AIs would have done the same when run on the world’s largest computing clusters).
In general, for any story of “dumb AI kills everyone” you need a story for why a smart AI hasn’t killed us first.
I think if AI kills us all it would be because the AI wants to kill us all. It is (in my model of the world) very unlikely to happen because someone misuses AI systems.
I agree that it seems more likely to be a danger from AI systems misusing humans than humans misusing the AI systems.
What I don’t agree with is jumping forward in time to thinking about when there is an AI so powerful it can kill us all at its whim. In my framework, that isn’t a useful time to be thinking about, it’s too late for us to be changing the outcome at that point.
The key time to be focusing on is the time before the AI is sufficiently powerful to wipe out all of humanity, and there is nothing we can do to stop it.
My expectation is that this period of time could be months or even several years, where there is an AI powerful enough and agentic enough to make a dangerous-but-stoppable attempt to take over the world. That’s a critical moment for potential success, since potentially the AI will be contained in such a way that the threat will be objectively demonstrable to key decision makers. That would make for a window of opportunity to make sweeping governance changes, and further delay take-over. Such a delay could be super valuable if it gives alignment research more critical time for researching the dangerously powerful AI.
Also, the period of time between now and when the AI is that powerful is one where AI-as-a-tool makes it easier and easier for humans aided by AI to deploy civilization-destroying self-replicating weapons. Current AIs are already providing non-zero uplift (both lowering barriers to access, and raising peak potential harms). This is likely to continue to rapidly get worse over the next couple years. Delaying AGI doesn’t much help with biorisk from tool AI, so if you have a ‘delay AGI’ plan then you need to also consider the rapidly increasing risk from offense-dominant tech.
Same reason as knowing how many nukes your opponent has reduces racing. If you are conservative, the uncertainty in how far ahead your opponent is causes escalating races, even if you would both rather not escalate (as long as your mean is well-calibrated).
E.g. let’s assume you and your opponent are de-facto equally matched in the capabilities of your system, but both have substantial uncertainty, e.g. assign 30% probability to your opponent being substantially ahead of you. Then if you think those 30% of worlds are really bad, you probably will invest a bunch more into developing your systems (which of course your opponent will observe, increase their own investment, and then you repeat).
However, if you can both verify how many nukes you have, you can reach a more stable equilibrium even under more conservative assumptions.
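A deliberately crude sketch of that escalation loop, under assumptions that are not in the comment above: each side observes the other's last investment and, being conservative, plans against a level some fixed fraction higher. The `safety_margin` parameter is a hypothetical stand-in for that conservatism (it is not the 30% probability mentioned above), and verification is modeled simply as a margin of zero.

```python
def simulate_race(rounds, safety_margin, start=1.0):
    """Toy best-response loop between two equally matched actors.

    Each round, both sides set their investment to the other's last
    observed investment times (1 + safety_margin). The margin is a
    hypothetical knob for how conservatively each side treats its
    uncertainty about the opponent; 0.0 models perfect verification.
    """
    a = b = start
    history = [(a, b)]
    for _ in range(rounds):
        # Both sides respond simultaneously to what they last observed.
        a, b = b * (1 + safety_margin), a * (1 + safety_margin)
        history.append((a, b))
    return history


uncertain = simulate_race(rounds=5, safety_margin=0.3)
verified = simulate_race(rounds=5, safety_margin=0.0)

print("final investment under uncertainty:", uncertain[-1])
print("final investment with verification:", verified[-1])
```

In this toy model any positive margin compounds every round even though the two sides are in fact equal, while a zero margin leaves investment flat. It says nothing about how large the margin would be in practice.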
Gotcha. A few disanalogies though—the first two specifically relate to the model theft/shared access point, the latter is true even if you had verifiable API access:
Me verifying how many nukes you have doesn’t mean I suddenly have that many nukes, unlike AI model capabilities, though due to compute differences it does not mean we suddenly have the same time distance to superintelligence.
Me having more nukes only weakly enables me to develop more nukes faster, unlike AI that can automate a lot of AI R&D.
This model seems to assume you have an imprecise but accurate estimate of how many nukes I have, but companies will probably be underestimating the proximity of each other to superintelligence, for the same reason that they’re underestimating their own proximity to superintelligence, until it’s way more salient/obvious.
Me verifying how many nukes you have doesn’t mean I suddenly have that many nukes, unlike AI model capabilities, though due to compute differences it does not mean we suddenly have the same time distance to superintelligence.
It’s not super clear whether from a racing perspective having an equal number of nukes is bad. I think it’s genuinely messy (and depends quite sensitively on how much actors are scared of losing vs. happy about winning vs. scared of racing).
I do also currently think that the compute-component will likely be a bigger deal than the algorithmic/weights dimension, making the situation more analogous to nukes, but I do think there is a lot of uncertainty on this dimension.
Me having more nukes only weakly enables me to develop more nukes faster, unlike AI that can automate a lot of AI R&D.
Yeah, totally agree that this is an argument against proliferation, and an important one. While you might not end up with additional racing dynamics, the fact that more global resources can now use the cutting edge AI system to do AI R&D is very scary.
This model seems to assume you have an imprecise but accurate estimate of how many nukes I have, but companies will probably be underestimating the proximity of each other to superintelligence, for the same reason that they’re underestimating their own proximity to superintelligence, until it’s way more salient/obvious.
In general I think it’s very hard to predict whether people will overestimate or underestimate things. I agree that literally right now countries are probably underestimating it, but an overreaction in the future also wouldn’t surprise me very much (in the same way that COVID started with an underreaction, and then was followed by a massive overreaction).
It’s not super clear whether from a racing perspective having an equal number of nukes is bad. I think it’s genuinely messy (and depends quite sensitively on how much actors are scared of losing vs. happy about winning vs. scared of racing).
Importantly though, once you have several thousand nukes the strategic returns to more nukes drop pretty close to zero, regardless of how many your opponents have, while if you get the scary model’s weights and then don’t use them to push capabilities even more, your opponent maybe gets a huge strategic advantage over you. I think this is probably true, but the important thing is whether the actors think it might be true.
In general I think it’s very hard to predict whether people will overestimate or underestimate things. I agree that literally right now countries are probably underestimating it, but an overreaction in the future also wouldn’t surprise me very much (in the same way that COVID started with an underreaction, and then was followed by a massive overreaction).
I’m confident enough in this take to write it as a PSA: playing music at medium-size-or-larger gatherings is a Chesterton’s Fence situation.
It serves the very important function of reducing average conversation size: the louder the music, the more groups naturally split into smaller groups, as people on the far end develop a (usually unconscious) common knowledge that it’s too much effort to keep participating in the big one and they can start a new conversation without being unduly disruptive.
If you’ve ever been at a party with no music where people gravitate towards a single (or handful of) group of 8+ people, you’ve experienced the failure mode that this solves: usually these conversations are then actually conversations of 2-3 people with 5-6 observers, which is usually unpleasant for the observers and does not facilitate close interactions that easily lead to getting to know people.
By making it hard to have bigger conversations, the music naturally produces smaller ones; you can modulate the volume to have the desired effect on a typical discussion size. Quiet music (e.g. at many dinner parties) makes it hard to have conversations bigger than ~4-5, which is already a big improvement. Medium-volume music (think many bars) facilitates easy conversations of 2-3. The extreme end of this is dance clubs, where very loud music (not coincidentally!) makes it impossible to maintain conversations bigger than 2.
I suspect that high-decoupler hosts are just not in the habit of thinking “it’s a party, therefore I should put music on,” or even actively think “music makes it harder to talk and hear each other, and after all isn’t that the point of a party?” But it’s a very well-established cultural practice to play music at large gatherings, so, per Chesterton’s Fence, you need to understand what function it plays. The function it plays is to stop the party-destroying phenomenon of big group conversations.
As having gone to Lighthaven, does this still feel marginally worth it at Lighthaven where we mostly tried to make it architecturally difficult to have larger conversations? I can see the case for music here, but like, I do think music makes it harder to talk to people (especially on the louder end) and that does seem like a substantial cost to me.
Talking 1-1 with music is so difficult for me that I don’t enjoy a place if there’s music. I expect many people on/towards the spectrum could be similar.
Having been at two LH parties, one with music and one without, I definitely ended up in the “large conversation with 2 people talking and 5 people listening”-situation much more in the party without music.
That said, I did find it much easier to meet new people at the party without music, as this also makes it much easier to join conversations that sound interesting when you walk past (being able to actually overhear them).
This might be one of the reasons why people tend to progressively increase the volume of the music during parties. First give people a chance to meet interesting people and easily join conversations. Then increase the volume to facilitate smaller conversations.
Yeah, when there’s loud music it’s much easier for me to understand people I know than people I don’t because I’m already used to their speaking patterns, and can more easily infer what they said even when I don’t hear it perfectly. And also because any misunderstanding or difficulty that arises out of not hearing each other well is less awkward with someone I already know than with someone I don’t.
As someone who’s spent meaningful amounts of time at LH during parties, absolutely yes. You successfully made it architecturally awkward to have large conversations, but that’s often cashed out as “there’s a giant conversation group in and totally blocking [the Entry Hallway Room of Aumann]/[the lawn between A&B]/[one or another firepit and its surrounding walkways]; that conversation group is suffering from the obvious described failure modes, but no one in it is sufficiently confident or agentic or charismatic to successfully break out into a subgroup/subconversation.”
I’d recommend quiet music during parties? Or maybe even just a soundtrack of natural noises—birdsong and wind? rain and thunder? - to serve the purpose instead.
@habryka Forgot to comment on the changes you implemented for soundscape at LH during the mixer—possibly you may want to put a speaker in the Bayes window overlooking the courtyard firepit. People started congregating/pooling there (and notably not at the other firepit next to it!) because it was the locally-quietest location, and then the usual failure modes of an attempted 12-person conversation ensued.
Seems cheap to get the info value, especially for quieter music? Can be expensive to set up a multi-room sound system, but it’s probably most valuable in the room that is largest/most prone to large group formation, so maybe worth experimenting with a speaker playing some instrumental jazz or something. I do think the architecture does a fair bit of work already.
I’m being slightly off-topic here, but how does one “make it architecturally difficult to have larger conversations”? More broadly, the topic of designing spaces where people can think better/do cooler stuff/etc. is fascinating, but I don’t know where to learn more than the very basics of it. Do you know good books, articles, etc. on these questions, by any chance?
I like Christopher Alexander’s stuff.
On the object level question, the way to encourage small conversations architecturally is to have lots of nooks that only fit 3-6 people.
“Nook”, a word which here includes both “circles of seats with no other easily movable seats nearby” and “easily accessible small rooms”.
Thanks! I knew of Alexander, but you reminded me that I’ve been procrastinating on tackling the 1,200+ pages of A Pattern Language for a few months, and I’ve now started reading it :-)
Was one giant cluster last two times I was there. In the outside area. Not sure why the physical space arrangement wasn’t working. I guess walking into a cubby feels risky/imposing, and leaving feels rude. I would have liked it to work.
I’m not sure how you could improve it. I was trying to think of something last time I was there. “Damn all these nice cubbies are empty.” I could not think of anything.
Just my experience.
I agree music has this effect, but I think the Fence is mostly because it also hugely influences the mood of the gathering, i.e. of the type and correlatedness of people’s emotional states.
(Music also has some costs, although I think most of these aren’t actually due to the music itself and can be avoided with proper acoustical treatment. E.g. people sometimes perceive music as too loud because the emitted volume is literally too high, but ime people often say this when the noise is actually overwhelming for other reasons, like echo (insofar as walls/floor/ceiling are near/hard/parallel), or bass traps/standing waves (such that the peak amplitude of the perceived wave is above the painfully loud limit, even though the average amplitude is fine; in the worst cases, this can result in barely being able to hear the music while simultaneously perceiving it as painfully loud!))
Other factors also to consider:
1. Gatherings with generous alcohol drinking tend to have louder music because alcohol relaxes the inner ear muscles, resulting in less vibration being conveyed, resulting in sound dampening. So anyone drinking alcohol experiences lower sound volumes. This means that a comfortable volume for a drunk person is quite a bit higher than for a sober person. Which is a fact that can be quite unpleasant if you are the designated driver! I always try to remember to bring earplugs if I’m going to be a designated driver for a group going out drinking.
If you are drinking less than the average amount of alcohol at a social gathering, chances are your opinion of the music will be that it is too loud.
2. The intent of the social gathering in some cases is to facilitate good conversations. In such a case the person managing the music (host or DJ) should be thoughtful of this, and aim for a ‘coffee shop’ vibe with quiet background music and places to go in the venue where the music dwindles away.
In the alternate case, where the intent of the party is to facilitate social connection and/or flirtation and/or fun dancing… then the host / DJ may be actively pushing the music loud to discourage any but the most minimal conversation, trying to get people to drink alcohol and dance rather than talk, and at most have brief simple 1-1 conversations. A dance club is an example of a place deliberately aiming for this end of the spectrum.
So, in designing a social gathering, these factors are definitely something to keep in mind. What are the goals of the gathering? How much, if any, alcohol will the guests be drinking? If you have put someone in charge of controlling the music, are they on the same page about this? Or are they someone who is used to controlling music in a way appropriate to dance hall style scenarios and will default to that?
In regards to intellectual discussion focused gatherings, I do actually think that there can be a place for gatherings of people in which only a small subset of people talk… but I agree this shouldn’t be the default. The scenario where I think this makes sense is something more like a debate club or mini lecture with people taking turns to ask questions or challenge assumptions of the lecturer. This is less a social gathering and more an educational type experience, but can certainly be something on the borderlands between coffeeshop-style small group conversation and formal academic setting. Rousing debates and speeches or mini lectures around topics that the group finds interesting, relevant, and important can be both an educational experience and a fun social experience to perform or watch. I think this is something that needs more planning and structure to go well, and which people should be aware is intended and what rules the audience will be expected to follow in regards to interruptions, etc.
Wow, I had no idea about the effects of alcohol on hearing! It makes so much sense—I never drink and I hate how loud the music is in parties!
Also important to notice that restaurants and bars are not fully aligned with your goals. On one hand, if you feel good there, you are likely to come again, and thus generate more profit for them—this part is win/win. On the other hand, it is better for them if you spend less time talking (even if that’s what you like), and instead eat and drink more, and then leave, so that other paying customers can come—that part is win/lose.
(Could restaurants become better aligned if instead of food we paid them for time? I suspect this would result in other kinds of frustrating actions, such as them taking too much time to bring the food in very small portions.)
So while it is true that the music serves a socially useful purpose, it also serves a profit-increasing purpose, so I suspect that the usual volume of music we are used to is much higher than would be socially optimal.
I also like Lorxus’s proposal of playing natural noises instead.
The “anti-café” concept is like this. I’ve never been to one myself, but I’ve seen descriptions on the Web of a few of them existing. They don’t provide anything like restaurant-style service that I’ve heard; instead, there are often cheap or free snacks along the lines of what an office break room might carry, along with other amenities, and you pay for the amount of time you spend there.
I think a restaurant where you paid for time, if the food was nothing special, would quickly turn into a coworking space. Maybe it would be more open-office and more amenable to creative, conversational, interpersonal work rather than laptop work. You probably want it to be a cafe—or at least look like a cafe from the outside in signage / branding; you may want architectural sound dampening like a denny’s booth. You could sell pre-packaged food and sodas—it isn’t what they’re here for. Or you could even sell or rent activities like coloring books, simple social tabletop games, small toys, lockpicking practice locks, tiny marshmallow candle smore sets, and so on.
Unfortunately different people have different levels of hearing ability, so you’re not setting the conversation size at the same level for all participants. If you set the volume too high, you may well be excluding some people from the space entirely.
I think that people mostly put music on in these settings as a way to avoid awkward silences and to create the impression that the room is more active than it is, whilst people are arriving. If this is true, then it serves no great purpose once people have arrived and are engaged in conversation.
Another important consideration is sound-damping. I’ve been in venues where there’s no music playing and the conversations are happening between 3-5 people but everyone is shouting to be heard above the crowd, and it’s incredibly difficult for someone with hearing damage to participate at all. This is primarily a result of hard, echoey walls and very few soft furnishings.
I think there’s something to be said for having different areas with different noise levels, allowing people to choose what they’re comfortable with, and observing where they go.
It seems to me that this claim has a lot to overcome, given that the observers could walk away at any time.
Is that a goal? I’ve never been much of a partygoer, but if I want to have a one-on-one conversation with somebody and get to know them, a party is about the last place I’d think about going. Too many annoying interruptions.
It may do that, but that doesn’t necessarily mean that that’s the function. You could equally well guess that its function was to exclude people who don’t like loud music, since it also does that.
this is an incredible insight! from this I think we can design better nightclublike social spaces for people who don’t like loud sounds (such as people in this community with signal processing issues due to autism).
One idea I have is to do it in the digital. like, VR chat silent nightclub where the sound falloff is super high. (perhaps this exists?) Or a 2D top down equivalent. I will note that Gather Town is backwards: the sound radius is so large that there is still lots of lemurs, but at the same time you can’t read people’s body language from across the room. Instead, the emotive radius (from webcam / face-tracking) needs to be larger than the sound radius. Or you can have a trad UI with “rooms” of very small size that you have to join to talk. tricky to get that kind of app right though since irl there’s a fluid boundary between in and out of a convo and a binary demarcation would be subtly unpleasant.
Another idea is to find alternative ways to sound isolate in meatspace. Other people have talked about architectural approaches like in Lighthaven. Or imagine a party where everyone had to wear earplugs. sound falls off with the square of distance and you can calculate out how many decibels you need to deafen everyone by to get the group sizes you want. Or a party with a rule that you have to plug your ears when you aren’t actively in a conversation.
Or you could lay out some hula hoops with space between them and the rule is you can only talk within the hula hoop with other people in it, and you can’t listen in on someone else’s hula hoop convo. have to plug your ears as you walk around. Better get real comfortable with your friends! Maybe secretly you can move the hoops around to combine into bigger groups if you are really motivated. Or with way more effort, you could similarly do a bed fort building competition.
These are very cheap experiments!
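To make the earplug calculation above concrete, here is a rough sketch. It assumes idealized free-field spreading (about 6 dB lost per doubling of distance), which real rooms with echo will not follow, and the speech level and "still intelligible" threshold are made-up illustrative numbers, not measurements. All function names are hypothetical.

```python
import math


def level_at_distance(level_at_ref_db, distance_m, ref_distance_m=1.0):
    """Sound level (dB) at distance_m, given the level at ref_distance_m.

    Inverse-square law: intensity falls with the square of distance,
    which is a drop of 20*log10(distance ratio) in decibels.
    """
    return level_at_ref_db - 20.0 * math.log10(distance_m / ref_distance_m)


def earplug_attenuation_needed(speech_db_at_1m, threshold_db, target_radius_m):
    """Attenuation (dB) that makes speech drop to threshold_db at target_radius_m.

    threshold_db is an ASSUMED level below which a listener gives up on
    following the conversation; it is not an empirical constant.
    """
    return level_at_distance(speech_db_at_1m, target_radius_m) - threshold_db


def max_conversation_radius(speech_db_at_1m, threshold_db, attenuation_db):
    """Largest distance (m) at which attenuated speech still reaches threshold_db."""
    return 10 ** ((speech_db_at_1m - attenuation_db - threshold_db) / 20.0)


if __name__ == "__main__":
    # Assumed: speech is 60 dB at 1 m, and listeners drop out below 50 dB.
    for radius in (1.0, 2.0, 4.0):
        needed = earplug_attenuation_needed(60.0, 50.0, radius)
        print(f"radius {radius} m -> earplugs of about {needed:.1f} dB")
```

A negative result means the target radius is already out of reach without any earplugs under these assumptions. Background noise and reverberation would shift the numbers a lot, so this is only a starting point for the experiment.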
I think some of the AI safety policy community has over-indexed on the visual model of the “Overton Window” and under-indexed on alternatives like the “ratchet effect,” “poisoning the well,” “clown attacks,” and other models where proposing radical changes can make you, your allies, and your ideas look unreasonable (edit to add: whereas successfully proposing minor changes achieves hard-to-reverse progress, making ideal policy look more reasonable).
I’m not familiar with a lot of systematic empirical evidence on either side, but it seems to me like the more effective actors in the DC establishment overall are much more in the habit of looking for small wins that are both good in themselves and shrink the size of the ask for their ideal policy than of pushing for their ideal vision and then making concessions. Possibly an ideal ecosystem has both strategies, but it seems possible that at least some versions of “Overton Window-moving” strategies executed in practice have larger negative effects via associating their “side” with unreasonable-sounding ideas in the minds of very bandwidth-constrained policymakers, who strongly lean on signals of credibility and consensus when quickly evaluating policy options, than the positive effects of increasing the odds of ideal policy and improving the framing for non-ideal but pretty good policies.
In theory, the Overton Window model is just a description of what ideas are taken seriously, so it can indeed accommodate backfire effects where you argue for an idea “outside the window” and this actually makes the window narrower. But I think the visual imagery of “windows” actually struggles to accommodate this—when was the last time you tried to open a window and accidentally closed it instead? -- and as a result, people who rely on this model are more likely to underrate these kinds of consequences.
Would be interested in empirical evidence on this question (ideally actual studies from psych, political science, sociology, econ, etc literatures, rather than specific case studies due to reference class tennis type issues).
These are plausible concerns, but I don’t think they match what I see as a longtime DC person.
We know that the legislative branch is less productive in the US than it has been in any modern period, and fewer bills get passed (many different metrics for this, but one is https://www.reuters.com/graphics/USA-CONGRESS/PRODUCTIVITY/egpbabmkwvq/). Those bills that do get passed tend to be bigger swings as a result: either a) transformative legislation (e.g., Obamacare, Trump tax cuts and COVID super-relief, Biden Inflation Reduction Act and CHIPS) or b) big omnibus “must-pass” bills like FAA reauthorization, into which many small proposals get added in.
I also disagree with the claim that policymakers focus on credibility and consensus generally, except perhaps in the executive branch to some degree. (You want many executive actions to be noncontroversial “faithfully executing the laws” stuff, but I don’t see that as “policymaking” in the sense you describe it.)
In either of those, it seems like the current legislative “meta” favors bigger policy asks, not small wins, and I’m having trouble thinking of anyone I know who’s impactful in DC who has adopted the opposite strategy. What are examples of the small wins that you’re thinking of as being the current meta?
Agree with lots of this– a few misc thoughts [hastily written]:
I think the Overton Window frame ends up getting people to focus too much on the dimension “how radical is my ask”– in practice, things are usually much more complicated than this. In my opinion, a preferable frame is something like “who is my target audience and what might they find helpful.” If you’re talking to someone who makes it clear that they will not support X, it’s silly to keep on talking about X. But I think the “target audience first” approach ends up helping people reason in a more sophisticated way about what kinds of ideas are worth bringing up. As an example, in my experience so far, many policymakers are curious to learn more about intelligence explosion scenarios and misalignment scenarios (the more “radical” and “speculative” threat models).
I don’t think it’s clear that the more effective actors in DC tend to be those who look for small wins. Outside of the AIS community, there sure do seem to be a lot of successful organizations that take hard-line positions and (presumably) get a lot of their power/influence from the ideological purity that they possess & communicate. Whether or not these organizations end up having more or less influence than the more “centrist” groups is, in my view, not a settled question & probably varies a lot by domain. In AI safety in particular, I think my main claim is something like “pretty much no group– whether radical or centrist– has had tangible wins. When I look at the small set of tangible wins, it seems like the groups involved were across the spectrum of ‘reasonableness.’”
The more I interact with policymakers, the more I’m updating toward something like “poisoning the well doesn’t come from having radical beliefs– poisoning the well comes from lamer things like being dumb or uninformed, wasting peoples’ time, not understanding how the political process works, not having tangible things you want someone to do, explaining ideas poorly, being rude or disrespectful, etc.” I’ve asked ~20-40 policymakers (outside of the AIS bubble) things like “what sorts of things annoy you about meetings” or “what tends to make meetings feel like a waste of your time”, and no one ever says “people come in with ideas that are too radical.” The closest thing I’ve heard is people saying that they dislike it when groups fail to understand why things aren’t able to happen (like, someone comes in thinking their idea is great, but then they fail to understand that their idea needs approval from committee A and appropriations person B and then they’re upset about why things are moving slowly). It seems to me like many policy folks (especially staffers and exec branch subject experts) are genuinely interested in learning more about the beliefs and worldviews that have been prematurely labeled as “radical” or “unreasonable” (or perhaps such labels were appropriate before chatGPT but no longer are).
A reminder that those who are opposed to regulation have strong incentives to make it seem like basically-any-regulation is radical/unreasonable. An extremely common tactic is for industry and its allies to make common-sense regulation seem radical/crazy/authoritarian & argue that actually the people proposing strong policies are just making everyone look bad & argue that actually we should all rally behind [insert thing that isn’t a real policy.] (I admit this argument is a bit general, and indeed I’ve made it before, so I won’t harp on it here. Also I don’t think this is what Trevor is doing– it is indeed possible to raise serious discussions about “poisoning the well” even if one believes that the cultural and economic incentives disproportionately elevate such points).
In the context of AI safety, it seems to me like the most high-influence Overton Window moves have been positive– and in fact I would go as far as to say strongly positive. Examples that come to mind include the CAIS statement, FLI pause letter, Hinton leaving Google, Bengio’s writings/speeches about rogue AI & loss of control, Ian Hogarth’s piece about the race to god-like AI, and even Yudkowsky’s TIME article.
I think some of our judgments here depend on underlying threat models and an underlying sense of optimism vs. pessimism. If one thinks that labs making voluntary agreements/promises and NIST contributing to the development of voluntary standards are quite excellent ways to reduce AI risk, then the groups that have helped make this happen deserve a lot of credit. If one thinks that much more is needed to meaningfully reduce xrisk, then the groups that are raising awareness about the nature of the problem, making high-quality arguments about threat models, and advocating for stronger policies deserve a lot of credit.
I agree that more research on this could be useful. But I think it would be most valuable to focus less on “is X in the Overton Window” and more on “is X written/explained well and does it seem to have clear implications for the target stakeholders?”
Quick reactions:
Re: how over-emphasis on “how radical is my ask” vs “what my target audience might find helpful” and generally the importance of making your case well regardless of how radical it is, that makes sense. Though notably the more radical your proposal is (or more unfamiliar your threat models are), the higher the bar for explaining it well, so these do seem related.
Re: more effective actors looking for small wins, I agree that it’s not clear, but yeah, seems like we are likely to get into some reference class tennis here. “A lot of successful organizations that take hard-line positions and (presumably) get a lot of their power/influence from the ideological purity that they possess & communicate”? Maybe, but I think of like, the agriculture lobby, who just sort of quietly make friends with everybody and keep getting 11-figure subsidies every year, in a way that (I think) resulted more from gradual ratcheting than making a huge ask. “Pretty much no group– whether radical or centrist– has had tangible wins” seems wrong in light of the EU AI Act (where I think both a “radical” FLI and a bunch of non-radical orgs were probably important) and the US executive order (I’m not sure which strategy is best credited there, but I think most people would have counted the policies contained within it as “minor asks” relative to licensing, pausing, etc). But yeah I agree that there are groups along the whole spectrum that probably deserve credit.
Re: poisoning the well, again, radical-ness and being dumb/uninformed are of course separable but the bar rises the more radical you get, in part because more radical policy asks strongly correlate with more complicated procedural asks; tweaking ECRA is both non-radical and procedurally simple, creating a new agency to license training runs is both outside the DC Overton Window and very procedurally complicated.
Re: incentives, I agree that this is a good thing to track, but like, “people who oppose X are incentivized to downplay the reasons to do X” is just a fully general counterargument. Unless you’re talking about financial conflicts of interest, but there are also financial incentives for orgs pursuing a “radical” strategy to downplay boring real-world constraints, as well as social incentives (e.g. on LessWrong IMO) to downplay these boring constraints and cognitive biases against thinking your preferred strategy has big downsides.
I agree that the CAIS statement, Hinton leaving Google, and Bengio and Hogarth’s writing have been great. I think that these are all in a highly distinct category from proposing specific actors take specific radical actions (unless I’m misremembering the Hogarth piece). Yudkowsky’s TIME article, on the other hand, definitely counts as an Overton Window move, and I’m surprised that you think this has had net positive effects. I regularly hear “bombing datacenters” as an example of a clearly extreme policy idea, sometimes in a context that sounds like it maybe made the less-radical idea seem more reasonable, but sometimes as evidence that the “doomers” want to do crazy things and we shouldn’t listen to them, and often as evidence that they are at least socially clumsy, don’t understand how politics works, etc, which is related to the things you list as the stuff that actually poisons the well. (I’m confused about the sign of the FLI letter as we’ve discussed.)
I’m not sure optimism vs pessimism is a crux, except in very short, like, 3-year timelines. It’s true that optimists are more likely to value small wins, so I guess narrowly I agree that a ratchet strategy looks strictly better for optimists, but if you think big radical changes are needed, the question remains of whether you’re more likely to get there via asking for the radical change now or looking for smaller wins to build on over time. If there simply isn’t time to build on these wins, then yes, better to take a 2% shot at the policy that you actually think will work; but even in 5-year timelines I think you’re better positioned to get what you ultimately want by 2029 if you get a little bit of what you want in 2024 and 2026 (ideally while other groups also make clear cases for the threat models and develop the policy asks, etc.). Another piece this overlooks is the information and infrastructure built by the minor policy changes. A big part of the argument for the reporting requirements in the EO was that there is now going to be an office in the US government that is in the business of collecting critical information about frontier AI models and figuring out how to synthesize it to the rest of government, that has the legal authority to do this, and both the office and the legal authority can now be expanded rather than created, and there will now be lots of individuals who are experienced in dealing with this information in the government context, and it will seem natural that the government should know this information. I think if we had only been developing and advocating for ideal policy, this would not have happened (though I imagine that this is not in fact what you’re suggesting the community do!).
It’s not just that problem though, they will likely be biased to think that their policy is helpful for safety of AI at all, and this is a point that sometimes gets forgotten.
But correct on the fact that Akash’s argument is fully general.
Recently, John Wentworth wrote:
And I think this makes sense (e.g. Simler’s Social Status: Down the Rabbit Hole which you’ve probably read), if you define “AI Safety” as “people who think that superintelligence is serious business or will be some day”.
The psych dynamic that I find helpful to point out here is Yud’s Is That Your True Rejection post from ~16 years ago. A person who hears about superintelligence for the first time will often respond to their double-take at the concept by spamming random justifications for why that’s not a problem (which, notably, feels like legitimate reasoning to that person, even though it’s not). An AI-safety-minded person becomes wary of being effectively attacked by high-status people immediately turning into what is basically a weaponized justification machine, and develops a deep drive wanting that not to happen. Then justifications ensue for wanting that to happen less frequently in the world, because deep down humans really don’t want their social status to be put at risk (via denunciation) on a regular basis like that. These sorts of deep drives are pretty opaque to us humans but their real world consequences are very strong.
Something that seems more helpful than playing whack-a-mole whenever this issue comes up is having more people in AI policy putting more time into improving perspective. I don’t see shorter paths to increasing the number of people-prepared-to-handle-unexpected-complexity than giving people a broader and more general thinking capacity for thoughtfully reacting to the sorts of complex curveballs that you get in the real world. Rationalist fiction like HPMOR is great for this, as well as others e.g. Three Worlds Collide, Unsong, Worth the Candle, Worm (list of top rated ones here). With the caveat, of course, that doing well in the real world is less like the bite-sized easy-to-understand events in ratfic, and more like spotting errors in the methodology section of a study or making money playing poker.
I think, given the circumstances, it’s plausibly very valuable e.g. for people already spending much of their free time on social media or watching stuff like The Office, Garfield reruns, WWI and Cold War documentaries, etc, to only spend ~90% as much time doing that and refocusing ~10% to ratfic instead, and maybe see if they can find it in themselves to want to shift more of their leisure time to that sort of passive/ambient/automatic self-improvement productivity.
I’m not a decel, but the way this stuff often is resolved is that there are crazy people that aren’t taken seriously by the managerial class but that are very loud and make obnoxious asks. Think the evangelicals against abortion or the Columbia protestors.
Then there is some elite, part of the managerial class, that makes reasonable policy claims. For Abortion, this is Mitch McConnell, being disciplined over a long period of time in choosing the correct judges. For Palestine, this is Blinken and his State Department bureaucracy.
The problem with decels is that theoretically they are part of the managerial class themselves. Or at least, they act like they are. They call themselves rationalists, read Eliezer and Scott Alexander, and what not. But the problem is that it’s very hard for an uninterested third party to take seriously these bogus Overton Window claims from people that were supposed to be measured, part of the managerial class.
You need to split. There are the crazy ones that people don’t take seriously, but will move the managerial class. And there are the serious people that EA Money will send to D.C. to work at Blumenthal’s office. This person needs to make small policy requests that will sabotage AI, without looking so. And slowly, you get policy wins and you can sabotage the whole effort.
[reposting from Twitter, lightly edited/reformatted] Sometimes I think the whole policy framework for reducing catastrophic risks from AI boils down to two core requirements—transparency and security—for models capable of dramatically accelerating R&D.
If you have a model that could lead to general capabilities much stronger than human-level within, say, 12 months, by significantly improving subsequent training runs, the public and scientific community have a right to know this exists and to see at least a redacted safety case; and external researchers need to have some degree of red-teaming access. Probably various other forms of transparency would be useful too. It feels like this is a category of ask that should unite the “safety,” “ethics,” and “accelerationist” communities?
And the flip side is that it’s very important that a model capable of kicking off that kind of dash to superhuman capabilities not get stolen/exfiltrated, such that you don’t wind up with multiple actors facing enormous competitive tradeoffs to rush through this process.
These have some tradeoffs, especially as you approach AGI—e.g. if you develop a system that can do 99% of foundation model training tasks and your security is terrible you do have some good reasons not to immediately announce it—but not if we make progress on either of these before then, IMO. What the Pareto Frontier of transparency and security looks like, and where we should land on that curve, seems like a very important research agenda.
If you’re interested in moving the ball forward on either of these, my colleagues and I would love to see your proposal and might fund you to work on it!
Is it? My sense is the race dynamics get worse if you are worried that your competitor has access to a potentially pivotal model but you can’t verify that because you can’t steal it. My guess is the best equilibrium is major nations being able to access competing models.
Also, at least given present compute requirements, a smaller actor stealing a model is not that dangerous, since you need to invest hundreds of millions into compute to use the model for dangerous actions, which is hard to do secretly (though to what degree dangerous inference will cost a lot is something I am quite confused about).
In general I am not super confident here, but I at least really don’t know what the sign of hardening models against exfiltration with regards to race dynamics is.
What about limited API access to all actors for verification (aka transparency) while still having security?
It’s really hard to know that your other party is giving you API access to their most powerful model. If you could somehow verify that the API you are accessing is indeed directly hooked up to their most powerful model, and that the capabilities of that model aren’t being intentionally hobbled to deceive you, then I do think this gets you a lot of the same benefit.
Some of the benefit is still missing though. I think lack of moats is a strong disincentive to develop technology, and so in a race scenario you might be a lot less tempted to make a mad sprint towards AGI if you think your opponents can catch up almost immediately, and so you might end up with substantial timeline-accelerating effects by enabling better moats.
I do think the lack-of-moat benefit is smaller than the verification benefit.
I think it should be possible to get a good enough verification regime in practice with considerable effort. It’s possible that sufficiently good verification occurs by default due to spies.
I agree there will potentially be a lot of issues downstream of verification issues by default.
Hmm, this isn’t really how I model the situation with respect to racing. From my perspective, the question isn’t “security or no security”, but is instead “when will you have extreme security”.
(My response might overlap with tlevin’s, I’m not super sure.)
Here’s an example way things could go:
An AI lab develops a model that begins to accelerate AI R&D substantially (say 10x) while having weak security. This model was developed primarily for commercial reasons and the possibility of it being stolen isn’t a substantial disincentive in practice.
This model is immediately stolen by China.
Shortly after this, USG secures the AI lab.
Now, further AIs will be secure, but to stay ahead of China which has substantially accelerated AI R&D and other AI work, USG races to AIs which are much smarter than humans.
In this scenario, if you had extreme security ready to go earlier, then the US would potentially have a larger lead and better negotiating position. I think this probably gets you longer delays prior to qualitatively wildly superhuman AIs in practice.
There is a case that if you don’t work on extreme security in advance, then there will naturally be a pause to implement this. I’m a bit skeptical of this in practice, especially in short timelines. I also think that the timing of this pause might not be ideal—you’d like to pause when you already have transformative AI rather than before.
Separately, if you imagine that USG is rational and at least somewhat aligned, then I think security looks quite good, though I can understand why you wouldn’t buy this.
Interesting, I guess my model is that the default outcome (in the absence of heroic efforts to the contrary) is indeed “no security for nation state attackers”, which as far as I can tell is currently the default for practically everything that is developed using modern computing systems. Getting to a point where you can protect something like the weights of an AI model from nation state actors would be extraordinarily difficult and an unprecedented achievement in computer security, which is why I don’t expect it to happen (even as many actors would really want it to happen).
My model of cybersecurity is extremely offense-dominated for anything that requires internet access or requires thousands of people to have access (both of which I think are quite likely for deployed weights).
The “how do we know if this is the most powerful model” issue is one reason I’m excited by OpenMined, who I think are working on this among other features of external access tools
Interesting. I would have to think harder about whether this is a tractable problem. My gut says it’s pretty hard to build confidence here without leaking information, but I might be wrong.
If probability of misalignment is low, probability of human+AI coups (including e.g. countries invading each other) is high, and/or there aren’t huge offense-dominant advantages to being somewhat ahead, you probably want more AGI projects, not fewer. And if you need a ton of compute to go from an AI that can do 99% of AI R&D tasks to an AI that can cause global catastrophe, then model theft is less of a factor. But the thing I’m worried about re: model theft is a scenario like this, which doesn’t seem that crazy:
Company/country X has an AI agent that can do 99% [edit: let’s say “automate 90%”] of AI R&D tasks, call it Agent-GPT-7, and enough of a compute stock to have that train a significantly better Agent-GPT-8 in 4 months at full speed ahead, which can then train a basically superintelligent Agent-GPT-9 in another 4 months at full speed ahead. (Company/country X doesn’t know the exact numbers, but their 80% CI is something like 2-8 months for each step; company/country Y has less info, so their 80% CI is more like 1-16 months for each step.)
The weights for Agent-GPT-7 are available (legally or illegally) to company/country Y, which is known to company/country X.
Y has, say, a fifth of the compute. So each of those steps will take Y 20 months. Symmetrically, company/country Y thinks it’ll take 10-40 months and company/country X thinks it’s 5-80.
Once superintelligence is in sight like this, both company/country X and Y become very scared of the other getting it first. In the country case, they are worried it will undermine nuclear deterrence, upend their political system, and basically lead to getting taken over by the other. The relevant decisionmakers think this outcome is better than extinction, but maybe not by that much, whereas getting superintelligence before the other side is way better. In the company case, it’s a lot less intense, but they still would much rather get superintelligence themselves than have their arch-rival CEO get it.
So, X thinks they have anywhere from 5-80 months before Y has superintelligence, and Y thinks they have 1-16 months. So X and Y both think it’s easily possible, well within their 80% CI, that Y beats X.
X and Y have no reliable means of verifying a commitment like “we will spend half our compute on safety testing and alignment research.”
If these weights were not available, Y would have a similarly good system in 18 months, 80% CI 12-24.
So, had the weights not been available to Y, X would be confident that it had 12 + 5 months to manage a capabilities explosion that would have happened in 8 months at full speed; it can spend >half of its compute on alignment/safety/etc, and it has 17 rather than 5 months of serial time to negotiate with Y, possibly develop some verification methods and credible mechanisms for benefit/power-sharing, etc. If various transparency reforms have been implemented, such that the world is notified in ~real-time that this is happening, there would be enormous pressure to do so; I hope and think it will seem super illegitimate to pursue this kind of power without these kinds of commitments. I am much more worried about X not doing this and instead just trying to grab enormous amounts of power if they’re doing it all in secret.
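To make the arithmetic in this scenario easier to check, here is a minimal Python sketch. It is only a toy: the variable and function names are made up for illustration, it takes the comment’s assumption that step time scales inversely with compute at face value, and it treats “the two 80% CIs overlap” as a crude stand-in for “it’s easily possible that Y beats X.”

```python
# Toy arithmetic for the X/Y scenario. All names here are illustrative.
# Assumption taken from the comment: step time scales inversely with compute.

COMPUTE_RATIO = 5        # X has five times Y's compute
TRUE_STEP_X = 4          # months per generation for X at full speed
X_STEP_PER_X = (2, 8)    # X's 80% CI on its own step time
X_STEP_PER_Y = (1, 16)   # Y's 80% CI on X's step time (Y has less info)
Y_CATCHUP = (12, 24)     # 80% CI for Y to match Agent-GPT-7 without the weights


def scale(interval, factor):
    """Multiply both ends of a (low, high) interval by factor."""
    low, high = interval
    return (low * factor, high * factor)


def overlaps(a, b):
    """True if two closed (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Y's step time is X's step time scaled by the compute ratio.
true_step_y = TRUE_STEP_X * COMPUTE_RATIO
y_step_per_y = scale(X_STEP_PER_X, COMPUTE_RATIO)  # Y's view of its own step
y_step_per_x = scale(X_STEP_PER_Y, COMPUTE_RATIO)  # X's view of Y's step

# Crude proxy: each side thinks Y might win if the intervals it holds overlap.
x_thinks_y_could_win = overlaps(X_STEP_PER_X, y_step_per_x)
y_thinks_y_could_win = overlaps(X_STEP_PER_Y, y_step_per_y)

# Serial time X can count on, using the low end of each interval.
x_months_with_weights_out = y_step_per_x[0]
x_months_without_weights = Y_CATCHUP[0] + y_step_per_x[0]

if __name__ == "__main__":
    print("Y's true step time:", true_step_y)
    print("Y's own CI:", y_step_per_y, "X's CI on Y:", y_step_per_x)
    print("X thinks Y could win:", x_thinks_y_could_win)
    print("Y thinks Y could win:", y_thinks_y_could_win)
    print("X's safe serial months:", x_months_with_weights_out,
          "vs", x_months_without_weights)
```

The last two values are the “5” and the “12 + 5” months from the scenario; swapping in a different compute ratio or catch-up estimate shows how sensitive that buffer is to the inputs.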
[Also: I just accidentally went back a page by command-open bracket in an attempt to get my text out of bullet format and briefly thought I lost this comment; thank you in your LW dev capacity for autosave draft text, but also it is weirdly hard to get out of bullets]
I expect that having a nearly-AGI-level AI, something capable of mostly automating further ML research, means the ability to rapidly find algorithmic improvements that result in:
1. drastic reductions in training cost for an equivalently strong AI.
- Making it seem highly likely that a new AI trained using this new architecture/method and a similar amount of compute as the current AI would be substantially more powerful. (thus giving an estimate of time-to-AGI)
- Making it possible to train a much smaller cheaper model than the current AI with the same capabilities.
2. speed-ups and compute-efficiency for inference on current AI, and for the future cheaper versions
3. ability to create and deploy more capable narrow tool-AIs which seem likely to substantially shift military power when deployed to existing military hardware (e.g. better drone piloting models)
4. ability to create and deploy more capable narrow tool-AIs which seem likely to substantially increase economic productivity of the receiving factories.
5. ability to rapidly innovate in non-ML technology, and thereby achieve military and economic benefits.
6. ability to create and destroy self-replicating weapons which would kill most of humanity (e.g. bioweapons), and also to create targeted ones which would wipe out just the population of a specific country.
If I were the government of a country in which such a tech were being developed, I would really not want other countries to be able to steal this tech. It would not seem like a worthwhile trade-off that the thieves would then have a more accurate estimate of how far from AGI my country’s company was.
Just pressing enter twice seems to work well-enough for me, though I feel like I vaguely remember some bugged state where that didn’t work.
Yeah doing it again it works fine, but it was just creating a long list of empty bullet points (I also have this issue in GDocs sometimes)
Yeah, weird. I will see whether I can reproduce it somehow. It is quite annoying when it happens.
Spicy take: it might be more realistic to subtract 1 or even 2 from the numbers for the GPT generations, and also to consider that the intelligence explosion might be quite widely-distributed: https://www.lesswrong.com/posts/wr2SxQuRvcXeDBbNZ/bogdan-ionut-cirstea-s-shortform?commentId=6EFv8PAvELkFopLHy
I strongly disagree, habryka, on the basis that I believe LLMs are already providing some uplift for highly harmful offense-dominant technology (e.g. bioweapons). I think this effect worsens the closer you get to full AGI. The inference cost to do this, even with a large model, is trivial. You just need to extract the recipe.
This gives a weak state-actor (or wealthy non-state-actor) that has high willingness to undertake provocative actions the ability to gain great power from even temporary access to a small amount of inference from a powerful model. Once they have the weapon recipe, they no longer need the model.
I’m also not sure about tlevin’s argument about ‘right to know’. I think the State has a responsibility to protect its citizens. So I certainly agree the State should be monitoring closely all the AI companies within its purview. On the other hand, making details of the progress of the AI publicly known may lead to increased international tensions or risk of theft or terrorism. I suspect it’s better that the State have inspectors and security personnel permanently posted in the AI labs, but that the exact status of the AI progress be classified.
I think the costs of biorisks are vastly smaller than AGI-extinction risk, and so they don’t really factor into my calculations here. Having intermediate harms before AGI seems somewhat good, since it seems more likely to cause rallying around stopping AGI development, though I feel pretty confused about the secondary effects here (but am pretty confident the primary effects are relatively unimportant).
I think that doesn’t really make sense, since the lowest hanging fruit for disempowering humanity routes through self-replicating weapons. Bioweapons are the currently available technology which is in the category of self-replicating weapons. I think that would be the most likely attack vector for a rogue AGI seeking rapid coercive disempowerment.
Plus, having bad actors (human or AGI) have access to a tech for which we currently have no practical defense, which could wipe out nearly all of humanity for under $100k… seems bad? Just a really unstable situation to be in?
I do agree that it seems unlikely that some terrorist org is going to launch a civilization-ending bioweapon attack within the remaining 36 months or so until AGI (or maybe even ASI). But I do think that manipulating a terrorist org into doing this, and giving them the recipe and supplies to do so, would be a potentially tempting tactic for a hostile AGI.
I think if AI kills us all it would be because the AI wants to kill us all. It is (in my model of the world) very unlikely to happen because someone misuses AI systems.
I agree that bioweapons might be part of that, but the difficult part of actually killing everyone via bioweapons requires extensive planning and deployment strategies, which humans won’t want to execute (since they don’t want to die), and so if bioweapons are involved in all of us dying it will very likely be the result of an AI seeing their use as an opportunity to take over, which I think is unlikely to happen because someone runs some leaked weights on some small amount of compute (or like, that would happen years after the same AIs would have done the same when run on the world’s largest computing clusters).
In general, for any story of “dumb AI kills everyone” you need a story for why a smart AI hasn’t killed us first.
I agree that it seems more likely to be a danger from AI systems misusing humans than humans misusing the AI systems.
What I don’t agree with is jumping forward in time to thinking about when there is an AI so powerful it can kill us all at its whim. In my framework, that isn’t a useful time to be thinking about, it’s too late for us to be changing the outcome at that point.
The key time to be focusing on is the time before the AI is sufficiently powerful to wipe out all of humanity, and there is nothing we can do to stop it.
My expectation is that this period of time could be months or even several years, where there is an AI powerful enough and agentic enough to make a dangerous-but-stoppable attempt to take over the world. That’s a critical moment for potential success, since potentially the AI will be contained in such a way that the threat will be objectively demonstrable to key decision makers. That would make for a window of opportunity to make sweeping governance changes, and further delay take-over. Such a delay could be super valuable if it gives alignment research more critical time for researching the dangerously powerful AI.
Also, the period of time between now and when the AI is that powerful is one where AI-as-a-tool makes it easier and easier for humans aided by AI to deploy civilization-destroying self-replicating weapons. Current AIs are already providing non-zero uplift (both lowering barriers to access, and raising peak potential harms). This is likely to continue to rapidly get worse over the next couple years. Delaying AGI doesn’t much help with biorisk from tool AI, so if you have a ‘delay AGI’ plan then you need to also consider the rapidly increasing risk from offense-dominant tech.
Also—I’m not sure I’m getting the thing where verifying that your competitor has a potentially pivotal model reduces racing?
Same reason as knowing how many nukes your opponent has reduces racing. If you are conservative, the uncertainty in how far ahead your opponent is causes escalating races, even if you would both rather not escalate (as long as your mean is well-calibrated).
E.g. let’s assume you and your opponent are de-facto equally matched in the capabilities of your system, but both have substantial uncertainty, e.g. assign 30% probability to your opponent being substantially ahead of you. Then if you think those 30% of worlds are really bad, you probably will invest a bunch more into developing your systems (which of course your opponent will observe, increase their own investment, and then you repeat).
However, if you can both verify how many nukes you have, you can reach a more stable equilibrium even under more conservative assumptions.
Gotcha. A few disanalogies though. The first two specifically relate to the model theft/shared access point; the last is true even if you had verifiable API access:
Me verifying how many nukes you have doesn’t mean I suddenly have that many nukes, unlike AI model capabilities, though due to compute differences it does not mean we suddenly have the same time distance to superintelligence.
Me having more nukes only weakly enables me to develop more nukes faster, unlike AI that can automate a lot of AI R&D.
This model seems to assume you have an imprecise but accurate estimate of how many nukes I have, but companies will probably be underestimating the proximity of each other to superintelligence, for the same reason that they’re underestimating their own proximity to superintelligence, until it’s way more salient/obvious.
It’s not super clear whether from a racing perspective having an equal number of nukes is bad. I think it’s genuinely messy (and depends quite sensitively on how much actors are scared of losing vs. happy about winning vs. scared of racing).
I do also currently think that the compute-component will likely be a bigger deal than the algorithmic/weights dimension, making the situation more analogous to nukes, but I do think there is a lot of uncertainty on this dimension.
Yeah, totally agree that this is an argument against proliferation, and an important one. While you might not end up with additional racing dynamics, the fact that more global resources can now use the cutting edge AI system to do AI R&D is very scary.
In general I think it’s very hard to predict whether people will overestimate or underestimate things. I agree that literally right now countries are probably underestimating it, but an overreaction in the future also wouldn’t surprise me very much (in the same way that COVID started with an underreaction, and then was followed by a massive overreaction).
Importantly though, once you have several thousand nukes the strategic returns to more nukes drop pretty close to zero, regardless of how many your opponents have, while if you get the scary model’s weights and then don’t use them to push capabilities even more, your opponent maybe gets a huge strategic advantage over you. I think this is probably true, but the important thing is whether the actors think it might be true.
Yeah, good point.