My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there’s a disproportionate amount of value that people could add by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc.
Some quick thoughts:
Soft power– I think people underestimate how strong the “soft power” of labs is, particularly in the Bay Area.
Jobs– A large fraction of people getting involved in AI safety are interested in the potential of working for a lab one day. There are some obvious reasons for this– lots of potential impact from being at the organizations literally building AGI, big salaries, lots of prestige, etc.
People (IMO correctly) perceive that if they acquire a reputation for being critical of labs, their plans, or their leadership, they will essentially sacrifice the ability to work at the labs.
So you get an equilibrium where the only people making (strong) criticisms of labs are those who have essentially chosen to forgo their potential of working there.
Money– The labs and Open Phil (which has been perceived, IMO correctly, as investing primarily into metastrategies that are aligned with lab interests) have an incredibly large share of the $$$ in the space. When funding became more limited, this became even more true, and I noticed a very tangible shift in the culture & discourse around labs + Open Phil.
Status games/reputation– Groups who were more inclined to criticize labs and advocate for public or policymaker outreach were branded as “unilateralist”, “not serious”, and “untrustworthy” in core EA circles. In many cases, there were genuine doubts about these groups, but my impression is that these doubts got amplified/weaponized in cases where the groups were more openly critical of the labs.
Subjectivity of “good judgment”– There is a strong culture of people getting jobs/status for having “good judgment”. This is sensible insofar as we want people with good judgment (who wouldn’t?) but this often ends up being so subjective that it ends up leading to people being quite afraid to voice opinions that go against mainstream views and metastrategies (particularly those endorsed by labs + Open Phil).
Anecdote– Personally, I found my ability to evaluate and critique labs + mainstream metastrategies substantially improved when I spent more time around folks in London and DC (who were less closely tied to the labs). In fairness, I suspect that if I had lived in London or DC *first* and then moved to the Bay Area, it’s plausible I would’ve had a similar feeling but in the “reverse direction”.
With all this in mind, I find myself more deeply appreciating folks who have publicly and openly critiqued labs, even in situations where the cultural and economic incentives to do so were quite weak (relative to staying silent or saying generic positive things about labs).
Examples: Habryka, Rob Bensinger, CAIS, MIRI, Conjecture, and FLI. More recently, @Zach Stein-Perlman, and of course Jan Leike and Daniel K.
1. Noticing good stuff labs do, not just criticizing them, is often helpful. I wish you thought of this work more as “evaluation” than “criticism.”
2. It’s often important for evaluation to be quite truth-tracking. Criticism isn’t obviously good by default.
Edit:
3. I’m pretty sure OP likes good criticism of the labs; no comment on how OP is perceived. And I think I don’t understand your “good judgment” point. Feedback I’ve gotten on AI Lab Watch from senior AI safety people has been overwhelmingly positive, and of course there’s a selection effect in what I hear, but I’m quite sure most of them support such efforts.
4. Conjecture (not exclusively) has done things that frustrated me, including in dimensions like being “‘unilateralist,’ ‘not serious,’ and ‘untrustworthy.’” I think most criticism of Conjecture-related advocacy is legitimate and not just because people are opposed to criticizing labs.
5. I do agree on “soft power” and some of “jobs.” People often don’t criticize the labs publicly because they’re worried about negative effects on them, their org, or people associated with them.
Agreed— my main point here is that the marketplace of ideas undervalues criticism.
I think one perspective could be “we should all just aim to do objective truth-seeking”, and as stated I agree with it.
The main issue with that frame, imo, is that it’s very easy to forget that the epistemic environment can be tilted in favor of certain perspectives.
EG I think it can be useful for “objective truth-seeking efforts” to be aware of some of the culture/status games that underincentivize criticism of labs & amplify lab-friendly perspectives.
RE 3:
Good to hear that responses have been positive to lab watch. My impression is that this is a mix of: (a) lab watch doesn’t really threaten the interests of labs (especially Anthropic, which is currently winning & currently the favorite lab among senior AIS ppl), (b) the tides have been shifting somewhat and it is genuinely less taboo to criticize labs than a year ago, and (c) EAs respond more positively to criticism that feels more detailed/nuanced (look I have these 10 categories, let’s rate the labs on each dimension) than criticisms that are more about metastrategy (e.g., challenging the entire RSP frame or advocating for policymaker outreach).
RE 4: I haven’t heard anything about Conjecture that I’ve found particularly concerning. Would be interested in you clarifying (either here or via DM) what you’ve heard. (And clarification note that my original point was less “Conjecture hasn’t done anything wrong” and more “I suspect Conjecture will be more heavily scrutinized and examined and have a disproportionate amount of optimization pressure applied against it given its clear push for things that would hurt lab interests.”)
I think now is a good time for people at labs to seriously consider quitting & getting involved in government/policy efforts.
I don’t think everyone should leave labs (obviously). But I would probably hit a button that does something like “everyone at a lab governance team and many technical researchers spend at least 2 hours thinking/writing about alternative options they have & very seriously consider leaving.”
My impression is that lab governance is much less tractable (lab folks have already thought a lot more about AGI) and less promising (competitive pressures are dominating) than government-focused work.
I think governments still remain unsure about what to do, and there’s a lot of potential for folks like Daniel K to have a meaningful role in shaping policy, helping natsec folks understand specific threat models, and raising awareness about the specific kinds of things governments need to do in order to mitigate risks.
There may be specific opportunities at labs that are very high-impact, but I think if someone at a lab is “not really sure if what they’re doing is making a big difference”, I would probably hit a button that allocates them toward government work or government-focused comms work.
Written on a Slack channel in response to discussions about some folks leaving OpenAI.
I’d be worried about evaporative cooling. It seems that the net result of this would be that labs would be almost completely devoid of people earnest about safety.
I agree with you government pathways to impact are most plausible and until recently undervalued. I also agree with you there are weird competitive pressures at labs.
I do think evaporative cooling is a concern, especially if everyone (or a very significant number of people) left. But I think on the margin more people should be leaving to work in govt.
I also suspect that a lot of systemic incentives will keep a greater-than-optimal proportion of safety-conscious people at labs as opposed to governments (labs pay more, labs are faster and have less bureaucracy, lab people are much more informed about AI, labs are more “cool/fun/fast-paced”, lots of govt jobs force you to move locations, etc.)
I also think it depends on the specific lab– EG in light of the recent OpenAI departures, I suspect there’s a stronger case for staying at OpenAI right now than for DeepMind or Anthropic.
I largely agree, but think given government hiring timelines, there’s no dishonor in staying at a lab doing moderately risk-reducing work until you get a hiring offer with an actual start date. This problem is often less bad for the special hiring authorities being used for AI stuff, but it’s still not ideal.
Here are some AI governance/policy thoughts that I’ve found myself articulating at least 3 times over the last month or so:
I think people interested in AI governance/policy should divide their projects into “things that could be useful in the current Overton Window” and “things that would require a moderate or major Overton Window shift to be useful.” I think sometimes people end up not thinking concretely about which world they’re aiming for, and this makes their work less valuable.
If you’re aiming for the current Overton Window, you need to be brutally honest about what you can actually achieve. There are many barriers to implementing sensible-seeming ideas. You need access to stakeholders who can do something. You should try to fail quickly. If your idea requires buy-in from XYZ folks, and X isn’t interested, that’s worth figuring out ASAP.
If you’re aiming for something outside the current Overton Window, you often have a lot of room to be imaginative. I think it’s very easy to underestimate Overton Window shifts. If policymakers get considerably more concerned about AI risks, there are a lot of things that will be “on the table”. People say that AI safety folks were unprepared for the ChatGPT surge– if you think that there will be 1-2 more surges of interest, it might be worth explicitly preparing for ideas that would be considered in those surges.
I think it’s pretty essential to be in regular touch with policymakers/staffers if your main TOC is to get things done in the current Overton Window.
A common failure mode for “research types” is to write a 20+ page paper and then ask “ok cool, which policymakers might be interested?” I think usually a better strategy is to try to get in touch with your target audience much earlier on in the process. Present the 1-2 page version of your idea and see if/where the nuance is useful. (To be clear, this is if your TOC involves directly influencing policy. This doesn’t apply if your main TOC is to improve everyone’s understanding of X topic or improve your own understanding of Y topic).
On the margin, I think more people who are new to AI governance/policy should be focusing on “things that would require a moderate or major Overton Window shift to be useful.” I think there’s more low-hanging fruit there that people can contribute to without necessarily having the kinds of networks/access that you often need to know what to do in the current Overton Window.
I think people tend to underestimate how quickly they could become a world expert in a specific area. This is especially true if you’re applying it to the intersection of two areas. For example, it’s very hard to become a world expert in international governance. But it’s relatively easier to become a world expert in the intersection of “international governance” and “AI safety”. There will be people who know more about international governance than you and people who know more about AI safety than you, but you might become one of the people who has thought the most rigorously about the intersection of the two topics.
I think I agree with much-to-all of this. One further amplification I’d make about the last point: the culture of DC policymaking is one where people are expected to be quick studies and it’s OK to be new to a topic; talent is much more funged from topic to topic in response to changing priorities than you’d expect. Your LessWrong-informed outside view of how much you need to know on a topic to start commenting on policy ideas is probably wrong.
(Yes, I know, someone is about to say “but what if you are WRONG about the big idea given weird corner case X or second-order effects Y?” Look, reversed stupidity is not wisdom, but also sometimes you can just quickly identify stupid-across-almost-all-possible-worlds ideas and convince people just not to do them rather than having to advocate for an explicit good-idea alternative.)
Both statements are explicit about AGI risks & emphasize the importance of transparency & whistleblower mechanisms.
William’s statement acknowledges that he and others doubt that OpenAI’s safety work will be sufficient.
“OpenAI will say that they are improving. I and other employees who resigned doubt they will be ready in time. This is true not just with OpenAI; the incentives to prioritize rapid development apply to the entire industry. This is why a policy response is needed.”
Helen’s statement provides an interesting paragraph about China at the end.
“A closing note on China: The specter of ceding U.S. technological leadership to China is often treated as a knock-down argument against implementing regulations of any kind. Based on my research on the Chinese AI ecosystem and U.S.-China technology competition more broadly, I think this argument is not nearly as strong as it seems at first glance. We should certainly be mindful of how regulation can affect the pace of innovation at home, and keep a close eye on how our competitors and adversaries are developing and using AI. But looking in depth at Chinese AI development, the AI regulations they are already imposing, and the macro headwinds they face leaves me with the conclusion that they are far from being poised to overtake the United States. The fact that targeted, adaptive regulation does not have to slow down U.S. innovation, and in fact can actively support it, only strengthens this point.”
I am impressed regarding Helen Toner’s China comment!
For a while I have been tracking a hypothesis that nobody working in DC in AI Policy would openly and prominently speak against competition with China being a current priority, but this quote shows that hypothesis does not hold.
Now I will track whether any such person explicitly states that it doesn’t matter who gets there first, civilization will most likely end regardless, and that competition shouldn’t be a priority even if China were ahead of the US. I haven’t seen a prominent instance of this happening yet.
I update toward a model of Helen’s statements here not being very representative of what people in DC feel comfortable saying aloud, though to me it’s still nice to know that literally anyone is able to say these words.
Generally, it is difficult to overstate how completely the PRC is seen as a bad-faith actor in DC these days. Many folks saw them engage in mass economic espionage for a decade while repeatedly promising to stop; those folks are now more senior in their careers than they were in those formative moments. Then COVID happened, and while not everyone believes in the lab leak hypothesis, basically everyone believes that the PRC sure as heck reflexively covered up, whether or not they were actually culpable.
(Edit: to be clear, reporting, not endorsing, these claims)
This is an area where I expect a lot of my info sources to be pretty adversarial, and furthermore I haven’t looked into these issues a great deal, so I don’t have a developed perspective on how bad-faith the Chinese government’s agreements and information sources are.
I think I recall pretty adversarial information-sharing behavior from China toward the rest of the world in March 2020 (which I consider a massive deal), though I’d have to re-read Wikipedia and LessWrong to recall what exactly was going on.
I’m surprised that some people are so interested in the idea of liability for extreme harms. I understand that from a legal/philosophical perspective, there are some nice arguments about how companies should have to internalize the externalities of their actions etc.
But in practice, I’d be fairly surprised if liability approaches were actually able to provide a meaningful incentive shift for frontier AI developers. My impression is that frontier AI developers already have fairly strong incentives to avoid catastrophes (e.g., it would be horrible for Microsoft if its AI model caused $1B in harms, it would be horrible for Meta and the entire OS movement if an OS model was able to cause $1B in damages.)
And my impression is that most forms of liability would not affect this cost-benefit tradeoff by very much. This is especially true if the liability is only implemented post-catastrophe. Extreme forms of liability could require insurance, but this essentially feels like a roundabout and less effective way of implementing some form of licensing (you have to convince us that risks are below an acceptable threshold to proceed.)
I think liability also has the “added” problem of being quite unpopular, especially among Republicans. It is easy to attack liability regulations as anti-innovation, argue that they create a moat (only big companies can afford to comply), and argue that it’s just not how America ends up regulating things (we don’t hold Adobe accountable for someone doing something bad with Photoshop.)
To be clear, I don’t think “something is politically unpopular” should be a full-stop argument against advocating for it.
But I do think that “liability for AI companies” scores poorly both on “actual usefulness if implemented” and “political popularity/feasibility.” I also think the “liability for AI companies” advocacy often ends up getting into abstract philosophy land (to what extent should companies internalize externalities) and ends up avoiding some of the “weirder” points (we expect AI has a considerable chance of posing extreme national security risks, which is why we need to treat AI differently than Photoshop.)
I would rather people just make the direct case that AI poses extreme risks & discuss the direct policy interventions that are warranted.
With this in mind, I’m not an expert in liability and admittedly haven’t been following the discussion in great detail (partly because the little I have seen has not convinced me that this is an approach worth investing into). I’d be interested in hearing more from people who have thought about liability– particularly concrete stories for how liability would be expected to meaningfully shift incentives of labs. (See also here).
Stylistic note: I’d prefer replies along the lines of “here is the specific argument for why liability would significantly affect lab incentives and how it would work in concrete cases” rather than replies along the lines of “here is a thing you can read about the general legal/philosophical arguments about how liability is good.”
One reason I feel interested in liability is that it opens up a way to do legal investigations. The legal system has a huge number of privileges that you get to use if you have reasonable suspicion someone has committed a crime or is being negligent. I think it’s quite likely that without direct liability, even if Microsoft or OpenAI caused some huge catastrophe, we would never get a proper postmortem or analysis of the facts, and would never reach high confidence about the actual root causes.
So while I agree that OpenAI and Microsoft of course already want to avoid being seen as responsible for a large catastrophe, having legal liability makes it much more likely there will be an actual investigation where e.g. the legal system gets to confiscate servers and messages to analyze what happened, which in turn makes it more likely that if OpenAI and Microsoft are responsible, they will be found to be responsible.
I think liability-based interventions are substantially more popular with Republicans than other regulatory interventions: they’re substantially more hands-off than, for instance, a regulatory agency. They also feature prominently in the Josh Hawley proposal. I’ve also been told by a Republican staffer that liability approaches are relatively popular amongst Rs.
An important baseline point is that AI firms (if they’re selling to consumers) are probably covered by product liability by default. If they’re covered by product liability, then they’ll be liable for damages if it can be shown that there was a not excessively costly alternative design that they could have implemented that would have avoided that harm.
If AI firms aren’t covered by product liability, they’re liable according to standard tort law, which means they’re liable if they’re negligent under a reasonable person standard.
Liability law also gives (some, limited) teeth to NIST standards. If a firm can show that it was following NIST safety standards, this gives it a strong argument that it wasn’t being negligent.
I share your scepticism of liability interventions as mechanisms for making important dents in the AI safety problem. Prior to the creation of the EPA, firms were still in principle liable for the harms their pollution caused, but the tort law system is generically a very messy way to get firms to reduce accident risks. It’s expensive and time consuming to go through the court system, courts are reluctant to award punitive damages, which means that externalities aren’t internalised even in theory (in expectation for firms), and you need to find a plaintiff with standing to sue firms.
I think there are still some potentially important use cases for liability for reducing AI risks:
Making clear the legal responsibilities of private sector auditors (I’m quite confident that this is a good idea)
Individual liability for individuals with safety responsibilities at firms (although this would be politically unpopular on the right I’d expect)
Creating safe harbours from liability if firms fulfil some set of safety obligations (similarly to the California bill) - ideally safety obligations that are updated over time and tied to best practice
Requiring insurance to cover liability and using this to create better safety practices as firms try to reduce insurance premiums and satisfy insurers’ requirements for coverage
Tying liability to specific failure modes that we expect to correlate with catastrophic failure modes, perhaps tied to a punitive damages regime: for instance, holding a firm liable, including for punitive damages, if a model causes harm via, say, goal misgeneralisation or the firm lacking industry-standard risk management practices
To be clear, I’m still sceptical of liability-based solutions and reasonably strongly favour regulatory proposals (where specific liability provisions will still play an important role.)
I think we should be talking more about potentially denying a frontier AI license to any company that causes a major disaster (within some future licensing regime), where a company’s record before the law passes will be taken into account.
One alternative method to liability for the AI companies is strong liability for companies using AI systems. This does not directly address risks from frontier labs having dangerous AIs in-house, but helps with risks from AI system deployment in the real world. It indirectly affects labs, because they want to sell their AIs.
A lot of this is the default. For example, Air Canada recently lost a court case after claiming a chatbot promising a refund wasn’t binding on them. However, there could be related opportunities. Companies using AI systems currently don’t have particularly good ways to assess risks from AI deployment, and if models continue getting more capable while reliability continues lagging, they are likely to be willing to pay an increasing amount for ways to get information on concrete risks, guard against it, or derisk it (e.g. through insurance against their deployed AI systems causing harms). I can imagine a service that sells AI-using companies insurance against certain types of deployment risk, that could also double as a consultancy / incentive-provider for lower-risk deployments. I’d be interested to chat if anyone is thinking along similar lines.
There are analogies here in pollution. Some countries force industry to post bonds for damage to the local environment. This is a new innovation that may be working.
The reason the superfund exists in the US is because liability for pollution can be so severe that a company would simply cease to operate, and the mess would not be cleaned up.
In practice, when it comes to taking environmental risks, better to burn the train cars of vinyl chloride, creating a catastrophe too expensive for anyone to clean up or even comprehend than to allow a few gallons to leak, creating an expensive accident that you can actually afford.
New Vox article criticizes Anthropic for trying to weaken SB1047 (as well as for some other things). Some notable sections:
Anthropic is lobbying to water down the bill. It wants to scrap the idea that the government should enforce safety standards before a catastrophe occurs. “Instead of deciding what measures companies should take to prevent catastrophes (which are still hypothetical and where the ecosystem is still iterating to determine best practices)” the company urges, “focus the bill on holding companies responsible for causing actual catastrophes.” In other words, take no action until something has already gone terribly wrong.
“Anthropic is trying to gut the proposed state regulator and prevent enforcement until after a catastrophe has occurred — that’s like banning the FDA from requiring clinical trials,” Max Tegmark, president of the Future of Life Institute, told me.
In what he called “a cynical procedural move,” Tegmark noted that Anthropic has also introduced amendments to the bill that touch on the remit of every committee in the legislature, thereby giving each committee another opportunity to kill it. “This is straight out of Big Tech’s playbook,” he said
The US has enforceable safety standards in industries ranging from pharma to aviation. Yet tech lobbyists continue to resist such regulations for their own products. Just as social media companies did years ago, they make voluntary commitments to safety to placate those concerned about risks, then fight tooth and nail to stop those commitments being turned into law.
“I am pretty skeptical of things that relate to corporate governance because I think the incentives of corporations are horrendously warped, including ours.” Those are the words of Jack Clark, the policy chief at Anthropic. [Quote is from a year ago]
This article makes some fine points but some misleading ones and its thesis is wrong, I think. Bottom line: Anthropic does lots of good things and is doing much better than being maximally selfish/ruthless. (And of course this is possible, contra the article — Anthropic is led by humans who have various beliefs which may entail that they should make tradeoffs in favor of safety. The space of AI companies is clearly not so perfectly competitive that anyone who makes tradeoffs in favor of safety becomes bankrupt and irrelevant.)
My impression is that these are not big issues. I’m open to hearing counterarguments. [Edit: the scraping is likely a substantial issue for many sites; see comment below. (It is not an x-safety issue, of course.)]
Here’s another tension at the heart of AI development: Companies need to hoover up reams and reams of high-quality text from books and websites in order to train their systems. But that text is created by human beings, and human beings generally do not like having their work used without their consent.
I agree this is not ideal-in-all-ways but I’m not aware of a better alternative.
Web publishers and content creators are angry. Matt Barrie, chief executive of Freelancer.com, a platform that connects freelancers with clients, said Anthropic is “the most aggressive scraper by far,” swarming the site even after being told to stop. “We had to block them because they don’t obey the rules of the internet. This is egregious scraping [that] makes the site slower for everyone operating on it and ultimately affects our revenue.”
This is surprising to me. I’m not familiar with the facts. Seems maybe bad.
Deals like these [investments from Amazon and Google] always come with risks. The tech giants want to see a quick return on their investments and maximize profit. To keep them happy, the AI companies may feel pressure to deploy an advanced AI model even if they’re not sure it’s safe.
Yes there’s nonzero force to this phenomenon, but my impression is that Amazon and Google have almost no hard power over Anthropic and no guaranteed access to its models (unlike e.g. how OpenAI may have to share its models with Microsoft, even if OpenAI thinks the model is unsafe), and I’m not aware of a better alternative.
[Edit: mostly I just think this stuff is not-what-you-should-focus-on if evaluating Anthropic on safety — there are much bigger questions.]
There are some things Anthropic should actually do better. There are some ways it’s kinda impure, like training on the internet and taking investments. Being kinda impure is unavoidable if you want to be a frontier AI company. Insofar as Anthropic is much better on safety than other frontier AI companies, I’m glad it exists.
[Edit: I’m slightly annoyed that the piece feels one-sided — it’s not trying to figure out whether Anthropic makes tradeoffs for safety or how it compares to other frontier AI companies, instead it’s collecting things that sound bad. Maybe this is fine since the article’s role is to contribute facts to the discourse, not be the final word.]
My impression is that these are not big issues. I’m open to hearing counterarguments.
I think the Anthropic scraper has been causing a non-trivial amount of problems for LW. I am kind of confused because there might be scrapers going around that are falsely under the name “claudebot” but in as much as it is Anthropic, it sure has been annoying (like, killed multiple servers and has caused me like 10+ hours of headaches).
The part of the article I actually found most interesting is this:
In what he called “a cynical procedural move,” Tegmark noted that Anthropic has also introduced amendments to the bill that touch on the remit of every committee in the legislature, thereby giving each committee another opportunity to kill it.
This seems worth looking into and would be pretty bad.
I hope you’ve at least throttled them or IP blocked them temporarily for being annoying. It is not that difficult to scrape a website while respecting its bandwidth and CPU limitations.
We complained to them and it’s been better in recent months. We didn’t want to block them because I do actually want LW to be part of the training set.
(Meta: Me posting the article is not an endorsement of the article as a whole. I agree with Zach that lots of sections of it don’t seem fair/balanced and don’t seem to be critical from an extreme risk perspective.
I think the bullet points I listed above summarize the parts that I think are important/relevant.)
I think there’s a decent case that SB 1047 would improve Anthropic’s business prospects, so I’m not sure this narrative makes sense. On one hand, SB 1047 might make it less profitable to run an AGI company, which is bad for Anthropic’s business plan. But Anthropic is perhaps the best positioned of all AGI companies to comply with the requirements of SB 1047, and might benefit significantly from their competitors being hampered by the law.
The good faith interpretation of Anthropic’s argument would be that the new agency created by the bill might be very bad at issuing guidance that actually reduces x-risk, and you might prefer the decision-making of AI labs with a financial incentive to avoid catastrophes without additional pressure to follow the exact recommendations of the new agency.
The good faith interpretation of Anthropic’s argument would be that the new agency created by the bill might be very bad at issuing guidance that actually reduces x-risk, and you might prefer the decision-making of AI labs with a financial incentive to avoid catastrophes without additional pressure to follow the exact recommendations of the new agency.
Some quick thoughts on this:
If SB1047 passes, labs can still do whatever they want to reduce xrisk. This seems additive to me– I would be surprised if a lab was like “we think XYZ is useful to reduce extreme risks, and we would’ve done them if SB1047 had not passed, but since Y and Z aren’t in the FMD guidance, we’re going to stop doing Y and Z.”
I think the guidance the agency issues will largely be determined by who it employs. I think it’s valid to be like “maybe the FMD will just fail to do a good job because it won’t employ good people”, but to me this is more of a reason to say “how do we make sure the FMD gets staffed with good people who understand how to issue good recommendations”, rather than “there is a risk that you issue bad guidance, therefore we don’t want any guidance.”
I do think that a poorly-implemented FMD could cause harm by diverting company attention/resources toward things that are not productive, but IMO this cost seems relatively small compared to the benefits acquired in the worlds where the FMD issues useful guidance. (I haven’t done a quantitative EV calculation on this though, maybe someone should. I would suspect that even if you give FMD like 20-40% chance of good guidance, and 60-80% chance of useless guidance, the EV would still be net positive.)
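A minimal sketch of the back-of-the-envelope EV calculation mentioned above. The two-outcome framing follows the comment: guidance is either "good" or "useless". The function names, the `benefit_good` and `cost_useless` parameters, and the idea of measuring both in the same arbitrary units are assumptions made for illustration, not anything taken from SB1047 or the FMD.

```python
def fmd_expected_value(p_good, benefit_good, cost_useless):
    """Expected value of the FMD under a two-outcome model (illustrative assumption).

    p_good:        probability the FMD issues good guidance
    benefit_good:  benefit realized if the guidance is good (arbitrary units)
    cost_useless:  cost of diverted attention if the guidance is useless (same units)
    """
    return p_good * benefit_good - (1.0 - p_good) * cost_useless


def breakeven_ratio(p_good):
    """Benefit-to-cost ratio at which the expected value is exactly zero."""
    return (1.0 - p_good) / p_good


if __name__ == "__main__":
    # Sweep the 20-40% range of "good guidance" probabilities from the comment.
    for p in (0.2, 0.3, 0.4):
        print(f"p_good={p:.0%}: EV > 0 when benefit/cost exceeds {breakeven_ratio(p):.2f}")
```

Under this model, the EV is positive whenever the benefit of good guidance exceeds (1 - p) / p times the cost of useless guidance: 4x at a 20% chance of good guidance and 1.5x at 40%. So the claim holds if the cost of useless guidance really is small relative to the benefit of good guidance, and that ratio is the quantity to argue about.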
Why didn’t industry succeed in killing SB1047 [so far]?
If someone had told me in 2022 that there would be a bill in CA that the major labs opposed and that the tech industry spent a fair amount of effort lobbying against (to the point of getting Congresspeople and Nancy Pelosi to chime in), I would’ve been like “that bill seems like it should get killed pretty early on in the process.”
Like, if the bill has to go through 5+ committees, I would’ve predicted that it would die within the first 3 committees. So what’s going on? Some plausible explanations:
Industry has less power over AI legislation than I (and maybe some others) thought
Industry has more influence on the federal government than on the CA legislature
Industry underestimated SB1047 early on//didn’t pay much attention to it and the opposition came relatively late in the game
Scott Wiener is really good at building coalitions and forming alliances
SB1047 is relatively light-touch and the burden is very high when industry tries to fight light-touch things
What do you think are the most noteworthy explanations for why industry has failed to kill SB1047 so far?
One question I have is whether Nancy Pelosi was asked and agreed to do this, or whether Nancy Pelosi identified this proactively as an opportunity to try to win back some tech folks to the Dem side. Substantially changes our estimate of how much influence the labs have in this conversation.
One plausible explanation is that industry still thinks it’s likely to kill the bill, and they just didn’t feel like they needed to play their cards sooner.
But this still leaves me surprised– I would’ve expected that it’s in industry’s interest to kill the bill earlier in the process because:
It might be easier to kill earlier on because it hasn’t gained much traction/support
If you want to appear like you’re open to regulation (which seems to be the policy of major AI companies), you probably want to kill it in a relatively silent/invisible way. If you have to be very loud and public and you get to the point where there are a bunch of media articles about it, you lose some credibility/reputation/alliances (and indeed I do think industry has lost some of this “plausibility of good will” as a result of the SB1047 saga)
My rough ranking of different ways superintelligence could be developed:
Least safe: Corporate Race. Superintelligence is developed in the context of a corporate race between OpenAI, Microsoft, Google, Anthropic, and Facebook.
Safer (but still quite dangerous): USG race with China. Superintelligence is developed in the context of a USG project or “USG + Western allies” project with highly secure weights. The coalition hopefully obtains a lead of 1-3 years that it tries to use to align superintelligence and achieve a decisive strategic advantage. This probably relies heavily on deep learning and means we do not have time to invest into alternative paradigms (“provably safe” systems, human intelligence enhancement, etc.).
Safest (but still not a guarantee of success): International coalition. Superintelligence is developed in the context of an international project with highly secure weights. The coalition still needs to develop superintelligence before rogue projects can, but the coalition hopes to obtain a lead of 10+ years that it can use to align a system that can prevent rogue AGI projects. This could buy us enough time to invest heavily in alternative paradigms.
My own thought is that we should be advocating for option #3 (international coordination) unless/until there is enough evidence that suggests that it’s actually not feasible, and then we should settle for option #2. I’m not yet convinced by people who say we have to settle for option #2 just because EG climate treaties have not gone well or international cooperation is generally difficult.
But I also think people advocating #3 should be aware that there are some worlds in which international cooperation will not be feasible, and we should be prepared to do #2 if it’s quite clear that the US and China are unwilling to cooperate on AGI development. (And again, I don’t think we have that evidence yet– I think there’s a lot of uncertainty here.)
I don’t think the risk ordering is obvious at all, especially not between #2 and #3, and especially not if you also took into account tractability concerns and risks separate from extinction (e.g. stable totalitarianism, s-risks). Even if you thought coordinating with China might be worth it, I think it should be at least somewhat obvious why the US government [/ and its allies] might be very uncomfortable building a coalition with, say, North Korea or Russia. Even between #1 and #2, the probable increase in risks of centralization might make it not worth it, at least in some worlds, depending on how optimistic one might be about e.g. alignment or offense-defense balance from misuse of models with dangerous capabilities.
I also don’t think it’s obvious alternative paradigms would necessarily be both safer and tractable enough, even on 10-year timelines, especially if you don’t use AI automation (using the current paradigm, probably) to push those forward.
the probable increase in risks of centralization might make it not worth it
Can you say more about why the risk of centralization differs meaningfully between the three worlds?
IMO if you assume that (a) an intelligence explosion occurs at some point, (b) the leading actor uses the intelligence explosion to produce a superintelligence that provides a decisive strategic advantage, and (c) the superintelligence is aligned/controlled...
Then you are very likely (in the absence of coordination) to end up with centralization no matter what. It’s just a matter of whether OpenAI/Microsoft (scenario #1), the USG and allies (scenario #2), or a broader international coalition (weighted heavily toward the USG and China) is the one wielding the superintelligence.
(If anything, the “international coalition” approach seems less likely to lead to centralization than the other two approaches, since you’re more likely to get post-AGI coordination.)
especially if you don’t use AI automation (using the current paradigm, probably) to push those forward.
In my vision, the national or international project would be investing into “superalignment”-style approaches, they would just (hopefully) have enough time/resources to be investing into other approaches as well.
I typically assume we don’t get “infinite time”– i.e., even the international coalition is racing against “the clock” (e.g., the amount of time it takes for a rogue actor to develop ASI in a way that can’t be prevented, or the amount of time we have until a separate existential catastrophe occurs.) So I think it would be unwise for the international coalition to completely abandon DL/superalignment, even if one of the big hopes is that a safer paradigm would be discovered in time.
IMO if you assume that (a) an intelligence explosion occurs at some point, (b) the leading actor uses the intelligence explosion to produce a superintelligence that provides a decisive strategic advantage, and (c) the superintelligence is aligned/controlled...
I don’t think this is obvious, stably-multipolar worlds seem at least plausible to me.
@Bogdan, can you spell out a vision for a stably multipolar world with the above assumptions satisfied?
IMO assumption B is doing a lot of the work— you might argue that the IE will not give anyone a DSA, in which case things get more complicated. I do see some plausible stories in which this could happen but they seem pretty unlikely.
@Ryan, thanks for linking to those. Lmk if there are particular points you think are most relevant (meta: I think in general I find discourse more productive when it’s like “hey here’s a claim, also read more here” as opposed to links. Ofc that puts more communication burden on you though, so feel free to just take the links approach.)
(Yeah, I was just literally linking to things people might find relevant to read without making any particular claim. I think this is often slightly helpful, so I do it. Edit: when I do this, I should probably include a disclaimer like “Linking for relevance, not making any specific claim”.)
Yup, I was thinking about worlds in which there is no obvious DSA, or where the parties involved are risk averse enough (perhaps e.g. for reasons like in this talk)
My expectation is that DSI can (and will) be achieved before ASI. In fact, I expect ASI to be about as useful as a bomb which has a minimum effect size of destroying the entire solar system if deployed. In other words, useful only for Mutually Assured Destruction.
DSI only requires a nuclear-armed state actor to have an effective global missile defense system. Whichever nuclear-armed state actor gets that without any other group having that can effectively demand the surrender and disarmament of all other nations. Including confiscating their compute resources.
Do you think missile defense is so difficult that only ASI can manage it? I don’t. That seems like a technical discussion which would need more details to hash out. I’m pretty sure an explicitly designed tool AI and a large drone and satellite fleet could accomplish that.
Competition is fractal. There are multiple hierarchies (countries/departments/agencies/etc, corporations/divisions/teams/etc), with individual humans acting on their own behalf. Often, individuals have influence and goals in multiple hierarchies.
Your 1/2/3 delineation is not the important part. It’s going to be all 3, with chaotic shifts as public perception, funding, and regulation shifts around.
Agree—I think people need to be prepared for “try-or-die” scenarios.
One unfun one I’ll toss into the list: “Company A is 12 months from building Cthulhu, and governments truly do not care and there is extremely strong reason to believe that will not change in the next year. All our policy efforts have failed, our existing technical methods are useless, and the end of the world has come. Everyone report for duty at Company B, we’re going to try to roll the hard six.”
If Company A is 12 months from building Cthulhu, we fucked up upstream. Also, I don’t understand why you’d want to play the AI arms race—you have better options. They expect an AI arms race. Use other tactics. Get into their OODA loop.
You are probably already familiar with this, but re option 3, the Multilateral AGI Consortium (MAGIC) proposal is I assume along the lines of what you are thinking.
If the claims in the piece are true, there seem to be some (seemingly tractable) ways of substantially improving US-China crisis communication.
The barriers seem more bureaucratic (understanding how the defense world works and getting specific agencies/people to do specific things) than political (I doubt this is something you need Congress to pass new legislation to improve.)
In general, I feel like “how do we improve our communication infrastructure during AI-related crises” is an important and underexplored area of AI policy. This isn’t just true for US-China communication but also for “lab-government communication”, “whistleblower-government communication”, and “junior AI staffer-senior national security advisor” communication.
Example: Suppose an eval goes off that suggests that an AI-related emergency might be imminent. How do we make sure this information swiftly gets to relevant people? To what extent do UKAISI and USAISI folks (or lab whistleblowers) have access to senior national security folks who would actually be able to respond in a quick or effective way?
I think IAPS’ CDDC paper is a useful contribution here. I will soon be releasing a few papers in this broad space, with a focus on interventions that can improve emergency detection + emergency response.
One benefit of workshops/conferences/Track 2 dialogues might simply be that you get relevant people to meet each other, share contact information, build trust/positive vibes, and be more likely to reach out in the event of an emergency scenario.
Establishing things like the AI Safety and Security Board might also be useful for similar reasons. I think this has gotten a fair amount of criticism for being too industry-focused, and some of that is justified. Nonetheless, I think interventions along the lines of “make it easy for the people who might see the first signs of extreme risk have super clear ways of advising/contacting government officials” seem great.
Why do people think there’s a ~50% chance that Newsom will veto SB1047?
The base rate for vetoes is about 15%. Perhaps the base rate for controversial bills is higher. But it seems like SB1047 hasn’t been very controversial among CA politicians.
Is the main idea here that Newsom’s incentives are different than those of state politicians because Newsom has national ambitions? So he needs to cater more to the Democratic Party Establishment (which seems to oppose SB1047) or Big Tech? (And then this just balances out against things like “maybe Newsom doesn’t want to seem soft on Big Tech, maybe he feels like he has more to lose by deviating from what the legislature wants, the polls support SB1047, and maybe he actually cares about increasing transparency into frontier AI companies”?)
Or are there other factors that are especially influential in people’s models here?
My model is basically just “Newsom likely doesn’t want to piss off Big Tech or Pelosi, and the incentive to not veto doesn’t seem that high, and so seems highly likely to veto, and 50% veto seems super low”. My fair is, like, 80% veto I think?
I’m not that compelled by the base rates argument, because I think the level of controversy over the bill is atypically high, so it’s quite out of distribution. Eg I think Pelosi denouncing it is very unusual for a state bill and a pretty big deal.
Thanks for sharing! Why do you think the CA legislators were more OK pissing off Big Tech & Pelosi? (I mean, I guess Pelosi’s statement didn’t come until relatively late, but I believe there was still time for people in at least one chamber to change their votes.)
To me, the most obvious explanation is probably something like “Newsom cares more about a future in federal government than most CA politicians and therefore relies more heavily on support from Big Tech and approval from national Democratic leaders”– is this what’s driving your model?
This is a fair point. I think Newsom is a very visible and prominent target who has more risk here (I imagine people don’t pay that much attention to individual California legislators), it’s individually his fault if he doesn’t veto, and he wants to be President and thus cares much more about national stuff. While the California legislators were probably annoyed at Pelosi butting into state business.
Is there some source that particularly indicates this? I get why the 15% base rate might be low, but haven’t actually seen evidence apart from this Manifold question that it’d be higher.
Newsom’s stance on Big Tech is a bit murky. He pushed ideas like the Data Dividend but overall, he seems pretty friendly to the industry.
As for Pelosi, she’s still super influential, but she’ll be 88 by the next presidential election. Her long-term influence is definitely something to watch and Newsom probably has a good read on how things will shift.
But it seems like SB1047 hasn’t been very controversial among CA politicians.
I think this isn’t true. Concretely, I bet that if you looked at the distribution of Democratic No votes among bills that reached Newsom’s desk, this one would be among the highest (7 No votes and a bunch of not-voting, which I think is just a polite way to vote No; source). I haven’t checked and could be wrong!
My take is basically the same as Neel’s, though my all-things-considered guess is that he’s 60% or so to veto. My position on Manifold is in large part an emotional hedge. (Otherwise I would be placing much smaller bets in the same direction.)
Recommended readings for people interested in evals work?
Someone recently asked: “Suppose someone wants to get into evals work. Is there a good reading list to send to them?” I spent ~5 minutes and put this list together. I’d be interested if people have additional suggestions or recommendations:
A paper I’m writing on semi-structured interviews as a good complement to formal evaluations (in-progress)
I would also encourage them to read stuff more on the “macrostrategy” of evals. Like, I suspect a lot of value will come from people who are able to understand the broader theory of change of evals and identify when we’re “rowing” in bad directions. Some examples here might be:
I’m interested in writing out somewhat detailed intelligence explosion scenarios. The goal would be to investigate what kinds of tools the US government would have to detect and intervene in the early stages of an intelligence explosion.
If you know anyone who has thought about these kinds of questions, whether from the AI community or from the US government perspective, please feel free to reach out via LessWrong.
My current perspective is that criticism of AGI labs is an under-incentivized public good. I suspect there’s a disproportionate amount of value that people could have by evaluating lab plans, publicly criticizing labs when they break commitments or make poor arguments, talking to journalists/policymakers about their concerns, etc.
Some quick thoughts:
Soft power– I think people underestimate how strong the “soft power” of labs is, particularly in the Bay Area.
Jobs– A large fraction of people getting involved in AI safety are interested in the potential of working for a lab one day. There are some obvious reasons for this– lots of potential impact from being at the organizations literally building AGI, big salaries, lots of prestige, etc.
People (IMO correctly) perceive that if they acquire a reputation for being critical of labs, their plans, or their leadership, they will essentially sacrifice the ability to work at the labs.
So you get an equilibrium where the only people making (strong) criticisms of labs are those who have essentially chosen to forgo their potential of working there.
Money– The labs and Open Phil (which has been perceived, IMO correctly, as investing primarily into metastrategies that are aligned with lab interests) have an incredibly large share of the $$$ in the space. When funding became more limited, this became even more true, and I noticed a very tangible shift in the culture & discourse around labs + Open Phil.
Status games//reputation– Groups who were more inclined to criticize labs and advocate for public or policymaker outreach were branded as “unilateralist”, “not serious”, and “untrustworthy” in core EA circles. In many cases, there were genuine doubts about these groups, but my impression is that these doubts got amplified/weaponized in cases where the groups were more openly critical of the labs.
Subjectivity of “good judgment”– There is a strong culture of people getting jobs/status for having “good judgment”. This is sensible insofar as we want people with good judgment (who wouldn’t?) but this often ends up being so subjective that it ends up leading to people being quite afraid to voice opinions that go against mainstream views and metastrategies (particularly those endorsed by labs + Open Phil).
Anecdote– Personally, I found my ability to evaluate and critique labs + mainstream metastrategies substantially improved when I spent more time around folks in London and DC (who were less closely tied to the labs). In fairness, I suspect that if I had lived in London or DC *first* and then moved to the Bay Area, it’s plausible I would’ve had a similar feeling but in the “reverse direction”.
With all this in mind, I find myself more deeply appreciating folks who have publicly and openly critiqued labs, even in situations where the cultural and economic incentives to do so were quite weak (relative to staying silent or saying generic positive things about labs).
Examples: Habryka, Rob Bensinger, CAIS, MIRI, Conjecture, and FLI. More recently, @Zach Stein-Perlman, and of course Jan Leike and Daniel K.
Sorry for brevity, I’m busy right now.
1. Noticing good stuff labs do, not just criticizing them, is often helpful. I wish you thought of this work more as “evaluation” than “criticism.”
2. It’s often important for evaluation to be quite truth-tracking. Criticism isn’t obviously good by default.
Edit:
3. I’m pretty sure OP likes good criticism of the labs; no comment on how OP is perceived. And I think I don’t understand your “good judgment” point. Feedback I’ve gotten on AI Lab Watch from senior AI safety people has been overwhelmingly positive, and of course there’s a selection effect in what I hear, but I’m quite sure most of them support such efforts.
4. Conjecture (not exclusively) has done things that frustrated me, including in dimensions like being “‘unilateralist,’ ‘not serious,’ and ‘untrustworthy.’” I think most criticism of Conjecture-related advocacy is legitimate and not just because people are opposed to criticizing labs.
5. I do agree on “soft power” and some of “jobs.” People often don’t criticize the labs publicly because they’re worried about negative effects on them, their org, or people associated with them.
RE 1 & 2:
Agreed— my main point here is that the marketplace of ideas undervalues criticism.
I think one perspective could be “we should all just aim to do objective truth-seeking”, and as stated I agree with it.
The main issue with that frame, imo, is that it’s very easy to forget that the epistemic environment can be tilted in favor of certain perspectives.
EG I think it can be useful for “objective truth-seeking efforts” to be aware of some of the culture/status games that underincentivize criticism of labs & amplify lab-friendly perspectives.
RE 3:
Good to hear that responses have been positive to lab watch. My impression is that this is a mix of: (a) lab watch doesn’t really threaten the interests of labs (especially Anthropic, which is currently winning & currently the favorite lab among senior AIS ppl), (b) the tides have been shifting somewhat and it is genuinely less taboo to criticize labs than a year ago, and (c) EAs respond more positively to criticism that feels more detailed/nuanced (look I have these 10 categories, let’s rate the labs on each dimension) than criticisms that are more about metastrategy (e.g., challenging the entire RSP frame or advocating for policymaker outreach).
RE 4: I haven’t heard anything about Conjecture that I’ve found particularly concerning. Would be interested in you clarifying (either here or via DM) what you’ve heard. (And clarification note that my original point was less “Conjecture hasn’t done anything wrong” and more “I suspect Conjecture will be more heavily scrutinized and examined and have a disproportionate amount of optimization pressure applied against it given its clear push for things that would hurt lab interests.”)
I think now is a good time for people at labs to seriously consider quitting & getting involved in government/policy efforts.
I don’t think everyone should leave labs (obviously). But I would probably hit a button that does something like “everyone on a lab governance team and many technical researchers spend at least 2 hours thinking/writing about alternative options they have & very seriously consider leaving.”
My impression is that lab governance is much less tractable (lab folks have already thought a lot more about AGI) and less promising (competitive pressures are dominating) than government-focused work.
I think governments still remain unsure about what to do, and there’s a lot of potential for folks like Daniel K to have a meaningful role in shaping policy, helping natsec folks understand specific threat models, and raising awareness about the specific kinds of things governments need to do in order to mitigate risks.
There may be specific opportunities at labs that are very high-impact, but I think if someone at a lab is “not really sure if what they’re doing is making a big difference”, I would probably hit a button that allocates them toward government work or government-focused comms work.
Written on a Slack channel in response to discussions about some folks leaving OpenAI.
I’d be worried about evaporative cooling. It seems that the net result of this would be that labs would be almost completely devoid of people earnest about safety.
I agree with you government pathways to impact are most plausible and until recently undervalued. I also agree with you there are weird competitive pressures at labs.
I do think evaporative cooling is a concern, especially if everyone (or a very significant number of people) left. But I think on the margin more people should be leaving to work in govt.
I also suspect that a lot of systemic incentives will keep a greater-than-optimal proportion of safety-conscious people at labs as opposed to governments (labs pay more, labs are faster and have less bureaucracy, lab people are much more informed about AI, labs are more “cool/fun/fast-paced”, lots of govt jobs force you to move locations, etc.)
I also think it depends on the specific lab– EG in light of the recent OpenAI departures, I suspect there’s a stronger case for staying at OpenAI right now than for DeepMind or Anthropic.
I largely agree, but think given government hiring timelines, there’s no dishonor in staying at a lab doing moderately risk-reducing work until you get a hiring offer with an actual start date. This problem is often less bad with the special hiring authorities being used for AI stuff, but it’s still not ideal.
Here are some AI governance/policy thoughts that I’ve found myself articulating at least 3 times over the last month or so:
I think people interested in AI governance/policy should divide their projects into “things that could be useful in the current Overton Window” and “things that would require a moderate or major Overton Window shift to be useful.” I think sometimes people end up not thinking concretely about which world they’re aiming for, and this makes their work less valuable.
If you’re aiming for the current Overton Window, you need to be brutally honest about what you can actually achieve. There are many barriers to implementing sensible-seeming ideas. You need access to stakeholders who can do something. You should try to fail quickly. If your idea requires buy-in from XYZ folks, and X isn’t interested, that’s worth figuring out ASAP.
If you’re aiming for something outside the current Overton Window, you often have a lot of room to be imaginative. I think it’s very easy to underestimate Overton Window shifts. If policymakers get considerably more concerned about AI risks, there are a lot of things that will be “on the table”. People say that AI safety folks were unprepared for the ChatGPT surge– if you think that there will be 1-2 more surges of interest, it might be worth explicitly preparing for ideas that would be considered in those surges.
I think it’s pretty essential to be in regular touch with policymakers/staffers if your main TOC is to get things done in the current Overton Window.
A common failure mode for “research types” is to write a 20+page paper and then ask “ok cool, which policymakers might be interested?” I think usually a better strategy is to try to get in touch with your target audience much earlier on in the process. Present the 1-2 page version of your idea and see if/where the nuance is useful. (To be clear, this is if your TOC involves directly influencing policy. This doesn’t apply if your main TOC is to improve everyone’s understanding of X topic or improve your own understanding of Y topic).
On the margin, I think more people who are new to AI governance/policy should be focusing on “things that would require a moderate or major Overton Window shift to be useful.” I think there’s more low-hanging fruit there that people can contribute to without necessarily having the kinds of networks/access that you often need to know what to do in the current Overton Window.
I think people tend to underestimate how quickly they could become a world expert in a specific area. This is especially true if you’re applying it to the intersection of two areas. For example, it’s very hard to become a world expert in international governance. But it’s relatively easier to become a world expert in the intersection of “international governance” and “AI safety”. There will be people who know more about international governance than you and people who know more about AI safety than you, but you might become one of the people who has thought the most rigorously about the intersection of the two topics.
I think I agree with much-to-all of this. One further amplification I’d make about the last point: the culture of DC policymaking is one where people are expected to be quick studies and it’s OK to be new to a topic; talent is much more funged from topic to topic in response to changing priorities than you’d expect. Your Lesswrong-informed outside view of how much you need to know on a topic to start commenting on policy ideas is probably wrong.
(Yes, I know, someone is about to say “but what if you are WRONG about the big idea given weird corner case X or second-order effects Y?” Look, reversed stupidity is not wisdom, but also sometimes you can just quickly identify stupid-across-almost-all-possible-worlds ideas and convince people just not to do them rather than having to advocate for an explicit good-idea alternative.)
I think how delicately you treat your personal Overton Window should also depend on your timelines.
Recent Senate hearing includes testimony from Helen Toner and William Saunders.
Both statements are explicit about AGI risks & emphasize the importance of transparency & whistleblower mechanisms.
William’s statement acknowledges that he and others doubt that OpenAI’s safety work will be sufficient.
“OpenAI will say that they are improving. I and other employees who resigned doubt they will be ready in time. This is true not just with OpenAI; the incentives to prioritize rapid development apply to the entire industry. This is why a policy response is needed.”
Helen’s statement provides an interesting paragraph about China at the end.
“A closing note on China: The specter of ceding U.S. technological leadership to China is often treated as a knock-down argument against implementing regulations of any kind. Based on my research on the Chinese AI ecosystem and U.S.-China technology competition more broadly, I think this argument is not nearly as strong as it seems at first glance. We should certainly be mindful of how regulation can affect the pace of innovation at home, and keep a close eye on how our competitors and adversaries are developing and using AI. But looking in depth at Chinese AI development, the AI regulations they are already imposing, and the macro headwinds they face leaves me with the conclusion that they are far from being poised to overtake the United States. The fact that targeted, adaptive regulation does not have to slow down U.S. innovation—and in fact can actively support it—only strengthens this point.”
Full hearing here (I haven’t watched it yet.)
I am impressed by Helen Toner’s China comment!
For a while I have been tracking a hypothesis that nobody working in DC in AI Policy would openly and prominently speak against competition with China being a current priority, but this quote shows that hypothesis does not hold.
Now I will track whether any such person explicitly states that it doesn’t matter who gets there first, civilization will most likely end regardless, and that competition shouldn’t be a priority even if China were ahead of the US. I haven’t seen a prominent instance of this happening yet.
Toner is one of the only people criticizing the China arms race claims, like last year: https://www.foreignaffairs.com/china/illusion-chinas-ai-prowess-regulation-helen-toner This also earned her some enmity on social media as a Commie stooge last year.
Appreciate the link (and for others, here’s an archived version without the paywall.)
I update toward a model of Helen’s statements here not being very representative of what people in DC feel comfortable saying aloud, though to me it’s still nice to know that literally anyone is able to say these words.
Generally, it is difficult to overstate how completely the PRC is seen as a bad-faith actor in DC these days. Many folks saw them engage in mass economic espionage for a decade while repeatedly promising to stop; those folks are now more senior in their careers than they were in those formative moments. Then COVID happened, and while not everyone believes in the lab leak hypothesis, basically everyone believes that the PRC sure as heck reflexively covered things up, whether or not they were actually culpable.
(Edit: to be clear, reporting, not endorsing, these claims)
Thanks for the info.
This is an area where I expect a lot of my info sources to be pretty adversarial, and furthermore I haven’t looked into these issues a great deal, so I don’t have a developed perspective on how bad-faith the Chinese government’s agreements and information sources are.
I think I recall pretty adversarial information-sharing behavior from China toward the rest of the world in March 2020 (which I consider a massive deal), though I’d have to re-read Wikipedia and LessWrong to recall what exactly was going on.
I’m surprised that some people are so interested in the idea of liability for extreme harms. I understand that from a legal/philosophical perspective, there are some nice arguments about how companies should have to internalize the externalities of their actions etc.
But in practice, I’d be fairly surprised if liability approaches were actually able to provide a meaningful incentive shift for frontier AI developers. My impression is that frontier AI developers already have fairly strong incentives to avoid catastrophes (e.g., it would be horrible for Microsoft if its AI model caused $1B in harms, it would be horrible for Meta and the entire OS movement if an OS model was able to cause $1B in damages.)
And my impression is that most forms of liability would not affect this cost-benefit tradeoff by very much. This is especially true if the liability is only implemented post-catastrophe. Extreme forms of liability could require insurance, but this essentially feels like a roundabout and less effective way of implementing some form of licensing (you have to convince us that risks are below an acceptable threshold to proceed.)
I think liability also has the “added” problem of being quite unpopular, especially among Republicans. It is easy to attack liability regulations as anti-innovation, argue that they create a moat (only big companies can afford to comply), and argue that it’s just not how America ends up regulating things (we don’t hold Adobe accountable for someone doing something bad with Photoshop.)
To be clear, I don’t think “something is politically unpopular” should be a full-stop argument against advocating for it.
But I do think that “liability for AI companies” scores poorly both on “actual usefulness if implemented” and “political popularity/feasibility.” I also think the “liability for AI companies” advocacy often ends up getting into abstract philosophy land (to what extent should companies internalize externalities) and ends up avoiding some of the “weirder” points (we expect AI has a considerable chance of posing extreme national security risks, which is why we need to treat AI differently than Photoshop.)
I would rather people just make the direct case that AI poses extreme risks & discuss the direct policy interventions that are warranted.
That said, I’m not an expert in liability and admittedly haven’t been following the discussion in great detail (partly because the little I have seen has not convinced me that this is an approach worth investing into). I’d be interested in hearing more from people who have thought about liability– particularly concrete stories for how liability would be expected to meaningfully shift incentives of labs. (See also here).
Stylistic note: I’d prefer replies along the lines of “here is the specific argument for why liability would significantly affect lab incentives and how it would work in concrete cases” rather than replies along the lines of “here is a thing you can read about the general legal/philosophical arguments about how liability is good.”
One reason I feel interested in liability is because it opens up a way to do legal investigations. The legal system has a huge number of privileges that you get to use if you have reasonable suspicion someone has committed a crime or is being negligent. I think it’s quite likely that without direct liability, even if Microsoft or OpenAI caused some huge catastrophe, we would never get a proper postmortem or analysis of the facts, and would never reach high confidence about the actual root causes.
So while I agree that OpenAI and Microsoft of course already want to avoid being seen as responsible for a large catastrophe, having legal liability makes it much more likely there will be an actual investigation where e.g. the legal system gets to confiscate servers and messages to analyze what happened, which in turn makes it more likely that if OpenAI and Microsoft are responsible, they will be found to be responsible.
I found this answer helpful and persuasive– thank you!
I think liability-based interventions are substantially more popular with Republicans than other regulatory interventions, since they’re substantially more hands-off than, for instance, a regulatory agency. They also feature prominently in the Josh Hawley proposal. I’ve also been told by a Republican staffer that liability approaches are relatively popular amongst Rs.
An important baseline point is that AI firms (if they’re selling to consumers) are probably covered by product liability by default. If they’re covered by product liability, then they’ll be liable for damages if it can be shown that there was a not excessively costly alternative design that they could have implemented that would have avoided that harm.
If AI firms aren’t covered by product liability, they’re liable according to standard tort law, which means they’re liable if they’re negligent under a reasonable person standard.
Liability law also gives (some, limited) teeth to NIST standards. If a firm can show that it was following NIST safety standards, this gives it a strong argument that it wasn’t being negligent.
I share your scepticism of liability interventions as mechanisms for making important dents in the AI safety problem. Prior to the creation of the EPA, firms were still in principle liable for the harms their pollution caused, but the tort law system is generically a very messy way to get firms to reduce accident risks. It’s expensive and time consuming to go through the court system, courts are reluctant to award punitive damages, which means that externalities aren’t internalised even in theory (in expectation for firms), and you need to find a plaintiff with standing to sue firms.
I think there are still some potentially important use cases for liability for reducing AI risks:
Making clear the legal responsibilities of private sector auditors (I’m quite confident that this is a good idea)
Individual liability for individuals with safety responsibilities at firms (although this would be politically unpopular on the right I’d expect)
Creating safe harbours from liability if firms fulfil some set of safety obligations (similarly to the California bill) - ideally safety obligations that are updated over time and tied to best practice
Requiring insurance to cover liability and using this to create better safety practices as firms try to reduce insurance premiums and satisfy insurers’ requirements for coverage
Tying liability to specific failure modes that we expect to correlate with catastrophic failure modes, perhaps tied to a punitive damages regime: for instance, holding a firm liable, including for punitive damages, if a model causes harm via, say, goal misgeneralisation or the firm lacking industry-standard risk management practices
To be clear, I’m still sceptical of liability-based solutions and reasonably strongly favour regulatory proposals (where specific liability provisions will still play an important role.)
I’m not a lawyer and have no legal training.
I think we should be talking more about potentially denying a frontier AI license to any company that causes a major disaster (within some future licensing regime), where a company’s record before the law passes will be taken into account.
One alternative method to liability for the AI companies is strong liability for companies using AI systems. This does not directly address risks from frontier labs having dangerous AIs in-house, but helps with risks from AI system deployment in the real world. It indirectly affects labs, because they want to sell their AIs.
A lot of this is the default. For example, Air Canada recently lost a court case after claiming a chatbot promising a refund wasn’t binding on them. However, there could be related opportunities. Companies using AI systems currently don’t have particularly good ways to assess risks from AI deployment, and if models continue getting more capable while reliability continues lagging, they are likely to be willing to pay an increasing amount for ways to get information on concrete risks, guard against it, or derisk it (e.g. through insurance against their deployed AI systems causing harms). I can imagine a service that sells AI-using companies insurance against certain types of deployment risk, that could also double as a consultancy / incentive-provider for lower-risk deployments. I’d be interested to chat if anyone is thinking along similar lines.
There are analogies here in pollution. Some countries force industry to post bonds for damage to the local environment. This is a new innovation that may be working.
The reason the superfund exists in the US is because liability for pollution can be so severe that a company would simply cease to operate, and the mess would not be cleaned up.
In practice, when it comes to taking environmental risks, better to burn the train cars of vinyl chloride, creating a catastrophe too expensive for anyone to clean up or even comprehend than to allow a few gallons to leak, creating an expensive accident that you can actually afford.
New Vox article criticizes Anthropic for trying to weaken SB1047 (as well as for some other things). Some notable sections:
Anthropic is lobbying to water down the bill. It wants to scrap the idea that the government should enforce safety standards before a catastrophe occurs. “Instead of deciding what measures companies should take to prevent catastrophes (which are still hypothetical and where the ecosystem is still iterating to determine best practices)” the company urges, “focus the bill on holding companies responsible for causing actual catastrophes.” In other words, take no action until something has already gone terribly wrong.
“Anthropic is trying to gut the proposed state regulator and prevent enforcement until after a catastrophe has occurred — that’s like banning the FDA from requiring clinical trials,” Max Tegmark, president of the Future of Life Institute, told me.
In what he called “a cynical procedural move,” Tegmark noted that Anthropic has also introduced amendments to the bill that touch on the remit of every committee in the legislature, thereby giving each committee another opportunity to kill it. “This is straight out of Big Tech’s playbook,” he said.
The US has enforceable safety standards in industries ranging from pharma to aviation. Yet tech lobbyists continue to resist such regulations for their own products. Just as social media companies did years ago, they make voluntary commitments to safety to placate those concerned about risks, then fight tooth and nail to stop those commitments being turned into law.
“I am pretty skeptical of things that relate to corporate governance because I think the incentives of corporations are horrendously warped, including ours.” Those are the words of Jack Clark, the policy chief at Anthropic. [Quote is from a year ago]
This article makes some fine points but some misleading ones and its thesis is wrong, I think. Bottom line: Anthropic does lots of good things and is doing much better than being maximally selfish/ruthless. (And of course this is possible, contra the article — Anthropic is led by humans who have various beliefs which may entail that they should make tradeoffs in favor of safety. The space of AI companies is clearly not so perfectly competitive that anyone who makes tradeoffs in favor of safety becomes bankrupt and irrelevant.)
Yep, Anthropic’s policy advocacy seems bad.
My impression is that these are not big issues. I’m open to hearing counterarguments. [Edit: the scraping is likely a substantial issue for many sites; see comment below. (It is not an x-safety issue, of course.)]
I agree this is not ideal-in-all-ways but I’m not aware of a better alternative.
This is surprising to me. I’m not familiar with the facts. Seems maybe bad.
Yes there’s nonzero force to this phenomenon, but my impression is that Amazon and Google have almost no hard power over Anthropic and no guaranteed access to its models (unlike e.g. how OpenAI may have to share its models with Microsoft, even if OpenAI thinks the model is unsafe), and I’m not aware of a better alternative.
[Edit: mostly I just think this stuff is not-what-you-should-focus-on if evaluating Anthropic on safety — there are much bigger questions.]
There are some things Anthropic should actually do better. There are some ways it’s kinda impure, like training on the internet and taking investments. Being kinda impure is unavoidable if you want to be a frontier AI company. Insofar as Anthropic is much better on safety than other frontier AI companies, I’m glad it exists.
[Edit: I’m slightly annoyed that the piece feels one-sided — it’s not trying to figure out whether Anthropic makes tradeoffs for safety or how it compares to other frontier AI companies, instead it’s collecting things that sound bad. Maybe this is fine since the article’s role is to contribute facts to the discourse, not be the final word.]
I think the Anthropic scraper has been causing a non-trivial amount of problems for LW. I am kind of confused because there might be scrapers going around that are falsely under the name “claudebot” but in as much as it is Anthropic, it sure has been annoying (like, killed multiple servers and has caused me like 10+ hours of headaches).
The part of the article I actually found most interesting is this:
This seems worth looking into and would be pretty bad.
I hope you’ve at least throttled them or IP blocked them temporarily for being annoying. It is not that difficult to scrape a website while respecting its bandwidth and CPU limitations.
We complained to them and it’s been better in recent months. We didn’t want to block them because I do actually want LW to be part of the training set.
+1 to lots of this.
(Meta: Me posting the article is not an endorsement of the article as a whole. I agree with Zach that lots of sections of it don’t seem fair/balanced and don’t seem to be critical from an extreme risk perspective.
I think the bullet points I listed above summarize the parts that I think are important/relevant.)
I think there’s a decent case that SB 1047 would improve Anthropic’s business prospects, so I’m not sure this narrative makes sense. On one hand, SB 1047 might make it less profitable to run an AGI company, which is bad for Anthropic’s business plan. But Anthropic is perhaps the best positioned of all AGI companies to comply with the requirements of SB 1047, and might benefit significantly from their competitors being hampered by the law.
The good faith interpretation of Anthropic’s argument would be that the new agency created by the bill might be very bad at issuing guidance that actually reduces x-risk, and you might prefer the decision-making of AI labs with a financial incentive to avoid catastrophes without additional pressure to follow the exact recommendations of the new agency.
Some quick thoughts on this:
If SB1047 passes, labs can still do whatever they want to reduce xrisk. This seems additive to me– I would be surprised if a lab was like “we think XYZ is useful to reduce extreme risks, and we would’ve done them if SB1047 had not passed, but since Y and Z aren’t in the FMD guidance, we’re going to stop doing Y and Z.”
I think the guidance the agency issues will largely be determined by who it employs. I think it’s valid to be like “maybe the FMD will just fail to do a good job because it won’t employ good people”, but to me this is more of a reason to say “how do we make sure the FMD gets staffed with good people who understand how to issue good recommendations”, rather than “there is a risk that you issue bad guidance, therefore we don’t want any guidance.”
I do think that a poorly-implemented FMD could cause harm by diverting company attention/resources toward things that are not productive, but IMO this cost seems relatively small compared to the benefits acquired in the worlds where the FMD issues useful guidance. (I haven’t done a quantitative EV calculation on this though, maybe someone should. I would suspect that even if you give FMD like 20-40% chance of good guidance, and 60-80% chance of useless guidance, the EV would still be net positive.)
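A rough way to sanity-check the EV claim above is a two-outcome model. The sketch below is not the commenter's calculation; it assumes (hypothetically) that good guidance yields some benefit, that useless guidance imposes a diversion cost, and it uses made-up magnitudes (benefit 10, cost 1) purely to reflect the stated view that the cost is "relatively small" compared to the benefit. Only the 20-40% probability range comes from the comment.

```python
# Minimal sketch of the back-of-the-envelope EV estimate.
# ASSUMPTION: a two-outcome model (good guidance vs. useless guidance).
# ASSUMPTION: the benefit and cost magnitudes are illustrative, not estimates.

def fmd_expected_value(p_good: float, benefit_if_good: float, cost_if_useless: float) -> float:
    """EV = P(good) * benefit - P(useless) * diversion cost."""
    return p_good * benefit_if_good - (1.0 - p_good) * cost_if_useless


def break_even_probability(benefit_if_good: float, cost_if_useless: float) -> float:
    """Smallest P(good) at which the EV is non-negative: C / (B + C)."""
    return cost_if_useless / (benefit_if_good + cost_if_useless)


if __name__ == "__main__":
    benefit, cost = 10.0, 1.0  # hypothetical units; only the ratio matters
    for p_good in (0.2, 0.3, 0.4):  # the 20-40% range mentioned in the comment
        ev = fmd_expected_value(p_good, benefit, cost)
        print(f"P(good)={p_good:.0%}  EV={ev:+.2f}")
    print(f"break-even P(good) = {break_even_probability(benefit, cost):.1%}")
```

Under these assumed numbers the EV stays positive across the whole 20-40% range; the conclusion flips only if the diversion cost is comparable to (or larger than) the benefit of good guidance, which is exactly the empirical question a real estimate would need to settle.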
Why didn’t industry succeed in killing SB1047 [so far]?
If someone had told me in 2022 that there would be a bill in CA that the major labs opposed and that the tech industry spent a fair amount of effort lobbying against (to the point of getting Congresspeople and Nancy Pelosi to chime in), I would’ve been like “that bill seems like it should get killed pretty early on in the process.”
Like, if the bill has to go through 5+ committees, I would’ve predicted that it would die within the first 3 committees. So what’s going on? Some plausible explanations:
Industry has less power over AI legislation than I (and maybe some others) thought
Industry has more influence on the federal government than on the CA legislatures
Industry underestimated SB1047 early on//didn’t pay much attention to it and the opposition came relatively late in the game
Scott Wiener is really good at building coalitions and forming alliances
SB1047 is relatively light-touch and the burden is very high when industry tries to fight light-touch things
What do you think are the most noteworthy explanations for why industry has failed to kill SB1047 so far?
One question I have is whether Nancy Pelosi was asked and agreed to do this, or whether Nancy Pelosi identified this proactively as an opportunity to try to win back some tech folks to the Dem side. Substantially changes our estimate of how much influence the labs have in this conversation.
One plausible explanation is that industry still thinks it’s likely to kill the bill, and they just didn’t feel like they needed to play their cards sooner.
But this still leaves me surprised– I would’ve expected that it’s in industry’s interest to kill the bill earlier in the process because:
It might be easier to kill earlier on because it hasn’t gained much traction/support
If you want to appear like you’re open to regulation (which seems to be the policy of major AI companies), you probably want to kill it in a relatively silent/invisible way. If you have to be very loud and public and you get to the point where there are a bunch of media articles about it, you lose some credibility/reputation/alliances (and indeed I do think industry has lost some of this “plausibility of good will” as a result of the SB1047 saga)
My rough ranking of different ways superintelligence could be developed:
Least safe: Corporate Race. Superintelligence is developed in the context of a corporate race between OpenAI, Microsoft, Google, Anthropic, and Facebook.
Safer (but still quite dangerous): USG race with China. Superintelligence is developed in the context of a USG project or “USG + Western allies” project with highly secure weights. The coalition hopefully obtains a lead of 1-3 years that it tries to use to align superintelligence and achieve a decisive strategic advantage. This probably relies heavily on deep learning and means we do not have time to invest into alternative paradigms (“provably safe” systems, human intelligence enhancement, etc.)
Safest (but still not a guarantee of success): International coalition. Superintelligence is developed in the context of an international project with highly secure weights. The coalition still needs to develop superintelligence before rogue projects can, but the coalition hopes to obtain a lead of 10+ years that it can use to align a system that can prevent rogue AGI projects. This could buy us enough time to invest heavily in alternative paradigms.
My own thought is that we should be advocating for option #3 (international coordination) unless/until there is enough evidence that suggests that it’s actually not feasible, and then we should settle for option #2. I’m not yet convinced by people who say we have to settle for option #2 just because e.g. climate treaties have not gone well or international cooperation is generally difficult.
But I also think people advocating #3 should be aware that there are some worlds in which international cooperation will not be feasible, and we should be prepared to do #2 if it’s quite clear that the US and China are unwilling to cooperate on AGI development. (And again, I don’t think we have that evidence yet– I think there’s a lot of uncertainty here.)
I don’t think the risk ordering is obvious at all, especially not between #2 and #3, and especially not if you also took into account tractability concerns and risks separate from extinction (e.g. stable totalitarianism, s-risks). Even if you thought coordinating with China might be worth it, I think it should be at least somewhat obvious why the US government [/ and its allies] might be very uncomfortable building a coalition with, say, North Korea or Russia. Even between #1 and #2, the probable increase in risks of centralization might make it not worth it, at least in some worlds, depending on how optimistic one might be about e.g. alignment or offense-defense balance from misuse of models with dangerous capabilities.
I also don’t think it’s obvious alternative paradigms would necessarily be both safer and tractable enough, even on 10-year timelines, especially if you don’t use AI automation (using the current paradigm, probably) to push those forward.
Can you say more about why the risk of centralization differs meaningfully between the three worlds?
IMO if you assume that (a) an intelligence explosion occurs at some point, (b) the leading actor uses the intelligence explosion to produce a superintelligence that provides a decisive strategic advantage, and (c) the superintelligence is aligned/controlled...
Then you are very likely (in the absence of coordination) to end up with centralization no matter what. It’s just a matter of whether OpenAI/Microsoft (scenario #1), the USG and allies (scenario #2), or a broader international coalition (weighted heavily toward the USG and China) are the ones wielding the superintelligence.
(If anything, the “international coalition” approach seems less likely to lead to centralization than the other two approaches, since you’re more likely to get post-AGI coordination.)
In my vision, the national or international project would be investing into “superalignment”-style approaches, they would just (hopefully) have enough time/resources to be investing into other approaches as well.
I typically assume we don’t get “infinite time”– i.e., even the international coalition is racing against “the clock” (e.g., the amount of time it takes for a rogue actor to develop ASI in a way that can’t be prevented, or the amount of time we have until a separate existential catastrophe occurs.) So I think it would be unwise for the international coalition to completely abandon DL/superalignment, even if one of the big hopes is that a safer paradigm would be discovered in time.
I don’t think this is obvious, stably-multipolar worlds seem at least plausible to me.
See also here and here.
@Bogdan, can you spell out a vision for a stably multipolar world with the above assumptions satisfied?
IMO assumption B is doing a lot of the work— you might argue that the IE will not give anyone a DSA, in which case things get more complicated. I do see some plausible stories in which this could happen but they seem pretty unlikely.
@Ryan, thanks for linking to those. Lmk if there are particular points you think are most relevant (meta: I think in general I find discourse more productive when it’s like “hey here’s a claim, also read more here” as opposed to links. Ofc that puts more communication burden on you though, so feel free to just take the links approach.)
(Yeah, I was just literally linking to things people might find relevant to read without making any particular claim. I think this is often slightly helpful, so I do it. Edit: when I do this, I should probably include a disclaimer like “Linking for relevance, not making any specific claim”.)
Yup, I was thinking about worlds in which there is no obvious DSA, or where the parties involved are risk averse enough (perhaps e.g. for reasons like in this talk)
My expectation is that DSI can (and will) be achieved before ASI. In fact, I expect ASI to be about as useful as a bomb which has a minimum effect size of destroying the entire solar system if deployed. In other words, useful only for Mutually Assured Destruction. DSI only requires a nuclear-armed state actor to have an effective global missile defense system. Whichever nuclear-armed state actor gets that without any other group having that can effectively demand the surrender and disarmament of all other nations. Including confiscating their compute resources. Do you think missile defense is so difficult that only ASI can manage it? I don’t. That seems like a technical discussion which would need more details to hash out. I’m pretty sure an explicitly designed tool AI and a large drone and satellite fleet could accomplish that.
Competition is fractal. There are multiple hierarchies (countries/departments/agencies/etc, corporations/divisions/teams/etc), with individual humans acting on their own behalf. Often, individuals have influence and goals in multiple hierarchies.
Your 1/2/3 delineation is not the important part. It’s going to be all 3, with chaotic shifts as public perception, funding, and regulation shift around.
Agree—I think people need to be prepared for “try-or-die” scenarios.
One unfun one I’ll toss into the list: “Company A is 12 months from building Cthulhu, and governments truly do not care and there is extremely strong reason to believe that will not change in the next year. All our policy efforts have failed, our existing technical methods are useless, and the end of the world has come. Everyone report for duty at Company B, we’re going to try to roll the hard six.”
If Company A is 12 months from building Cthulhu, we fucked up upstream. Also, I don’t understand why you’d want to play the AI arms race—you have better options. They expect an AI arms race. Use other tactics. Get into their OODA loop.
Unsee the frontier lab.
...yes? I think my scenario explicitly assumes that we’ve fucked up upstream in many, many ways.
Oh, by that I meant something like “yeah I really think it is not a good idea to focus on an AI arms race”. See also Slack matters more than any other outcome.
You are probably already familiar with this, but re option 3, the Multilateral AGI Consortium (MAGIC) proposal is I assume along the lines of what you are thinking.
Indeed, Akash is familiar: https://arxiv.org/abs/2310.20563 :)
(I think it was a later paper he co-authored than the one you cite)
Recommended reading: A recent piece argues that the US-China crisis hotline doesn’t work & generally raises some concerns about US-China crisis communication.
Some quick thoughts:
If the claims in the piece are true, there seem to be some (seemingly tractable) ways of substantially improving US-China crisis communication.
The barriers seem more bureaucratic (understanding how the defense world works and getting specific agencies/people to do specific things) than political (I doubt this is something you need Congress to pass new legislation to improve.)
In general, I feel like “how do we improve our communication infrastructure during AI-related crises” is an important and underexplored area of AI policy. This isn’t just true for US-China communication but also for “lab-government communication”, “whistleblower-government communication”, and “junior AI staffer-senior national security advisor” communication.
Example: Suppose an eval goes off that suggests that an AI-related emergency might be imminent. How do we make sure this information swiftly gets to relevant people? To what extent do UKAISI and USAISI folks (or lab whistleblowers) have access to senior national security folks who would actually be able to respond in a quick or effective way?
I think IAPS’ CDDC paper is a useful contribution here. I will soon be releasing a few papers in this broad space, with a focus on interventions that can improve emergency detection + emergency response.
One benefit of workshops/conferences/Track 2 dialogues might simply be that you get relevant people to meet each other, share contact information, build trust/positive vibes, and be more likely to reach out in the event of an emergency scenario.
Establishing things like the AI Safety and Security Board might also be useful for similar reasons. I think this has gotten a fair amount of criticism for being too industry-focused, and some of that is justified. Nonetheless, I think interventions along the lines of “make sure the people who might see the first signs of extreme risk have super clear ways of advising/contacting government officials” seem great.
Why do people think there’s a ~50% chance that Newsom will veto SB1047?
The base rate for vetoes is about 15%. Perhaps the base rate for controversial bills is higher. But it seems like SB1047 hasn’t been very controversial among CA politicians.
Is the main idea here that Newsom’s incentives are different than those of state politicians because Newsom has national ambitions? So therefore he needs to cater more to the Democratic Party Establishment (which seems to oppose SB1047) or Big Tech? (And then this just balances out against things like “maybe Newsom doesn’t want to seem soft on Big Tech, maybe he feels like he has more to lose by deviating from what the legislature wants, the polls support SB1047, and maybe he actually cares about increasing transparency into frontier AI companies”?)
Or are there other factors that are especially influential in people’s models here?
(Tagging @ryan_greenblatt, @Eric Neyman, and @Neel Nanda because you three hold the largest No positions. Feel free to ignore if you don’t want to engage.)
My model is basically just “Newsom likely doesn’t want to piss off Big Tech or Pelosi, and the incentive to not veto doesn’t seem that high, so he seems highly likely to veto, and 50% veto seems super low”. My fair is, like, 80% veto I think?
I’m not that compelled by the base rates argument, because I think the level of controversy over the bill is atypically high, so it’s quite out of distribution. E.g., I think Pelosi denouncing it is very unusual for a state bill and a pretty big deal.
Thanks for sharing! Why do you think the CA legislators were more OK pissing off Big Tech & Pelosi? (I mean, I guess Pelosi’s statement didn’t come until relatively late, but I believe there was still time for people in at least one chamber to change their votes.)
To me, the most obvious explanation is probably something like “Newsom cares more about a future in federal government than most CA politicians and therefore relies more heavily on support from Big Tech and approval from national Democratic leaders”– is this what’s driving your model?
This is a fair point. I think Newsom is a very visible and prominent target who has more risk here (I imagine people don’t pay that much attention to individual California legislators), it’s individually his fault if he doesn’t veto, and he wants to be President and thus cares much more about national stuff. The California legislators, by contrast, were probably annoyed at Pelosi butting into state business.
I believe that Pelosi had never once spoken out against a state bill authored by a California Democrat before this.
A financial conflict of interest is a wondrous thing...
For what it’s worth, I don’t have any particular reason to think that that’s the reason for her opposition.
Is there some source that particularly indicates this? I get why the 15% base rate might be low, but haven’t actually seen evidence apart from this Manifold question that it’d be higher.
Newsom’s stance on Big Tech is a bit murky. He pushed ideas like the Data Dividend, but overall he seems pretty friendly to the industry.
As for Pelosi, she’s still super influential, but she’ll be 88 by the next presidential election. Her long-term influence is definitely something to watch and Newsom probably has a good read on how things will shift.
I think this isn’t true. Concretely, I bet that if you looked at the distribution of Democratic No votes among bills that reached Newsom’s desk, this one would be among the highest (7 No votes and a bunch of not-voting, which I think is just a polite way to vote No; source). I haven’t checked and could be wrong!
My take is basically the same as Neel’s, though my all-things-considered guess is that he’s 60% or so to veto. My position on Manifold is in large part an emotional hedge. (Otherwise I would be placing much smaller bets in the same direction.)
I’ve started reading the Report on the International Control of Atomic Energy and am finding it very interesting/useful.
I recommend this for AI policy people– especially those interested in international cooperation, US policy, and/or writing for policy audiences.
@Peter Barnett @Rob Bensinger @habryka @Zvi @davekasten @Peter Wildeford you come to mind as people who might be interested.
See also the Wikipedia page about the report (but IMO reading sections of the actual report is worth it).
Recommended readings for people interested in evals work?
Someone recently asked: “Suppose someone wants to get into evals work. Is there a good reading list to send to them?” I spent ~5 minutes and put this list together. I’d be interested if people have additional suggestions or recommendations:
I would send them:
Model evaluations for extreme risks
Evaluating frontier models for dangerous capabilities
METR ARA paper
Recent AI Sandbagging paper
Anthropic’s challenges in evaluating AI systems
Apollo’s starter guide for evals
A paper I’m writing on semi-structured interviews as a good complement to formal evaluations (in-progress)
I would also encourage them to read more on the “macrostrategy” of evals. I suspect a lot of value will come from people who are able to understand the broader theory of change of evals and identify when we’re “rowing” in bad directions. Some examples here might be:
How evals might (or might not) prevent catastrophic risks from AI (a bit outdated but still relevant IMO).
Lots of the discussion around RSPs (e.g., RSPs are pauses done right, RSPs are risk management done wrong, OpenAI’s Preparedness Framework: Praise & Recommendations)
A paper I’m writing on emergency preparedness, which includes some thoughts on government’s “detection capabilities” (in-progress).
Six dimensions of operational adequacy (relevant for “what happens when the evals go off”)
Carefully bootstrapped alignment is organizationally hard (also relevant for “what happens when the evals go off”)
I’m obviously biased, but I would recommend my post on macrostrategy of evals: The case for more ambitious language model evals.
@Ryan Kidd @Lee Sharkey I suspect you’ll have useful recommendations here.
I’m interested in writing out somewhat detailed intelligence explosion scenarios. The goal would be to investigate what kinds of tools the US government would have to detect and intervene in the early stages of an intelligence explosion.
If you know anyone who has thought about these kinds of questions, whether from the AI community or from the US government perspective, please feel free to reach out via LessWrong.