Akash

Karma: 4,894

Akash 31 May 2024 15:14 UTC
5 points
0
in reply to: Charbel-Raphaël’s comment on: We might be dropping the ball on Autonomous Replication and Adaptation.
Potentially unpopular take, but if you have the skillset to do so, I’d rather you just come up with simple/clear explanations for why ARA is dangerous, what implications this has for AI policy, present these ideas to policymakers, and iterate on your explanations as you start to see why people are confused.
Note also that in the US, the NTIA has been tasked with making recommendations about open-weight models. The deadline for official submissions has ended but I’m pretty confident that if you had something you wanted them to know, you could just email it to them and they’d take a look. My impression is that they’re broadly aware of extreme risks from certain kinds of open-sourcing but might benefit from (a) clearer explanations of ARA threat models and (b) specific suggestions for what needs to be done.

Akash 31 May 2024 15:10 UTC
9 points
1
on: We might be dropping the ball on Autonomous Replication and Adaptation.
Why do you think we are dropping the ball on ARA?
I think many members of the policy community feel like ARA is “weird” and therefore don’t want to bring it up. It’s much tamer to talk about CBRN threats and bioweapons. It also requires less knowledge and general competence– explaining ARA and autonomous systems risks is difficult, you get more questions, you’re more likely to explain something poorly, etc.
Historically, there was also a fair amount of gatekeeping, where some of the experienced policy people were explicitly discouraging people from being explicit about AGI threat models (this still happens to some degree, but I think the effect is much weaker than it was a year ago.)
With all this in mind, I currently think raising awareness about ARA threat models and AI R&D threat models is one of the most important things for AI comms/policy efforts to get right.
In the status quo, even if the evals go off, I don’t think we have laid the intellectual foundation required for policymakers to understand why the evals are dangerous. “Oh interesting– an AI can make copies of itself? A little weird but I guess we make copies of files all the time, shrug.” or “Oh wow– AI can help with R&D? That’s awesome– seems very exciting for innovation.”
I do think there’s a potential to lay the intellectual foundation before it’s too late, and I think many groups are starting to be more direct/explicit about the “weirder” threat models. Also, I think national security folks have more of a “take things seriously and worry about things even if there isn’t clear empirical evidence yet” mentality than ML people. And I think typical policymakers fall somewhere in between.

Akash 31 May 2024 2:31 UTC
7 points
0
on: Non-Disparagement Canaries for OpenAI
Minor note: Paul is at the US AI Safety Institute, while Jade & Geoffrey are at the UK AI Safety Institute.

Akash 30 May 2024 21:35 UTC
5 points
1
in reply to: Ben Pace’s comment on: MIRI 2024 Communications Strategy
@habryka I think you’re making a claim about whether or not the difference matters (IMO it does) but I perceived @Kaj_Sotala to be making a claim about whether “an average reasonably smart person out in society” would see the difference as meaningful (IMO they would not).
(My guess is you interpreted “reasonable people” to mean like “people who are really into reasoning about the world and trying to figure out the truth” and Kaj interpreted reasonable people to mean like “an average person.” Kaj should feel free to correct me if I’m wrong.)

Akash 30 May 2024 21:30 UTC
5 points
1
in reply to: peterbarnett’s comment on: MIRI 2024 Communications Strategy
My two cents RE particular phrasing:
When talking to US policymakers, I don’t think there’s a big difference between “causes a national security crisis” and “kills literally everyone.” Worth noting that even though many in the AIS community see a big difference between “99% of people die but civilization restarts” vs. “100% of people die”, IMO this distinction does not matter to most policymakers (or at least matters way less to them).
Of course, in addition to conveying “this is a big deal” you need to convey the underlying threat model. There are lots of ways to interpret “AI causes a national security emergency” (e.g., China, military conflict). “Kills literally everyone” probably leads people to envision a narrower set of worlds.
But IMO even “kills literally everybody” doesn’t really convey the underlying misalignment/AI takeover threat model.
So my current recommendation (weakly held) is probably to go with “causes a national security emergency” or “overthrows the US government” and then accept that you have to do some extra work to actually get them to understand the “AGI--> AI takeover--> Lots of people die and we lose control” model.

Akash 30 May 2024 16:15 UTC
8 points
0
in reply to: Gretta Duleba’s comment on: MIRI 2024 Communications Strategy
Valid!

Akash 30 May 2024 16:14 UTC
6 points
3
in reply to: Gretta Duleba’s comment on: MIRI 2024 Communications Strategy
Thanks! Despite the lack of SMART goals, I still feel like this reply gave me a better sense of what your priorities are & how you’ll be assessing success/failure.
One failure mode– which I’m sure is already on your radar– is something like: “MIRI ends up producing lots of high-quality stuff but no one really pays attention. Policymakers and national security people are very busy and often only read things that (a) directly relate to their work or (b) are sent to them by someone who they respect.”
Another is something like: “MIRI ends up focusing too much on making arguments/points that are convincing to general audiences but fail to understand the cruxes/views of the People Who Matter.” (A strawman version of this is something like “MIRI ends up spending a lot of time in the Bay and there’s lots of pressure to engage a bunch with the cruxes/views of rationalists, libertarians, e/accs, and AGI company employees. Meanwhile, the kinds of conversations happening among natsec folks & policymakers look very different, and MIRI’s materials end up being less relevant/useful to this target audience.”
I’m extremely confident that these are already on your radar, but I figure it might be worth noting that these are two of the failure modes I’m most worried about. (I guess besides the general boring failure mode along the lines of “hiring is hard and doing anything is hard and maybe things just stay slow and when someone asks what good materials you guys have produced the answer is still ‘we’re working on it’.)
(Final note: A lot of my questions and thoughts have been critical, but I should note that I appreciate what you’re doing & I’m looking forward to following MIRI’s work in the space! :D)

Akash 30 May 2024 16:06 UTC
4 points
2
in reply to: Gretta Duleba’s comment on: MIRI 2024 Communications Strategy
Thank you! I still find myself most curious about the “how will MIRI make sure it understands its audience” and “how will MIRI make sure its materials are read by policymakers + natsec people” parts of the puzzle. Feel free to ignore this if we’re getting too in the weeds, but I wonder if you can share more details about either of these parts.
There is also an audience-specific component, and to do well on that, we do need to understand our audience better. We are working to recruit beta readers from appropriate audience pools.
There are several approaches here, most of which will not be executed by the comms team directly, we hand off to others

Akash 30 May 2024 16:01 UTC
20 points
11
on: Akash’s Shortform
I’m surprised why some people are so interested in the idea of liability for extreme harms. I understand that from a legal/philosophical perspective, there are some nice arguments about how companies should have to internalize the externalities of their actions etc.
But in practice, I’d be fairly surprised if liability approaches were actually able to provide a meaningful incentive shift for frontier AI developers. My impression is that frontier AI developers already have fairly strong incentives to avoid catastrophes (e.g., it would be horrible for Microsoft if its AI model caused $1B in harms, it would be horrible for Meta and the entire OS movement if an OS model was able to cause $1B in damages.)
And my impression is that most forms of liability would not affect this cost-benefit tradeoff by very much. This is especially true if the liability is only implemented post-catastrophe. Extreme forms of liability could require insurance, but this essentially feels like a roundabout and less effective way of implementing some form of licensing (you have to convince us that risks are below an acceptable threshold to proceed.)
I think liability also has the “added” problem of being quite unpopular, especially among Republicans. It is easy to attack liability regulations as anti-innovation, argue that that it creates a moat (only big companies can afford to comply), and argue that it’s just not how America ends up regulating things (we don’t hold Adobe accountable for someone doing something bad with Photoshop.)
To be clear, I don’t think “something is politically unpopular” should be a full-stop argument against advocating for it.
But I do think that “liability for AI companies” scores poorly both on “actual usefulness if implemented” and “political popularity/feasibility.” I also think the “liability for AI companies” advocacy often ends up getting into abstract philosophy land (to what extent should companies internalize externalities) and ends up avoiding some of the “weirder” points (we expect AI has a considerable chance of posing extreme national security risks, which is why we need to treat AI differently than Photoshop.)
I would rather people just make the direct case that AI poses extreme risks & discuss the direct policy interventions that are warranted.
With this in mind, I’m not an expert in liability and admittedly haven’t been following the discussion in great detail (partly because the little I have seen has not convinced me that this is an approach worth investing into). I’d be interested in hearing more from people who have thought about liability– particularly concrete stories for how liability would be expected to meaningfully shift incentives of labs. (See also here).
Stylistic note: I’d prefer replies along the lines of “here is the specific argument for why liability would significantly affect lab incentives and how it would work in concrete cases” rather than replies along the lines of “here is a thing you can read about the general legal/philosophical arguments about how liability is good.”

Akash 30 May 2024 15:42 UTC
3 points
0
in reply to: Gretta Duleba’s comment on: MIRI 2024 Communications Strategy
the artifacts we’re producing are very big and we want to get them right.
To the extent that this can be shared– What are the artifacts you’re most excited about, and what’s your rough prediction about when they will be ready?
Moreover, how do you plan to assess the success/failure of your projects? Are there any concrete metrics you’re hoping to achieve? What does a “really good outcome” for MIRI’s comms team look like by the end of the year, and what does a “we have failed and need to substantially rethink our approach, speed, or personnel” outcome look like?
(I ask partially because one of my main uncertainties right now is how well MIRI will get its materials in front of the policymakers and national security officials you’re trying to influence. In the absence of concrete goals/benchmarks/timelines, I could imagine a world where MIRI moves at a relatively slow pace, produces high-quality materials with truthful arguments, but this content isn’t getting to the target audience, and the work isn’t being informed by the concerns/views of the target audience.)

Akash 30 May 2024 15:34 UTC
3 points
0
in reply to: Gretta Duleba’s comment on: MIRI 2024 Communications Strategy
Got it– thank you! Am I right in thinking that your team intends to influence policymakers and national security officials, though? If so, I’d be curious to learn more about how you plan to get your materials in front of them or ensure that your materials address their core points of concern/doubt.
Put a bit differently– I feel like it would be important for your team to address these questions insofar as your team has the following goals:
The main audience we want to reach is policymakers – the people in a position to enact the sweeping regulation and policy we want – and their staff.
We are hopeful about reaching a subset of policy advisors who have the skill of thinking clearly and carefully about risk, particularly those with experience in national security.

Akash 30 May 2024 1:04 UTC
26 points
11
on: MIRI 2024 Communications Strategy
Thank you for this update—I appreciate the clear reasoning. I also personally feel that the AI policy community is overinvested in the “say things that will get you points” strategy and underinvested in the “say true things that help people actually understand the problem” strategy. Specifically, I feel like many US policymakers have heard “be scared of AI because of bioweapons” but have not heard clear arguments about risks from autonomous systems, misalignment, AI takeover, etc.
A few questions:
1. To what extent is MIRI’s comms team (or technical governance team) going to interact directly with policymakers and national security officials? (I personally suspect you will be more successful if you’re having regular conversations with your target audience and taking note of what points they find confusing or unconvincing rather than “thinking from first principles” about what points make a sound argument.)
2. To what extent is MIRI going to contribute to concrete policy proposals (e.g., helping offices craft legislation or helping agencies craft specific requests)?
3. To what extent is MIRI going to help flesh out how its policy proposals could be implemented? (e.g., helping iron out the details of what a potential international AI compute governance regime would look like, how it would be implemented, how verification would work, what society would do with the time it buys)
4. Suppose MIRI has an amazing resource about AI risks. How does MIRI expect to get national security folks and important policymakers to engage with it?
(Tagging @lisathiergart in case some of these questions overlap with the work of the technical governance team.)

Akash 29 May 2024 3:32 UTC
3 points
0
on: What mistakes has the AI safety movement made?
6 respondents thought AI safety could communicate better with the wider world. The AI safety community do not articulate the arguments for worrying about AI risk well enough, come across as too extreme or too conciliatory, and lean into some memes too much or not enough.
I think this accurately captures a core debate in AI comms/AI policy at the moment. Some groups are worried about folks coming off as too extreme (e.g., by emphasizing AI takeover and loss-of-control risks) and some groups are worried about folks worrying so much about sounding “normal” that they give an inaccurate or incomplete picture of the risks (e.g., by getting everyone worried about AI-generated bioweapons, even if the speaker does not believe that “malicious use from bioweapons” is the most plausible or concerning threat model.)
My own opinion is that I’m quite worried that some of the “attempts to look normal” have led to misleading/incorrect models of risk. These models of risk (which tend to focus more on malicious use than risks from autonomous systems) do not end up producing reasonable policy efforts.
The tides seem to be changing, though—there have been more efforts to raise awareness about AGI, AGI takeover, risks from autonomous systems, and risks from systems that can produce a decisive strategic advantage. I think these risks are quite important for policymakers to understand, and clear/straightforward explanations of them are rare.
I also think status incentives are discouraging (some) people from raising awareness about these threat models– people don’t want to look silly, dumb, sci-fi, etc. But IMO one of the most important comms/policy challenges will be getting people to take such threat models seriously, and I think there are ways to explain such threat models legitimately.

Akash 27 May 2024 20:21 UTC
2 points
0
on: Maybe Anthropic’s Long-Term Benefit Trust is powerless
Thanks for looking into this! A few basic questions about the Trust:
1. Do we know if trustees can serve multiple terms? See below for a quoted section from Anthropic’s site:
Trustees serve one-year terms and future Trustees will be elected by a vote of the Trustees.
2. Do we know what % of the board is controlled by the trustees, and by when it is expected to be a majority?
The Trust is an independent body of five financially disinterested members with an authority to select and remove a portion of our Board that will grow over time (ultimately, a majority of our Board).
3. Do we know if Paul is still a Trustee, or does his new role at USAISI mean he had to step down?
The initial Trustees are:

Jason Matheny: CEO of the RAND Corporation
Kanika Bahl: CEO & President of Evidence Action
Neil Buddy Shah: CEO of the Clinton Health Access Initiative (Chair)
Paul Christiano: Founder of the Alignment Research Center
Zach Robinson: Interim CEO of Effective Ventures US

Akash 24 May 2024 17:07 UTC
5 points
0
in reply to: DanielFilan’s comment on: Big Picture AI Safety: Introduction
Which of the institutions would you count as AGI labs? (genuinely curious– usually I don’t think about academic labs [relative to like ODA + Meta + Microsoft] but perhaps there are some that I should be counting.)
And yeah, OP funding is a weird metric because there’s a spectrum of how much grantees are closely tied to OP. Like, there’s a wide spectrum from “I have an independent research group and got 5% of my total funding from OP” all the way to like “I get ~all my funding from OP and work in the same office as OP and other OP allies and many of my friends/colleagues are OP etc.”
That’s why I tried to use the phrase “close allies/grantees”, to convey more of this implicit cultural stuff than merely “have you ever received OP $.” My strong impression is that the authors of the paper are much more intellectually/ideologically/culturally independent from OP, relative to the list of 17 interviewees presented above.

Akash 23 May 2024 18:27 UTC
38 points
28
on: Big Picture AI Safety: Introduction
What do AI safety experts believe about the big picture of AI risk?
I would be careful not to implicitly claim that these 17 people are a “representative sample” of the AI safety community. Or, if you do want to make that claim, I think it’s important to say a lot more about how these particular participants were chosen and why you think they are represented.
At first glance, it seems to me like this pool of participants overrepresents some worldviews and under-represents others. For example, it seems like the vast majority of the participants either work for AGI labs, Open Philanthropy, and close allies/grantees of OP. OP undoubtedly funds a lot of AIS groups, but there are lots of experts who approach AIS from a different set of assumptions and worldviews.
More specifically, I’d say this list of 17 experts over-represents what I might refer to as the “Open Phil + AGI labs + people funded by or close to those entities” cluster of thinkers (who IMO generally are more optimistic than folks at groups like MIRI, Conjecture, CAIS, FLI, etc.) & over-represents people who are primarily focused on technical research (who IMO are generally most optimistic about technical alignment, more likely to believe empirical work is better than conceptual work, and more likely to believe in technical rather than socio-technical approaches.)
To be clear– I still think that work like this is & can be important. Also, there is some representation from people outside of the particular subculture I’m claiming is over-represented.
But I think it is very hard to do a survey that actually meaningfully represents the AI safety community, and I think there are a lot of subjective decisions that go into figuring out who counts as an “expert” in the field.

Akash 21 May 2024 20:02 UTC
10 points
4
in reply to: Linch’s comment on: Open Thread Spring 2024
Great point! (Also oops– I forgot that Irving was formerly OpenAI as well. He worked for DeepMind in recent years, but before that he worked at OpenAI and Google Brain.)
Do we have any evidence that DeepMind or Anthropic definitely do not do non-disparagement agreements? (If so then we can just focus on former OpenAI employees.)

Akash 21 May 2024 19:37 UTC
8 points
6
in reply to: Hjalmar_Wijk’s comment on: New voluntary commitments (AI Seoul Summit)
This seems like a solid list. Scaling certainly seems core to the RSP concept.
IMO “red lines, iterative policy updates, and evaluations & accountability” are sort of pointing at the same thing. Roughly something like “we promise not to cross X red line until we have published Y new policy and allowed the public to scrutinize it for Z amount of time.”
One interesting thing here is that none of the current RSPs meet this standard. I suppose the closest is Anthropic’s, where they say they won’t scale to ASL-4 until they publish a new RSP (this would cover “red line” but I don’t believe they commit to giving the public a chance to scrutinize it, so it would only partially meet “iterative policy updates” and wouldn’t meet “evaluations and accountability.”)
They will not suddenly cross any of their red lines before the mitigations are implemented/a new RSP version has been published and given scrutiny, by pointing at specific evaluations procedures and policies
This seems like the meat of an ideal RSP. I don’t think it’s done by any of the existing voluntary scaling commitments. All of them have this flavor of “our company leadership will determine when the mitigations are sufficient, and we do not commit to telling you what our reasoning is.” OpenAI’s PF probably comes the closest, IMO (e.g., leadership will evaluate if the mitigations have moved the model from the stuff described in the “critical risk” category to the stuff described in the “high risk” category.)
As long as the voluntary scaling commitments end up having this flavor of “leadership will make a judgment call based on its best reasoning”, it feels like the commitments lack most of the “teeth” of the kind of RSP you describe.
(So back to the original point– I think we could say that something is only an RSP if it has the “we won’t cross this red line until we give you a new policy and let you scrutinize it and also tell you how we’re going to reason about when our mitigations are sufficient” property, but then none of the existing commitments would qualify as RSPs. If we loosen the definition, then I think we just go back to “these are voluntary commitments that have to do with scaling & how the lab is thinking about risks from scaling.”)

Akash 21 May 2024 17:53 UTC
2 points
0
in reply to: Zach Stein-Perlman’s comment on: Anthropic: Reflections on our Responsible Scaling Policy
on Earth you don’t get sufficient credit for sharing good policies and there’s substantial negative EV from misunderstandings and adversarial interpretations, so I guess it’s often correct to not share :(
What’s the substantial negative EV that would come from misunderstanding or adversarial interpretations? I feel like in this case, worst-case would be like “the non-compliance reporting policy is actually pretty good but a few people say mean things about it and say ‘see, here’s why we need government oversight.’ But this feels pretty minor/trivial IMO.
As an ⁸⁰⁄₂₀ of publishing, maybe you could share a policy with an external auditor who would then publish whether they think it’s good or have concerns. I would feel better if that happened all the time
This is clever, +1.

Akash 21 May 2024 17:41 UTC
8 points
2
in reply to: tylerjohnston’s comment on: New voluntary commitments (AI Seoul Summit)
even if you are skeptical of the value of RSPs, I think you should be in favor of a specific name for it so you can distinguish it from other, future voluntary safety policies that you are more supportive of
This is a great point– consider me convinced. Interestingly, it’s hard for me to really precisely define the things that make something an RSP as opposed to a different type of safety commitment, but there are some patterns in the existing RSP/PF/FSF that do seem to put them in a broader family. (Ex: Strong focus on model evaluations, implicit assumption that AI development should continue until/unless evidence of danger is found, implicit assumption that company executives will decide once safeguards are sufficient).