tlevin

Karma: 363

(Posting in a personal capacity unless stated otherwise.) I help allocate Open Phil’s resources to improve the governance of AI with a focus on avoiding catastrophic outcomes. Formerly co-founder of the Cambridge Boston Alignment Initiative, which supports AI alignment/safety research and outreach programs at Harvard, MIT, and beyond, co-president of Harvard EA, Director of Governance Programs at the Harvard AI Safety Team and MIT AI Alignment, and occasional AI governance researcher.

Not to be confused with the user formerly known as trevor1.

tlevin 30 Apr 2024 21:33 UTC
69 points
0
on: tlevin’s Shortform
I think some of the AI safety policy community has over-indexed on the visual model of the “Overton Window” and under-indexed on alternatives like the “ratchet effect,” “poisoning the well,” “clown attacks,” and other models where proposing radical changes can make you, your allies, and your ideas look unreasonable (edit to add: whereas successfully proposing minor changes achieves hard-to-reverse progress, making ideal policy look more reasonable).
I’m not familiar with a lot of systematic empirical evidence on either side, but it seems to me like the more effective actors in the DC establishment overall are much more in the habit of looking for small wins that are both good in themselves and shrink the size of the ask for their ideal policy than of pushing for their ideal vision and then making concessions. Possibly an ideal ecosystem has both strategies, but it seems possible that at least some versions of “Overton Window-moving” strategies executed in practice have larger negative effects via associating their “side” with unreasonable-sounding ideas in the minds of very bandwidth-constrained policymakers, who strongly lean on signals of credibility and consensus when quickly evaluating policy options, than the positive effects of increasing the odds of ideal policy and improving the framing for non-ideal but pretty good policies.
In theory, the Overton Window model is just a description of what ideas are taken seriously, so it can indeed accommodate backfire effects where you argue for an idea “outside the window” and this actually makes the window narrower. But I think the visual imagery of “windows” actually struggles to accommodate this (when was the last time you tried to open a window and accidentally closed it instead?), and as a result, people who rely on this model are more likely to underrate these kinds of consequences.
Would be interested in empirical evidence on this question (ideally actual studies from psych, political science, sociology, econ, etc literatures, rather than specific case studies due to reference class tennis type issues).

tlevin 17 May 2024 18:16 UTC
25 points
8
in reply to: habryka’s comment on: William_S’s Shortform
Kelsey Piper now reports: “I have seen the extremely restrictive off-boarding agreement that contains nondisclosure and non-disparagement provisions former OpenAI employees are subject to. It forbids them, for the rest of their lives, from criticizing their former employer. Even acknowledging that the NDA exists is a violation of it.”

tlevin 23 Mar 2024 0:29 UTC
21 points
7
on: On green
One other thought on Green in rationality: you mention the yin of scout mindset in the Deep Atheism post, and scout mindset and indeed correct Bayesianism involve a Green passivity and maybe the “respect for the Other” described here. While Blue is agnostic, in theory, between yin and yang (whichever gives me more knowledge!), Blue as evoked in Duncan’s post and as I commonly think of it tends to lean yang: “truth-seeking,” “diving down research rabbit holes,” “running experiments,” etc. A common failure mode of Blue-according-to-Blue is a yang that projects the observer into the observations: seeing new evidence as tools, arguments as soldiers. Green reminds Blue to chill: see the Other as it is, recite the litanies of Gendlin and Tarski, combine the seeking of truth with receptivity to what you find.

tlevin 8 Jan 2024 23:17 UTC
14 points
10
in reply to: Akash’s comment on: EU policymakers reach an agreement on the AI Act
(An extra-heavy “personal capacity” disclaimer for the following opinions.) Yeah, I hear you that OP doesn’t have as much public writing about our thinking here as would be ideal for this purpose, though I think the increasingly adversarial environment we’re finding ourselves in limits how transparent we can be without undermining our partners’ work (as we’ve written about previously).
The set of comms/advocacy efforts that I’m personally excited about is definitely larger than the set of comms/advocacy efforts that I think OP should fund, since 1) that’s a higher bar, and 2) sometimes OP isn’t the right funder for a specific project. That being said:
- So far, OP has funded AI policy advocacy efforts by the Institute for Progress and Sam Hammond. I personally don’t have a very detailed sense of how these efforts have been going, but the theory of impact for these was that both grantees have strong track records in communicating policy ideas to key audiences and a solid understanding of the technical and governance problems that policy needs to solve.
- I’m excited about the EU efforts of FLI and The Future Society. In the EU context, it seems like these orgs were complementary, where FLI was willing to take steps (including the pause letter) that sparked public conversation and gave policymakers context that made TFS’s policy conversations more productive (despite raising some controversy). I have much less context on their US work, but from what I know, I respect the policymaker outreach and convening work that they do and think they are net-positive.
- I think CAIP is doing good work so far, though they have less of a track record. I value their thinking about the effectiveness of different policy options, and they seem to be learning and improving quickly.
- I don’t know as much about Andrea and Control AI, but my main current takes about them are that their anti-RSP advocacy should have been heavier on “RSPs are insufficient,” which I agree with, instead of “RSPs are counterproductive safety-washing,” which I think could have disincentivized companies from the very positive move of developing an RSP (as you and I discussed privately a while ago). MAGIC is an interesting and important proposal and worth further developing (though as with many clever acronyms I kind of wish it had been named differently).
- I’m not sure what to think about Holly’s work and PauseAI. I think the open source protest where they gave out copies of a GovAI paper to Meta employees seemed good – that seems like the kind of thing that could start really productive thinking within Meta. Broadly building awareness of AI’s catastrophic potential seems really good, largely for the reasons Holly describes here. Specifically calling for a pause is complicated, both in terms of the goodness of the types of policies that could be called a pause and in terms of the politics (i.e., the public seems pretty on board, but it might backfire specifically with the experts that policymakers will likely defer to, but also it might inspire productive discussion around narrower regulatory proposals?). I think this cluster of activists can sometimes overstate or simplify their claims, which I worry about.
Some broader thoughts about what kinds of advocacy would be useful or not useful:
- The most important thing, imo, is that whatever advocacy you do, you do it well. This sounds obvious, but importantly differs from “find the most important/neglected/tractable kind of advocacy, and then do that as well as you personally can do it.” For example, I’d be really excited about people who have spent a long time in Congress-type DC world doing advocacy that looks like meeting with staffers; I’d be excited about people who might be really good at writing trying to start a successful blog and social media presence; I’d be excited about people with a strong track record in animal advocacy campaigns applying similar techniques to AI policy. Basically I think comparative advantage is really important, especially in cases where the risk of backfire/poisoning the well is high.
- In all of these cases, I think it’s very important to make sure your claims are not just literally accurate but also don’t have misleading implications and are clear about your level of confidence and the strength of the evidence. I’m very, very nervous about getting short-term victories by making bad arguments. Even Congress, not known for its epistemic and scientific rigor, has gotten concerned that AI safety arguments aren’t as rigorous as they need to be (even though I take issue with most of the specific examples they provide).
- Relatedly, I think some of the most useful “advocacy” looks a lot like research: if an idea is currently only legible to people who live and breathe AI alignment, writing it up in a clear and rigorous way, such that academics, policymakers, and the public can interact with it, critique it, and/or become advocates for it themselves is very valuable.
- This is obviously not a novel take, but I think other things equal advocacy should try not to make enemies. It’s really valuable that the issue remain somewhat bipartisan and that we avoid further alienating the AI fairness and bias communities and the mainstream ML community. Unfortunately “other things equal” won’t always hold, and sometimes these come with steep tradeoffs, but I’d be excited about efforts to build these bridges, especially by people who come from/have spent lots of time in the community to which they’re bridge-building.

tlevin 22 May 2023 22:12 UTC
14 points
17
on: I don’t want to talk about AI
I think it’s admirable to say things like “I don’t want to [do the thing that this community holds as near-gospel as a good thing to do.]” I also think the community should take it seriously that anyone feels like they’re punished for being intellectually honest, and in general I’m sad that it seems like your interactions with EAs/rats about AI have been unpleasant.
That said...I do want to push back on basically everything in this post and encourage you and others in this position to spend some time seeing if you agree or disagree with the AI stuff.
- Assuming that you think you’d look into it in a reasonable way, then you’d be much more likely to reach a doomy conclusion if it were actually true. If it were true, it would be very much in your interest — altruistically and personally — to believe it. In general, it’s just pretty useful to have more information about things that could completely transform your life. If you might have a terminal illness, doesn’t it make sense to find out soon so you can act appropriately even if it’s totally untreatable?
- I also think there are many things for non-technical people to do on AI risk! For example, you could start trying to work on the problem, or if you think it’s just totally hopeless w/r/t your own work, you could work less hard and save less for retirement so you can spend more time and money on things you value now.
For the “what if I decide it’s not a big deal” conclusion:
- For points #1 through #3, I’m basically just surprised that you don’t already experience this with the take “I don’t want to learn about or talk about AI” such that it would get worse if your take was “I have a considered view that AI x-risk is low”! To be honest and a little blunt, I do judge people a bit when they have bad reasoning either for high or low levels of x-risk, but I’m pretty sure I judge them a lot more positively when they’ve made a good-faith effort at figuring it out.
- For points #3 and #4, idk, Holden, Joe Carlsmith, Rob Long, and possibly I (among others) are all people who have (hopefully) contributed something valuable to the fight against AI risk with social science or humanities backgrounds, so I don’t think this means you wouldn’t be persuasive, and it seems incredibly valuable for the community if more people think things through and come to this opinion. The consensus that AI safety is a huge deal currently means we have hundreds of millions of dollars, hundreds of people (many of whom are anxious and/or depressed because of this consensus), and dozens of orgs focused on it. Imagine if this is wrong: we’d be inflicting so much damage!

tlevin 9 Dec 2023 18:31 UTC
13 points
4
in reply to: johnswentworth’s comment on: What I Would Do If I Were Working On AI Governance
I broadly share your prioritization of public policy over lab policy, but the more I’ve learned about liability, the more it seems like one or a few labs having solid RSPs/evals commitments/infosec practices/etc would significantly shift how courts make judgments about how much of this kind of work a “reasonable person” would do to mitigate the foreseeable risks. Legal and policy teams in labs will anticipate this and thus really push for compliance with whatever the perceived industry best practice is. (Getting good liability rulings or legislation would multiply this effect.)

tlevin 20 Sep 2023 18:55 UTC
13 points
3
in reply to: MondSemmel’s comment on: Protest against Meta’s irreversible proliferation (Sept 29, San Francisco)
I think the main thing stopping the accelerationists and open source enthusiasts from protesting with 10x as many people is that, whether for good reasons or not, there is much more opposition to AI progress and proliferation than support among the general public. (Admittedly this is probably less true in the Bay Area, but I would be surprised if it was even close to parity there and very surprised if it were 10x.)

tlevin 29 Sep 2023 0:44 UTC
10 points
4
on: Commonsense Good, Creative Good
Just want to plug Josh Greene’s great book Moral Tribes here (disclosure: he’s my former boss). Moral Tribes basically makes the same argument in different/more words: we evolved moral instincts that usually serve us pretty well, and the tricky part is realizing when we’re in a situation that requires us to pull out the heavy-duty philosophical machinery.

tlevin 18 Dec 2022 1:13 UTC
6 points
5
in reply to: shminux’s comment on: Consider working more hours and taking more stimulants
I don’t think this is the right axis on which to evaluate posts. Posts that suggest donating more of your money to charities that save the most lives, causing less animal suffering via your purchases, and considering that AGI might soon end humanity are also “harmful to an average reader” in a similar sense: they inspire some guilt, discomfort, and uncertainty, possibly leading to changes that could easily reduce the reader’s own hedonic welfare.
However—hopefully, at least—the “average reader” on LW/EAF is trying to believe true things and achieve goals like improving the world, and presenting them arguments that they can evaluate for themselves and might help them unlock more of their own potential seems good.
I also think the post is unlikely to be net-negative given the caveats about trying this as an experiment, the different effects on different kinds of work, etc.

tlevin 2 May 2024 1:29 UTC
5 points
−4
in reply to: Akash’s comment on: tlevin’s Shortform
Quick reactions:
1. Re: over-emphasis on “how radical is my ask” vs “what my target audience might find helpful,” and generally the importance of making your case well regardless of how radical it is: that makes sense. Though notably the more radical your proposal is (or the more unfamiliar your threat models are), the higher the bar for explaining it well, so these do seem related.
2. Re: more effective actors looking for small wins, I agree that it’s not clear, but yeah, seems like we are likely to get into some reference class tennis here. “A lot of successful organizations that take hard-line positions and (presumably) get a lot of their power/influence from the ideological purity that they possess & communicate”? Maybe, but I think of like, the agriculture lobby, who just sort of quietly make friends with everybody and keep getting 11-figure subsidies every year, in a way that (I think) resulted more from gradual ratcheting than making a huge ask. “Pretty much no group– whether radical or centrist– has had tangible wins” seems wrong in light of the EU AI Act (where I think both a “radical” FLI and a bunch of non-radical orgs were probably important) and the US executive order (I’m not sure which strategy is best credited there, but I think most people would have counted the policies contained within it as “minor asks” relative to licensing, pausing, etc). But yeah I agree that there are groups along the whole spectrum that probably deserve credit.
3. Re: poisoning the well, again, radical-ness and being dumb/uninformed are of course separable but the bar rises the more radical you get, in part because more radical policy asks strongly correlate with more complicated procedural asks; tweaking ECRA is both non-radical and procedurally simple, whereas creating a new agency to license training runs is both outside the DC Overton Window and very procedurally complicated.
4. Re: incentives, I agree that this is a good thing to track, but like, “people who oppose X are incentivized to downplay the reasons to do X” is just a fully general counterargument. Unless you’re talking about financial conflicts of interest, but there are also financial incentives for orgs pursuing a “radical” strategy to downplay boring real-world constraints, as well as social incentives (e.g. on LessWrong IMO) to downplay these boring constraints and cognitive biases against thinking your preferred strategy has big downsides.
5. I agree that the CAIS statement, Hinton leaving Google, and Bengio and Hogarth’s writing have been great. I think that these are all in a highly distinct category from proposing specific actors take specific radical actions (unless I’m misremembering the Hogarth piece). Yudkowsky’s TIME article, on the other hand, definitely counts as an Overton Window move, and I’m surprised that you think this has had net positive effects. I regularly hear “bombing datacenters” as an example of a clearly extreme policy idea, sometimes in a context that sounds like it maybe made the less-radical idea seem more reasonable, but sometimes as evidence that the “doomers” want to do crazy things and we shouldn’t listen to them, and often as evidence that they are at least socially clumsy, don’t understand how politics works, etc, which is related to the things you list as the stuff that actually poisons the well. (I’m confused about the sign of the FLI letter as we’ve discussed.)
6. I’m not sure optimism vs pessimism is a crux, except in very short, like, 3-year timelines. It’s true that optimists are more likely to value small wins, so I guess narrowly I agree that a ratchet strategy looks strictly better for optimists, but if you think big radical changes are needed, the question remains of whether you’re more likely to get there via asking for the radical change now or looking for smaller wins to build on over time. If there simply isn’t time to build on these wins, then yes, better to take a 2% shot at the policy that you actually think will work; but even in 5-year timelines I think you’re better positioned to get what you ultimately want by 2029 if you get a little bit of what you want in 2024 and 2026 (ideally while other groups also make clear cases for the threat models and develop the policy asks, etc.). Another piece this overlooks is the information and infrastructure built by the minor policy changes. A big part of the argument for the reporting requirements in the EO was that there is now going to be an office in the US government that is in the business of collecting critical information about frontier AI models and figuring out how to synthesize it to the rest of government, that has the legal authority to do this, and both the office and the legal authority can now be expanded rather than created, and there will now be lots of individuals who are experienced in dealing with this information in the government context, and it will seem natural that the government should know this information. I think if we had only been developing and advocating for ideal policy, this would not have happened (though I imagine that this is not in fact what you’re suggesting the community do!).

tlevin 18 Mar 2024 20:59 UTC
4 points
0
on: The Worst Form Of Government (Except For Everything Else We’ve Tried)
I think this post aims at an important and true thing and misses in a subtle and interesting but important way.
Namely: I don’t think the important thing is that one faction gets a veto. I think it’s that you just need limitations on what the government can do that ensure that it isn’t too exploitative/extractive. One way of creating these kinds of limitations is creating lots of veto points and coming up with various ways to make sure that different factions hold the different veto points. But, as other commenters have noted, the UK government does not have structural checks and balances. In my understanding, what they have instead is a bizarrely, miraculously strong respect for precedent and consensus about what “is constitutional” despite (or maybe because of?) the lack of a written constitution. For the UK, and maybe other, less-established democracies (i.e. all of them), I’m tempted to attribute this to the “repeated game” nature of politics: when your democracy has been around long enough, you come to expect that you and the other faction will share power (roughly at 50-50 for median voter theorem reasons), so voices within your own faction start saying “well, hold on, we actually do want to keep the norms around.”
Also, re: the electoral college, can you say more about how this creates de facto vetos? The electoral college does not create checks and balances; you can win in the electoral college without appealing to all the big factions (indeed, see Abraham Lincoln’s 1860 win), and the electoral college puts no restraints on the behavior of the president afterward. It just noisily empowers states that happen to have factional mixes close to the national average, and indeed can create paths to victory that route through doubling down on support within your own faction while alienating those outside it (e.g. Trump’s 2016 and 2020 coalitions).

tlevin 9 Jan 2023 1:08 UTC
4 points
0
on: Staring into the abyss as a core life skill
This post has already helped me admit that I needed to accept defeat and let go of a large project in a way that I think might lead to its salvaging by others—thanks for writing.

tlevin 8 Jan 2024 19:29 UTC
3 points
4
in reply to: Rafael Harth’s comment on: AI Risk and the US Presidential Candidates
Just being “on board with AGI worry” is so far from sufficient for taking useful actions to reduce the risk that I think epistemics and judgment are more important, especially since we’re likely to get lots of evidence (one way or another) about the timelines and risks posed by AI during the term of the next president.

tlevin 8 Jan 2024 19:26 UTC
3 points
2
in reply to: mrtreasure’s comment on: AI Risk and the US Presidential Candidates
He has also broadly indicated that he would be hostile to the nonpartisan federal bureaucracy, e.g. by designating way more of them as presidential appointees, allowing him personally to fire and replace them. I think creating new offices that are effectively set up to regulate AI looks much more challenging in a Trump (and to some extent DeSantis) presidency than under the other candidates.

tlevin 29 Sep 2023 1:01 UTC
3 points
1
in reply to: JBlack’s comment on: Weighing Animal Worth
“We should be devoting almost all of global production...” and “we must help them increase” are only the case if:
1. There are no other species whose product of [moral weight] * [population] is higher than bees, and
2. Our actions only have moral relevance for beings that are currently alive.
(And, you know, total utilitarianism and such.)

tlevin 12 Feb 2023 21:51 UTC
3 points
0
in reply to: Seth Herd’s comment on: Many AI governance proposals have a tradeoff between usefulness and feasibility
It seems to me like government-enforced standards are just another case of this tradeoff—they are quite a bit more useful, in the sense of carrying the force of law and applying to all players on a non-voluntary basis, and harder to implement, due to the attention of legislators being elsewhere, the likelihood that a good proposal gets turned into something bad during the legislative process, and the opportunity cost of the political capital.

tlevin 5 Dec 2022 23:20 UTC
2 points
0
in reply to: Ryan Kidd’s comment on: Probably good projects for the AI safety ecosystem
Quick note on 2: CBAI is pretty concerned about our winter ML bootcamp attracting bad-faith applicants and plans to use a combo of AGISF and references to filter pretty aggressively for alignment interest. Somewhat problematic in the medium term if people find out they can get free ML upskilling by successfully feigning interest in alignment, though...

tlevin 26 Mar 2024 22:45 UTC
1 point
0
in reply to: Arjun Panickssery’s comment on: The Worst Form Of Government (Except For Everything Else We’ve Tried)
The “highly concentrated elite” issue seems like it makes it more, rather than less, surprising and noteworthy that a lack of structural checks and balances has resulted in a highly stable and (relatively) individual-rights-respecting set of policy outcomes. That is, it seems like there would thus be an especially strong case for various non-elite groups to have explicit veto power.

tlevin 20 Dec 2023 23:00 UTC
1 point
0
in reply to: Akash’s comment on: EU policymakers reach an agreement on the AI Act
Thanks for these thoughts! I agree that advocacy and communications is an important part of the story here, and I’m glad for you to have added some detail on that with your comment. I’m also sympathetic to the claim that serious thought about “ambitious comms/advocacy” is especially neglected within the community, though I think it’s far from clear that the effort that went into the policy research that identified these solutions or work on the ground in Brussels should have been shifted at the margin to the kinds of public communications you mention.
I also think Open Phil’s strategy is pretty bullish on supporting comms and advocacy work, but it has taken us a while to acquire the staff capacity to gain context on those opportunities and begin funding them, and perhaps there are specific opportunities that you’re more excited about than we are.
For what it’s worth, I didn’t seek significant outside input while writing this post and think that’s fine (given the alternative of writing it quickly, posting it here, disclaiming my non-expertise, and getting additional perspectives and context from commenters like yourself). However, I have spoken with about a dozen people working on AI policy in Europe over the last couple months (including one of the people whose public comms efforts are linked in your comment) and would love to chat with more people with experience doing policy/politics/comms work in the EU.
We could definitely use more help thinking about this stuff, and I encourage readers who are interested in contributing to OP’s thinking on advocacy and comms to do any of the following:
- Write up these critiques (we do read the forums!);
- Join our team (our latest hiring round specifically mentioned US policy advocacy as a specialization we’d be excited about, but people with advocacy/politics/comms backgrounds more generally could also be very useful, and while the round is now closed, we may still review general applications); and/or
- Introduce yourself via the form mentioned in this post.

tlevin 17 Dec 2023 19:06 UTC
1 point
0
in reply to: Sherrinford’s comment on: EU policymakers reach an agreement on the AI Act
Thank you! Classic American mistake on my part to round these institutions to their closest US analogies.