AI grantmaker at Longview Philanthropy and AI DPhil student at Oxford
aog
Downvoted for overstating the findings. It’s neither confirmed (see Manifold) nor necessarily “for instrumental reasons” (diagnosing the causes of model behavior is difficult, and they don’t provide justification for their claimed cause, nor even a clear definition).
I think it’s noteworthy and would love more information about what happened, but it’s worth being careful about the interpretation. I think this would be a good post if you revised these claims.
Anton Leicht has a good extension of this model, saying that if neither voters nor donors prioritize an issue, then policy can be shaped by technocratic experts. Right after ChatGPT’s release, AI policy briefly looked like that. People wanted some kind of policy response, and the only people with detailed proposals were AI safety experts, so they set the agenda for a while (e.g. UK AI Safety Summit, international AISI network, voluntary corporate commitments to mitigate catastrophic risk, Biden EO). But that window was short. Once businesses saw that AI policy proposals could hurt their interests (e.g. during the SB 1047 debate or when discussing restrictions on open source), they stepped in. Now anti-regulation donors generally have the upper hand over the AI policy wonks, and it’ll take real pressure from voters or pro-safety donors to pass policies that the business interests oppose.
One reason is that hosting data centers can give countries political influence over AI development, increasing the importance of their governments having reasonable views on AI risks.
I think it will probably be easy for intelligence agencies on either side to get an estimate within +/- 5%, which is sufficient for a minimal version of this.
That seems plausible or even likely, but I think it’s hard to be confident. Public estimates of chip production are high variance. For example, SemiAnalysis and Bloomberg’s estimates of Huawei’s 2025 chip production differ by 50% [1]. I doubt either US or Chinese intelligence will be better at estimating these numbers than independent researchers within the next few years. Maybe the true numbers would become clear if governments wanted them to be, but I could also imagine China rationally not trusting documentation of US chip production created and shared by US chip manufacturers, and vice versa. This is especially hard under time pressure.
The more chips you want to destroy, the more confidence you need. If you want to destroy 90% of your adversary’s chip stock, but you fail to count 5% of their stock, you’ll leave them with 10% + 5% = 15% of their original stock, or 50% more than intended.
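To make the arithmetic concrete, here’s a toy sketch (my own simplification; it assumes every uncounted chip simply escapes destruction on top of the survivors you intended to leave, and the function and parameter names are just illustrative):

```python
# Toy model: you plan to leave (1 - intended_destroyed) of the adversary's
# stock, but a fraction `uncounted` of their chips was never counted and
# therefore never targeted, so it survives as well.
def surviving_fraction(intended_destroyed: float, uncounted: float) -> float:
    intended_survivors = 1.0 - intended_destroyed
    return intended_survivors + uncounted

for uncounted in (0.01, 0.05, 0.10):
    actual = surviving_fraction(0.90, uncounted)
    intended = 0.10
    print(f"miss {uncounted:.0%} of the stock -> {actual:.0%} survives "
          f"({(actual / intended - 1):.0%} more than intended)")
```

Small counting errors translate into much larger proportional errors in the surviving stock, which is the quantity that matters strategically.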
The main upshot imo is that it’s important for both independent researchers and intelligence agencies to precisely track the chip supply.
[1] Figure 5: https://ifp.org/should-the-us-sell-hopper-chips-to-china/
This does assume both countries have a good sense of the total number of chips controlled by each country’s developers, which seems doable but not trivial, and worth working towards now.
We’ve done a number of wargames of this sort of regime and the regime often breaks down.
I’d be curious to hear how it breaks down.
I’d be very interested to see more and older models evaluated on this benchmark. There might be a strong correlation between the amount of RLVR in a model’s training and its propensity to reward hack. In at least some of your setups, GPT-4.1 cheats a lot less than more recent models. I wonder how much that’s caused by it being less capable vs. less motivated to cheat vs. other factors.
Donations to US political campaigns are legally required to be publicly disclosed, whereas donations to US 501(c)(3) nonprofits and 501(c)(4) policy advocacy organizations are not, and can be kept private.
I’m a grantmaker at Longview. I agree there isn’t great public evidence that we’re doing useful work. I’d be happy to share a lot more information about our work with people who are strongly considering donating >$100K to AI safety or closely advising people who might do that.
He also titled his review “An Effective Altruism Take on IABIED” on LinkedIn. Given that Zach is the CEO of the Centre for Effective Altruism, some readers might reasonably interpret this as Zach speaking for the EA community. Retitling the post to “Book Review: IABIED” or something else seems better.
Agreed with the other answers on the reasons why there’s no GiveWell for AI safety. But in case it’s helpful, I should say that Longview Philanthropy offers advice to donors looking to give >$100K per year to AI safety. Our methodology is a bit different from GiveWell’s, but we do use cost-effectiveness estimates. We investigate funding opportunities across the AI landscape, from technical research to field-building to policy in the US, EU, and around the world, trying to find the most impactful opportunities for the marginal donor. We also do active grantmaking, such as our calls for proposals on hardware-enabled mechanisms and digital sentience. More details here. Feel free to reach out to aidan@longview.org or simran@longview.org if you’d like to learn more.
Knowing the TAM would clearly be useful for deciding whether or not to continue investing in compute scaling, but trying to estimate the TAM ahead of time is very speculative, whereas the revenues from yesterday’s investments can be observed before deciding whether to invest today for more revenue tomorrow. Therefore I think investment decisions will be driven in part by revenues, and that people trying to forecast future investment should also forecast future revenues, so we can track whether revenues are on track and what that implies for the investment forecasts.
I haven’t done the revenue analysis myself, but I’d love to read something good on the revenue needed to justify different datacenter investments, and whether the companies are on track to hit that revenue.
But by 2030 we would get to $770bn, which probably can’t actually happen if AI doesn’t cross enough capability thresholds by then.
What revenue and growth rate of revenue do you think would be needed to justify this investment? Has there been any good analysis of this question?
Digital sentience funding opportunities: Support for applied work and research
Thanks for the heads up. I’ve edited the title and introduction to better indicate that this content might be interesting to someone even if they’re not looking for funding.
Research Priorities for Hardware-Enabled Mechanisms (HEMs)
Yeah I think that’d be reasonable too. You could talk about these clusters at many different levels of granularity, and there are tons I haven’t named.
Putting aside for a moment the question of whether Matthew Barnett has good takes, this reaction reminds me of how outsiders sometimes feel about effective altruism or rationalism:
I guess I feel that his posts tend to be framed in a really strange way such that, even though there’s often some really good research there, it’s more likely to confuse the average reader than anything else, and even if you can untangle the frames, I usually don’t find it worth the time.
The root cause may be that there is too much inferential distance, too many differences of basic worldview assumptions, to easily have a productive conversation. The argument contained in any given post might rely on background assumptions that would take a long time to explain and debate. It can be very difficult to have a productive conversation with someone who doesn’t share your basic worldview. That’s one of the reasons that LessWrong encourages users to read foundational material on rationalism before commenting or posting. It’s also why scalable oversight researchers like having places to talk to each other about the best approaches to LLM-assisted reward generation, without needing to justify each time whether that strategy is doomed from the start. And it’s part of why I think it’s useful to create scenes that operate on different worldview assumptions: it’s worth working out the implications of specific beliefs without needing to justify those beliefs each time.
Of course, this doesn’t mean that Matthew Barnett has good takes. Maybe you find his posts confusing not because of inferential distance, but because they’re illogical and wrong. Personally I think they’re good, and I wouldn’t have written this post if I didn’t. But I haven’t actually argued that here, and I don’t really want to—that’s better done in the comments on his posts.
Longview is hiring an AI Safety Content Specialist (contractor, remote, $40–75/hr)
We’re looking for someone to write the grant recommendations, memos, and explainers that help us move money to AI safety. You’ll basically work to understand what we’re funding and why, and write it up for donors. Some donors are just learning about AI risk, but many have extremely high context on AI safety and will notice if you get a technical detail wrong, so it’s important to find someone who can write excellent, high-fidelity content. We want someone with strong AI safety knowledge who writes well and fast, including same-day turnarounds.
It’s an hourly contract, initially for three months, with 20+ hrs/week preferred and potential to extend or convert to full-time. I think it’s a directly impactful role and a good learning opportunity for people with strong prior knowledge of AI safety. Apply here by EOD Monday March 16.