Computer science master’s student interested in AI and AI safety.
Stephen McAleese
This sounds more or less correct to me. Open Philanthropy (Open Phil) is the largest AI safety grant maker and spent over $70 million on AI safety grants in 2022, whereas LTFF only spent ~$5 million. In 2022, the median Open Phil AI safety grant was $239k whereas the median LTFF AI safety grant was only $19k.
Open Phil and LTFF made 53 and 135 AI safety grants respectively in 2022. This means the average Open Phil AI safety grant in 2022 was ~$1.3 million whereas the average LTFF AI safety grant was only $38k. So the average Open Phil AI safety grant is ~30 times larger than the average LTFF grant.
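These averages follow directly from the totals and grant counts above (a back-of-the-envelope check using the rounded 2022 figures, nothing beyond what's already stated):

```python
# Back-of-the-envelope check of average grant sizes using the rounded 2022 figures above.
open_phil_total, open_phil_grants = 70_000_000, 53
ltff_total, ltff_grants = 5_000_000, 135

open_phil_avg = open_phil_total / open_phil_grants  # ~$1.3M
ltff_avg = ltff_total / ltff_grants                 # ~$37k
print(open_phil_avg / ltff_avg)  # ~36x with these rounded inputs, i.e. roughly the "~30x" figure above
```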
These calculations imply that Open Phil and LTFF make a similar number of grants (LTFF actually makes more) and that Open Phil spends much more simply because its grants tend to be much larger (~30x larger). So it seems like funds may be more constrained by their ability to evaluate and fulfill grants than by a lack of funding. This is not surprising given that the LTFF grantmakers apparently work part-time.
Counterintuitively, it may be easier for an organization (e.g. Redwood Research) to get a $1 million grant from Open Phil than it is for an individual to get a $10k grant from LTFF. The reason is that both grants probably require a similar amount of administrative effort, and a well-known organization is probably more likely than an individual to be trusted to use the money well, so the decision is easier to make. This example illustrates how decision-making and grant-making processes are probably just as important as the total amount of money available.
LTFF specifically could be funding-constrained though given that it only spends ~$5 million per year on AI safety grants. Since ~40% of LTFF’s funding comes from Open Phil and Open Phil has much more money than LTFF, one solution is for LTFF to simply ask for more money from Open Phil.
I don’t know why Open Phil spends so much more on AI safety than LTFF (~14x more). Maybe it’s simply because of some administrative hurdles that LTFF has when requesting money from Open Phil or maybe Open Phil would rather make grants directly.
Here is a spreadsheet comparing how much Open Phil, LTFF, and the Survival and Flourishing Fund (SFF) spend on AI safety per year.
Plug: I recently published a long post on the EA Forum on AI safety funding: An Overview of the AI Safety Funding Situation.
GPT-4 is the model that has been trained with the most training compute, which suggests that compute is the most important factor for capabilities. If that weren’t true, we would see some other company training models with more compute but worse performance, which doesn’t seem to be happening.
Thanks for the post. I think it’s a valuable exercise to think about how AI safety could be accelerated with unlimited money.
I think the Manhattan Project idea is interesting but I see some problems with the analogy:
The Manhattan Project was originally a military project and to this day, the military is primarily funded and managed by the government. But most progress in AI today is made by companies such as OpenAI and Google and universities like the University of Toronto. I think a more relevant project is CERN because it’s more recent and focused on the non-military development of science.
The Manhattan Project happened a long time ago and the world has changed a lot since then. The wealth and influence of tech companies and universities is probably much greater today than it was then.
It’s not obvious that a highly centralized effort is needed. The Alignment Forum, open source developers, and the academic research community (e.g. the ML research community) are examples of decentralized research communities that seem to be highly effective at making progress. This probably wasn’t possible in the past because the internet didn’t exist.
I highly doubt that it’s possible to recreate the Bay Area culture in a top-down way. I’m pretty sure China has tried this and I don’t think they’ve succeeded.
Also, I think your description overemphasizes the importance of geniuses like Von Neumann because 130,000 other people worked on the Manhattan Project too. I think something similar has happened at Google today: Jeff Dean is revered, but in reality most progress at Google is made by the tens of thousands of smart-but-not-genius “dark matter” developers there.
Anyway, let’s assume that we have a giant AI alignment project that would cost billions. To fund this, we could:
Expand EA funding substantially using community building.
Ask the government to fund the project.
The government has a lot of money, but convincing it to fund AI alignment seems more challenging than getting funding from EA. So maybe some EAs with government expertise could work with the government to increase AI safety investment.
If the AI safety project gets EA funding, I think it needs to be cost-effective. The reality is that only ~12% of Open Phil’s money is spent on AI safety. The reason is that there is a triage situation with other cause areas like biosecurity, farm animal welfare, and global health and development, so the goal is to find cost-effective ways to spend money on AI safety. The project needs to be competitive and have more value on the margin than other proposals.
In my opinion, the government projects that are most likely to succeed are those that build on or are similar to recent successful projects and are in the Overton window. For example:
AI Centres for Doctoral Training in the UK: funding PhD students in the UK to work on AI projects such as AI safety.
The NSF Safe Learning-Enabled Systems program: US government funding for academic research groups and non-profits to work on AI safety.
My guess is that leveraging academia would be effective and scalable because you can build on the pre-existing talent, leadership, culture, and infrastructure. Alternatively, governments could create new regulations or laws to influence the behavior of companies (e.g. GDPR). Or they could found new think tanks or research institutes possibly in collaboration with universities or companies.
As for the school ideas, I’ve heard that Lee Sedol went to a Go school and, as you mentioned, Soviet chess was fueled by Soviet chess programs. China has intensive sports schools, but I doubt these kinds of schools would be considered acceptable in Western countries, which is an important consideration given that most AI safety work happens in Western countries like the US and UK.
In science fiction, there are even more extreme programs like the Spartan program in Halo where children were kidnapped and turned into super soldiers, or Star Wars where clone soldiers were grown and trained in special facilities.
I don’t think these kinds of extreme programs would work. Advanced technologies like human cloning could take decades to develop and are illegal in many countries. Also, they sound highly unethical which is a major barrier to their success in modern developed countries like the US and especially EA-adjacent communities like AI safety.
I think a more realistic idea is something like the Atlas Fellowship or SERI MATS which are voluntary programs for aspiring researchers in their teens or twenties.
The geniuses I know of who were trained from an early age in Western-style countries are Mozart (music), Von Neumann (math), John Stuart Mill (philosophy), and Judit Polgár (chess). In all these cases, they were gifted children who lived in normal nuclear families and had ambitious parents and extra tutoring.
In my opinion, much of the value of interpretability is not related to AI alignment but to AI capabilities evaluations instead.
For example, the Othello paper shows that a transformer trained to predict the next move in Othello games learns a world model of the board rather than just surface statistics of the training sequences. This knowledge is useful because it suggests that transformer language models are more capable than they might initially seem.
Thanks for the post! It was a good read. One point I don’t think was brought up is the fact that chess is turn-based whereas real life is continuous.
Consequently, the huge speed advantage that AIs have is not that useful in chess because the AI still has to wait for you to make a move before it can move.
But since real life is continuous, if the AI is much faster than you, it could make 1000 ‘moves’ for every move you make and therefore speed is a much bigger advantage in real life.
Great post. I also fear that it may not be socially acceptable for AI researchers to talk about the long-term effects of AI despite the fact that, because of exponential progress, most of the impact of AI will probably occur in the long term.
I think it’s important that AI safety and considerations related to AGI become mainstream in the field of AI because it could be dangerous if the people building AGI are not safety-conscious.
I want a world where the people building AGI are also safety researchers rather than one where the AI researchers aren’t thinking about safety and the safety people are shouting over the wall and asking them to build safe AI.
This idea reminds me of how software development and operations were combined into the DevOps role in software companies.
Context of the post: funding overhang
The post was written in 2021 and argued that there was a funding overhang in longtermist causes (e.g. AI safety) because the amount of funding had grown faster than the number of people working.
Since 2015, the amount of committed capital increased by ~37% per year and the amount of deployed funds increased by ~21% per year, whereas the number of engaged EAs only grew by ~14% per year (a rough compounding of these rates is sketched below).
The introduction of the FTX Future Fund around 2022 caused a major increase in longtermist funding which further increased the funding overhang.
Benjamin linked a Twitter update in August 2022 saying that the total committed capital was down by half because of a stock market and crypto crash. Then FTX went bankrupt a few months later.
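A rough illustration of how those growth rates compound over the roughly six years from 2015 to when the post was written (illustrative only; the quoted rates are averages and the true growth was not smooth):

```python
# Compound the quoted annual growth rates over ~6 years (2015 -> 2021).
years = 6
committed_capital_growth = 1.37 ** years  # ~6.6x
deployed_funds_growth    = 1.21 ** years  # ~3.1x
engaged_eas_growth       = 1.14 ** years  # ~2.2x
print(committed_capital_growth / engaged_eas_growth)  # ~3x more committed capital per engaged EA
```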
The current situation
The FTX Future Fund no longer exists and Open Phil AI safety spending seems to have been mostly flat for the past 2 years. The post mentions that Open Phil is doing this to evaluate impact and increase capacity before possibly scaling more.
My understanding (based on this spreadsheet) is that the current level of AI safety funding has been roughly the same for the past 2 years whereas the number of AI safety organizations and researchers has been increasing by ~15% and ~30% per year respectively. So the funding overhang could be gone by now or there could even be a funding underhang.
Comparing talent vs funding
The post compares talent and funding in two ways:
The lifetime value of a researcher (e.g. $5 million) vs total committed funding (e.g. $1 billion)
The annual cost of a researcher (e.g. $100k) vs annual deployed funding (e.g. $100 million)
A funding overhang occurs when the total committed funding is greater than the combined lifetime value of all researchers, or when the amount of funding that could be deployed per year is greater than the combined annual cost of all researchers.
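Using the illustrative numbers above, either comparison gives a rough headcount that the money could support (a back-of-the-envelope sketch, not figures from the post):

```python
# Rough headcounts implied by the illustrative numbers above.
committed_funding = 1_000_000_000  # $1B total committed
lifetime_value    = 5_000_000      # $5M lifetime value per researcher
print(committed_funding / lifetime_value)  # 200 researcher-lifetimes

annual_deployed = 100_000_000      # $100M deployable per year
annual_cost     = 100_000          # $100k per researcher per year
print(annual_deployed / annual_cost)       # 1,000 researcher-years per year
```

If the field has far fewer researchers than these headcounts, funding is not the binding constraint, which is what the post means by an overhang.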
Then the post says:
“Personally, if given the choice between finding an extra person for one of these roles who’s a good fit or someone donating $X million per year, to think the two options were similarly valuable, X would typically need to be over three, and often over 10 (where this hugely depends on fit and the circumstances).”
I forgot to mention that this statement was applied to leadership roles like research leads, entrepreneurs, and grantmakers who can deploy large amounts of funds or have a large impact and therefore can have a large amount of value. Ordinary employees probably have less financial value.
Assuming there is no funding overhang in AI safety anymore, the marginal value of funding over more researchers is higher today than it was when the post was written.
The future
If total AI safety funding does not increase much in the near term, AI safety could continue to be funding-constrained or become more funding constrained as the number of people interested in working on AI safety increases.
However, the post explains some arguments for expecting EA funding to increase:
There’s some evidence that Open Philanthropy plans to scale up its spending over the next several years. For example, this post says, “We gave away over $400 million in 2021. We aim to double that number this year, and triple it by 2025”. Though the post was written in 2022 so it could be overoptimistic.
According to Metaculus, there is a ~50% chance of another Good Ventures / Open Philanthropy-sized fund being created by 2026 which could substantially increase funding for AI safety.
My mildly optimistic guess is that as AI safety becomes more mainstream there will be a symmetrical effect where both more talent and funding are attracted to the field.
Wow, this is an incredible achievement given that AI safety is still a relatively small field. For example, this post by 80,000 Hours said that $10–$50 million was spent globally on AI safety in 2020 according to The Precipice. Therefore this grant is roughly equivalent to an entire year of global AI safety funding!
Thanks for the post! I think it does a good job of describing key challenges in AI field-building and funding.
The talent gap section describes a lack of positions in industry organizations and independent research groups such as SERI MATS. However, there doesn’t seem to be much content on the state of academic AI safety research groups. So I’d like to emphasize the current and potential importance of academia for doing AI safety research and absorbing talent. The 80,000 Hours AI risk page says that there are several academic groups working on AI safety including the Algorithmic Alignment Group at MIT, CHAI in Berkeley, the NYU Alignment Research Group, and David Krueger’s group in Cambridge.
The AI field as a whole is already much larger than the AI safety field, so I think analyzing the AI field is useful from a field-building perspective. For example, about 60,000 researchers attended AI conferences worldwide in 2022. There’s an excellent report on the state of AI research called Measuring Trends in Artificial Intelligence. The report says that most AI publications (~75%) come from the ‘education’ sector, which is probably mostly universities; the rest are published by non-profits, industry, and governments. Surprisingly, the top 9 institutions by annual AI publication count are all Chinese universities, and MIT is in 10th place. Though the US and industry are still far ahead in ‘significant’ or state-of-the-art ML systems such as PaLM and GPT-4.
What about the demographics of AI conference attendees? At NeurIPS 2021, the top institutions by publication count were Google, Stanford, MIT, CMU, UC Berkeley, and Microsoft which shows that both industry and academia play a large role in publishing papers at AI conferences.
Another way to get an idea of where people work in the AI field is to find out where AI PhD students in the US go after graduating. The number of AI PhD students going into industry has increased over the past several years: 65% of PhD students now go into industry, while 28% still go into academic jobs.
Only a few academic groups seem to be working on AI safety, and many of the groups working on it are at highly selective universities, but AI safety could become more popular in academia in the near future. And if the breakdown of contributions and demographics in AI safety ends up resembling AI in general, then we should expect academia to play a major role in AI safety in the future. Long-term, AI safety may actually be more academic than AI as a whole, since universities are the largest contributor to basic research whereas industry is the largest contributor to applied research.
So in addition to founding an industry org or facilitating independent research, another path to field-building is to increase the representation of AI safety in academia by founding a new research group, though this path may only be tractable for professors.
I highly recommend this interview with Yann LeCun which describes his view on self-driving cars and AGI.
Basically, he thinks that self-driving cars are possible with today’s AI but would require immense amounts of engineering (e.g. hard-wired behavior for corner cases) because today’s AI (e.g. CNNs) tends to be brittle and lacks an understanding of the world.
My understanding is that Yann thinks we basically need AGI to solve autonomous driving in a reliable and satisfying way because the car would need to understand the world like a human to drive reliably.
Thanks for writing the paper! I think it will be really impactful and I think it fills a big gap in the literature.
I’ve always wondered what problems RLHF has, and mostly I’ve only seen short, informal answers about how it incentivizes deception or how humans can’t provide a scalable signal for superhuman tasks, which is odd given that RLHF is one of the most commonly used AI alignment methods.
Before your paper, I think this post was the most in-depth analysis of problems with RLHF I’ve seen so I think your paper is now probably the best resource for problems with RLHF. Apart from that post, the List of Lethalities post has a few related sections and this post by John Wentworth has a section on RLHF.
I’m sure your paper will spark future research on improving RLHF because it lists several specific discrete problems that could be tackled!
I’m not sure about software engineering as a whole, but I can see AI making programming obsolete.
“it will move up to the next level of abstraction and continue from there”
My worry is that the next level of abstraction above Python is plain English, and that anyone will be able to write programs just by asking “Write an app that does X”, except they’ll ask an AI instead of a freelance developer.
The historical trend has been that programming becomes easier. But maybe programming will become so easy that everyone can do it and programmers won’t be needed anymore.
A historical analogy is search, which used to be a skilled job done by librarians and involved creating logical queries using keywords (e.g. ‘house’ AND ‘car’). Now natural language search makes it possible for anyone to use Google, and we don’t need librarians for search anymore.
The same could happen to programming. Like librarians for search, it seems like programmers are a middleman between the user requesting a feature and the finished software. Historically programming computers has been too difficult for average people but that might not be true for long.
Since this seems to be Carn’s first post on LessWrong, I think some of the other readers should have been more lenient and either not downvoted the post or explained why they downvoted it.
I would only downvote a post if it was obviously bad, flawed, very poorly written, or a troll post.
This post contains lots of interesting ideas and seems like a good first post.
The original post “Reward is not the optimization target” has 216 upvotes and this one has 0. While the original post was written better, I’m skeptical of the main idea and it’s good to see a post countering it so I’m upvoting this post.
I think AI alignment is solvable for the same reason AGI is solvable: humans are an existence-proof for both alignment and general intelligence.
My summary of the podcast
Introduction
The superalignment team is OpenAI’s new team, co-led by Jan Leike and Ilya Sutskever, for solving alignment for superintelligent AI. One of their goals is to create a roughly human-level automated alignment researcher. The idea is that creating an automated alignment researcher is a kind of minimum viable alignment project where aligning it is much easier than aligning a more advanced superintelligent sovereign-style system. If the automated alignment researcher creates a solution to aligning superintelligence, OpenAI can use that solution for aligning superintelligence.
The automated alignment researcher
The automated alignment researcher is expected to be used for running and evaluating ML experiments, suggesting research directions, and helping to explain conceptual ideas. The automated researcher needs two components: a model capable enough to do alignment research, which will probably be some kind of advanced language model, and alignment for that model. Initially, they’ll probably start with relatively weak systems and weak alignment methods like RLHF and then scale up both in a bootstrapping loop: the model is used to improve alignment, and the model can then be scaled up as it becomes more aligned. The end goal is to be able to convert compute into alignment research so that alignment research can be accelerated drastically. For example, if 99% of tasks were automated, research would be ~100 times faster.
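A rough way to see the “~100 times faster” figure, assuming the automated tasks take negligible time and the remaining human-performed tasks become the bottleneck (an Amdahl’s-law-style estimate, not a calculation from the podcast):

```python
def research_speedup(automated_fraction: float) -> float:
    """Overall speedup if automated tasks take negligible time and only
    the remaining human-performed tasks limit the pace of research."""
    return 1 / (1 - automated_fraction)

print(research_speedup(0.99))  # ~100 -> the "~100 times faster" figure
print(research_speedup(0.90))  # ~10
```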
The superalignment team
The automated alignment researcher will be built by the superalignment team which currently has 20 people and could have 30 people by the end of the year. OpenAI also plans to allocate 20% of its compute to the superalignment team. Apart from creating the automated alignment researcher, the superalignment team will continue doing research on feedback-based approaches and scalable oversight. Jan emphasizes that the superalignment team will still be needed even if there is an automated alignment researcher because he wants to keep humans in the loop. He also wants to avoid the risk of creating models that seek power, self-improve, deceive human overseers, or exfiltrate (escape).
Why Jan is optimistic
Jan is generally optimistic about the plan succeeding and estimates that it has an ~80% chance of succeeding even though Manifold only gives the project a 22% chance of success. He gives 5 reasons for being optimistic:
LLMs understand human intentions and morality much better than other kinds of agents such as RL game-playing agents. For example, often you can simply ask them to behave a certain way.
Seeing how well RLHF worked. For example, training agents to play Atari games from human feedback works almost as well as training on the reward signal. RLHF-aligned LLMs are much more aligned than base models.
It’s possible to iterate and improve alignment solutions using experiments and randomized controlled trials.
Evaluating research is easier than generating it.
The last reason is a bet on language models. Jan thinks many alignment tasks can be formulated as text-in-text-out tasks.
Controversial statements/criticisms
Jan criticizing interpretability research:
I think interpretability is neither necessary, nor sufficient. I think there is a good chance that we could solve alignment purely behaviorally without actually understanding the models internally. And I think, also, it’s not sufficient where: if you solved interpretability, I don’t really have a good story of how that would solve superintelligence alignment, but I also think that any amount of non-trivial insight we can gain from interpretability will be super useful or could potentially be super useful because it gives us an avenue of attack.
Jan criticizing alignment theory research:
I think there’s actually a lot more scope for theory work than people are currently doing. And so I think for example, scalable oversight is actually a domain where you can do meaningful theory work, and you can say non-trivial things. I think generalization is probably also something where you can say… formally using math, you can make statements about what’s going on (although I think in a somewhat more limited sense). And I think historically there’s been a whole bunch of theory work in the alignment community, but very little was actually targeted at the empirical approaches we tend to be really excited [about] now. And it’s also a lot of… Theoretical work is generally hard because you have to… you’re usually either in the regime where it’s too hard to say anything meaningful, or the result requires a bunch of assumptions that don’t hold in practice. But I would love to see more people just try … And then at the very least, they’ll be good at evaluating the automated alignment researcher trying to do it.
Ideas for complementary research
Jan also gives some ideas that would be complementary to OpenAI’s alignment research agenda:
Creating mechanisms for eliciting values from society.
Solving current problems like hallucinations, jailbreaking, or mode-collapse (repetition) in RLHF-trained models.
Improving model evaluations (evals) to measure capabilities and alignment.
Reward model interpretability.
Figuring out how models generalize. For example, figuring out how to generalize alignment from easy-to-supervise tasks to hard tasks.
Based on what I’ve written here, my verdict is that AI safety seems more funding constrained for small projects and individuals than it is for organizations for the following reasons:
- The funds that fund smaller projects, such as LTFF, tend to have less money than funds like Open Phil, which seems to be more focused on making larger grants to organizations (Open Phil spends ~14x more per year on AI safety).
- Funding could be constrained by the throughput of grant-makers (the number of grants they can make per year). This seems to put funds like LTFF at a disadvantage since they tend to make a larger number of smaller grants so they are more constrained by throughput than the total amount of money available. Low throughput incentivizes making a small number of large grants which favors large existing organizations over smaller projects or individuals.
- Individuals or small projects tend to be less well-known than organizations so grants for them can be harder to evaluate or might be more likely to be rejected. On the other hand, smaller grants are less risky.
- The demand for funding from individuals or small projects seems like it could increase much faster than demand from organizations because new organizations take time to create (though maybe existing organizations can be scaled quickly).

Some possible solutions:
- Move more money to smaller funds that tend to make smaller grants. For example, LTFF could ask for more money from Open Phil.
- Hire more grant evaluators or hire full-time grant evaluators so that there is a higher ceiling on the total number of grants that can be made per year.
- Demonstrate that smaller projects or individuals can be as effective as organizations to increase trust.
- Seek more funding: half of LTFF’s funds come from direct donations so they could seek more direct donations.
- Existing organizations could hire more individuals rather than the individuals seeking funding themselves.
- Individuals (e.g. independent researchers) could form organizations to reduce the administrative load on grant-makers and increase their credibility.
For context, I have a very similar background to you—I’m a software engineer with a computer science degree interested in working on AI alignment.
LTFF granted about $10 million last year. Even if all that money were spent on independent AI alignment researchers at $100k per researcher per year, there would only be enough money to fund about 100 researchers in the world per year, so I don’t see LTFF as a scalable solution.
Unlike software engineering, AI alignment research tends to be neglected and underfunded because it’s not an activity that can easily be made profitable. That’s one reason why there are far more software engineers than AI alignment researchers.
Work that is unprofitable but beneficial such as basic science research has traditionally been done by university researchers who, to the best of my knowledge, are mainly funded by government grants.
In the past, I have also considered becoming independently wealthy in order to work on AI alignment, but that strategy seems too slow if AGI will be created relatively soon.
So my plan is to apply for jobs at organizations like Redwood Research or apply for funding from LTFF and if those plans fail, I will consider getting a PhD and getting funding from the government instead which seems more scalable.
No offense but I sense status quo bias in this post.
If you replace “AI” with “industrial revolution” I don’t think the meaning of the text changes much and I expect most people would rather live today than in the Middle Ages.
One thing that might be concerning is that older generations (us in the future) might not have the ability to adapt to a drastically different world in the same way that some old people today struggle to use the internet.
I personally don’t expect to be overly nostalgic in the future because I’m not that impressed by the current state of the world: factory farming, the hedonic treadmill, physical and mental illness, wage slavery, aging, and ignorance are all problems that I hope are solved in the future.