Research Engineering Intern at the Center for AI Safety. Helping to write the AI Safety Newsletter. Studying CS and Economics at the University of Southern California, and running an AI safety club there. Previously worked at AI Impacts and with Lionel Levine and Collin Burns on calibration for Detecting Latent Knowledge Without Supervision.
aogara
Analysis: US restricts GPU sales to China
Argument against 20% GDP growth from AI within 10 years [Linkpost]
Key Papers in Language Model Safety
AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks
Thank you, this was very helpful. As a bright-eyed youngster, I find it hard to make sense of the bitterness and pessimism I often see in the field. I’ve read the old debates, but I didn’t participate in them, and that probably makes them easier to dismiss. Object-level arguments like these help me understand your point of view.
AISN #28: Center for AI Safety 2023 Year in Review
Adversarial Robustness Could Help Prevent Catastrophic Misuse
This is a great presentation of the compute-focused argument for short AI timelines usually given by the BioAnchors report. Comparing several ML systems to several biological brain sizes provides more data points than BioAnchors’ focus on only the human brain vs. TAI. You succinctly summarize the key arguments against your viewpoint: that compute growth could slow, that human brain algorithms are more efficient, that we’ll build narrow AI, and the outside-view economics perspective. While your ultimate conclusion on timelines isn’t directly implied by your model, that seems like a feature rather than a bug — BioAnchors offers false numerical precision given its fundamental assumptions.
Full Automation is Unlikely and Unnecessary for Explosive Growth
All of these academics are widely read and cited. Looking at their Google Scholar profiles, every one of them has more than 1,000 citations, and half have more than 10,000. Outside of LessWrong, lots of people in academia and industry labs already read and understand their work. We shouldn’t disparage people who are successfully bringing AI safety into the mainstream ML community.
My report estimates that the amount of training data required to train a model with N parameters scales as N^0.8, based significantly on results from Kaplan et al 2020. In 2022, the Chinchilla scaling result (Hoffmann et al 2022) showed that instead the amount of data should scale as N.
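To make the difference concrete, here’s a quick back-of-the-envelope comparison (my own illustrative round numbers, not figures from the report or from either paper):

```python
# How required training data grows with parameter count under the two
# scaling assumptions. Illustrative round numbers, not from either paper.

def data_growth(param_ratio: float, exponent: float) -> float:
    """Factor by which required data grows when parameter count grows by param_ratio."""
    return param_ratio ** exponent

scale_up = 100  # e.g. scaling from a 1B-parameter model to a 100B-parameter model
print(f"Kaplan-style (D ~ N^0.8): ~{data_growth(scale_up, 0.8):.0f}x more data")  # ~40x
print(f"Chinchilla   (D ~ N^1.0): ~{data_growth(scale_up, 1.0):.0f}x more data")  # 100x
```

So for a 100x scale-up in parameters, the linear Chinchilla relationship asks for roughly 2.5x as much data as the older N^0.8 fit would have suggested.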
Are you concerned that pretrained language models might hit data constraints before TAI? Nostalgebraist estimates that there are roughly 3.2T tokens available publicly for language model pretraining. This estimate misses important potential data sources such as transcripts of audio and video, private text conversations, and email. But the BioAnchors report estimates that the median transformative model will require 22T data points, nearly an order of magnitude more than this estimate.
The BioAnchors estimate was also based on older scaling laws that placed a lower priority on data relative to compute. With the new Chinchilla scaling laws, more data would be required for compute-optimal training. Of course, training runs don’t need to be compute-optimal: you can get away with using more compute and less data if you’re constrained by data, even though it costs more. And text isn’t the only data a transformative model could use: audio, video, and RLHF on diverse tasks all seem like good candidates.
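To get a rough sense of where the data constraint bites, here’s a sketch using two commonly cited approximations — the Chinchilla-style rule of thumb of roughly 20 training tokens per parameter, and training compute C ≈ 6ND FLOPs. These are ballpark heuristics I’m assuming for illustration, not numbers from the BioAnchors report:

```python
# Sketch: compute-optimal (Chinchilla-style) token counts vs. the estimated
# public text supply. Uses the rough heuristics D ≈ 20*N and C ≈ 6*N*D.

AVAILABLE_TOKENS = 3.2e12  # Nostalgebraist's estimate of public text

def optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Roughly compute-optimal number of training tokens for n_params parameters."""
    return tokens_per_param * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute in FLOPs."""
    return 6 * n_params * n_tokens

for n_params in (70e9, 500e9, 1e12):
    d = optimal_tokens(n_params)
    c = training_flops(n_params, d)
    note = "exceeds public text" if d > AVAILABLE_TOKENS else "fits in public text"
    print(f"N = {n_params:.0e} params -> D ≈ {d:.1e} tokens, C ≈ {c:.1e} FLOPs ({note})")
```

On these rough numbers, a compute-optimal model much beyond ~160B parameters would already need more than the estimated public text supply, which is why data-inefficient training (more compute, fewer unique tokens) and non-text data sources become relevant.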
Does the limited availability of public text data affect your views on how likely GPT-N is to be transformative? Are there any considerations overlooked here, or questions that could use a more thorough analysis? I’m curious about anybody else’s opinions. Thanks for sharing the update; I think it’s quite persuasive.
Emergent Abilities of Large Language Models [Linkpost]
AISN #30: Investments in Compute and Military AI Plus, Japan and Singapore’s National AI Safety Institutes
Unfortunately I don’t think academia will handle this by default. The current field of machine unlearning focuses on a narrow threat model where the goal is to eliminate the impact of individual training datapoints on the trained model. Here’s the NeurIPS 2023 Machine Unlearning Challenge task:
The challenge centers on the scenario in which an age predictor is built from face image data and, after training, a certain number of images must be forgotten to protect the privacy or rights of the individuals concerned.
But if hazardous knowledge can be pinpointed to individual training datapoints, perhaps you could simply remove those points from the dataset before training. The more difficult threat model involves removing hazardous knowledge that can be synthesized from many datapoints which are individually innocuous. For example, a model might learn to conduct cyberattacks or advise on the acquisition of biological weapons after being trained on textbooks about computer science and biology. It’s unclear to what extent this kind of hazardous knowledge can be removed without harming standard capabilities, but most of the current field of machine unlearning is not even working on this more ambitious problem.
Wild. One important note is that the model is trained with labeled examples of successful performance on the target task, rather than learning the tasks from scratch by trial and error like MuZero and OpenAI Five. For example, here’s the training description for the DeepMind Lab tasks:
We collect data for 255 tasks from the DeepMind Lab, 254 of which are used during training, the left out task was used for out of distribution evaluation. Data is collected using an IMPALA (Espeholt et al., 2018) agent that has been trained jointly on a set of 18 procedurally generated training tasks. Data is collected by executing this agent on each of our 255 tasks, without further training.
Gato then achieves near-expert performance on >200 DM Lab tasks (see Figure 5). It’s unclear whether the model could have reached superhuman performance if trained from scratch, and similarly unclear whether it could learn new tasks without examples of expert performance.
More broadly, this seems like substantial progress on both multimodal transformers and transformer-powered agents, two techniques that seem like they could contribute to rapid AI progress and risk. I don’t want to downplay the significance of these kinds of models and would be curious to hear other perspectives.
AISN #19: US-China Competition on AI Chips, Measuring Language Agent Developments, Economic Analysis of Language Model Propaganda, and White House AI Cyber Challenge
Model-driven feedback could amplify alignment failures
Git Re-Basin: Merging Models modulo Permutation Symmetries [Linkpost]
Kevin Esvelt explicitly calls for not releasing future model weights.
Would sharing future model weights give everyone an amoral biotech-expert tutor? Yes.
Therefore, let’s not.
To test this claim we could look to China, where AI x-risk concerns are less popular and influential. China passed a regulation on recommendation algorithms that took effect in March 2022 and one on deepfakes that took effect in January 2023. This year, they passed a regulation on generative AI which requires evaluation of training data and red teaming of model outputs. Perhaps this final measure was the result of listening to ARC and other AI safety folks who popularized model evaluations, but more likely, red teaming and evaluations are simply the common-sense way for a government to prevent AI misbehavior.
The European Union’s AI Act was also created before any widespread recognition of AI x-risks.
On the other hand, I agree that key provisions in Biden’s executive order appear heavily influenced by AI x-risk concerns. I think it’s likely that without influence from people concerned about x-risk, the administration’s actions would more closely resemble the Blueprint for an AI Bill of Rights.
The lesson I draw is that there is plenty of appetite for AI regulation independent of x-risk concerns. But it’s important to make sure that regulation is effective, rather than blunt and untargeted.
Link to China’s red teaming standard — note that their definitions of misbehavior are quite different from yours, and they do not focus on catastrophic risks: https://twitter.com/mattsheehan88/status/1714001598383317459?s=46