Priorities for the UK Foundation Models Taskforce


The UK government recently established the Foundation Models Taskforce, focused on AI safety and backed by £100M in funding. Founder, investor and AI expert Ian Hogarth leads the new organization.

The establishment of the Taskforce shows the UK’s intention to be a leading player in the greatest governance challenge of our times: keeping humanity in control of a future with increasingly powerful AIs. This is no small feat, and will require very ambitious policies that anticipate the rapid developments in the AI field, rather than just reacting to them.

Here are some recommendations on what the Taskforce should do. The recommendations fall into three categories: Communication and Education about AI risk, International Coordination, and Regulation and Monitoring.

Communication and Education about AI Risk

The Taskforce is uniquely positioned to educate and communicate about AI development and risks. Here is how it could do it:

Private education

The Taskforce should organize private education sessions for UK Members of Parliament, Lords, and high-ranking civil servants, in the form of presentations, workshops, and closed-door Q&As with Taskforce experts. These would help bridge the information gap between policymakers and the fast-moving AI field.

A new platform: ai.gov.uk

The Taskforce should take a proactive role in disseminating knowledge about AI progress, the state of the AI field, and the Taskforce’s own actions:

The Taskforce should publish bi-weekly or monthly bulletins and reports on the state of AI progress and AI risk on an official government website. It can start right away by publishing them on the UK government’s research and statistics portal.

The Taskforce should set up ai.gov.uk, an online platform modeled after the UK’s COVID-19 dashboard. The platform’s main page should be a regularly updated dashboard showing key information about AI progress and the Taskforce’s progress in achieving its goals. ai.gov.uk should have a progress bar trending towards 100% for each of the Taskforce’s key objectives.

ai.gov.uk should also include a “Safety Plans of AI Companies” monthly report, with key insights visualized on the dashboard.

The Taskforce should send an official questionnaire to each frontier AI company to compile this report. This questionnaire should contain questions about companies’ estimated risk of human extinction caused by the development of their AIs, their timelines until the existence of powerful and autonomous AI systems, and their safety plans regarding development and deployment of frontier AI models.

There is no need to make the questionnaire mandatory. For companies that don’t respond or respond only to some questions, the relevant information on the dashboard should be left blank, or filled in with a “best guess” or “most relevant public information” curated by Taskforce experts.

Public-facing communications

Taskforce members should utilize press conferences, official posts on the Taskforce’s website, and editorials in addition to ai.gov.uk to educate the public about AI development and risks. Key topics to cover in these public-facing communications include:

  1. Frontier AI development is aimed at building autonomous, superhuman, general agents, not just better chatbots or the automation of individual tasks. These are, and will increasingly be, AIs capable of making their own plans and taking action in the real world.

  2. No one fully understands how these systems function, what their capabilities or limits are, or how to control or restrict them. All of these remain unsolved technical challenges.

  3. Consensus on the societal-scale risk from AI is growing, and the government will respond seriously to this monumental challenge.

International Coordination

The UK plans to host a global summit on AI and AI safety this year, and is positioning itself as a champion of greater international coordination on AI. The Taskforce can drive this effort and foster international coordination on addressing AI risk, positioning the UK as a global leader:

The Taskforce should support the FCDO in liaising with international partners and aid in setting up international institutions to deal with AI. The ambitious goal for this would be to spearhead a ‘CERN for AI’, along the lines of Ian Hogarth’s Island model, to move research on powerful, autonomous AI systems out of private firms and into a highly secure facility with multinational backing and supervision.

The Taskforce should support groups working on AI policy and safety globally by providing guidance and credibility. As the top authority on AI risk, the Taskforce should have the liberty to liaise with the public, external parties, and other governments, providing expert advice on AI risks. As such, the Taskforce can partner with smaller organizations, and help other countries establish their own AI risk-focused Taskforces. This will contribute to a coordinated global response to AI challenges.

Regulation and Monitoring

The Foundation Models Taskforce should commit to positions and targets that put it on a path to success in ensuring AI is transparent and safe, and communicate these positions to the public:

Model evaluations and their limitations

The Taskforce should establish a clear stance on the limitations and strengths of model evaluations as a policy tool. Crucially, it should make publicly clear that current evaluations and benchmarks are insufficient and, on their own, cannot serve as the basis for certifying a model as “safe”. The following points explain why, and set out how evaluations should be used:

  1. Present evaluations only capture behavioural proxies, and a significant amount of work will be needed to develop better ones. Dangerous capabilities evaluations primarily consist of repeatedly prompting models in multiple ways to elicit capabilities and behaviours. This method gives evaluators only information about the capabilities they happen to discover through their choice of prompts, and about a model’s behaviour in one specific environment.

  2. When a capability evaluation fails to find a capability, this indicates only an absence of evidence, not evidence of absence. Evaluators have no way to know if they have discovered all capabilities of a given model through their repeated prompting, or how many they have missed. The fact that a particular capability or behaviour isn’t observed in testing does not mean that it isn’t present or couldn’t emerge under different conditions.

  3. Evaluations should be conducted across the AI model’s development lifecycle, not only on fully trained models. This can be done, for example, by running evaluations on the model’s training checkpoints, and only continuing the training run after the evaluations have been passed successfully.

  4. The Taskforce should also publicly commit to terminating training runs if a model fails certain evaluations, not just preventing future deployment. Evaluations without the ability to stop a training run from continuing are a policy measure with no teeth. If evaluations only result in deployment being prohibited, the developer will still have access to the fully trained model. This makes it very easy for the AI system to be leaked (intentionally or accidentally), used internally, deployed in a different jurisdiction, or exfiltrated by an adversary. The easiest way to reduce risk from a dangerous AI system is to stop it from being built.
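Points 3 and 4 can be illustrated with a minimal sketch of a training run gated by evaluations at each checkpoint. Everything here is hypothetical: `run_with_evaluation_gates`, `training_segments`, and `passes_evaluations` are illustrative names, not part of any real training framework or Taskforce process, and the evaluation itself is left as a caller-supplied function.

```python
from typing import Callable, Iterable, List, Tuple


def run_with_evaluation_gates(
    training_segments: Iterable[str],
    passes_evaluations: Callable[[str], bool],
) -> Tuple[bool, List[str]]:
    """Train segment by segment, evaluating each checkpoint before continuing.

    `training_segments` is assumed to yield a checkpoint name after each
    segment of training. Because it is consumed lazily, no further segment
    is trained once a checkpoint fails its evaluations.

    Returns (finished, checkpoints_produced).
    """
    produced: List[str] = []
    for checkpoint in training_segments:
        produced.append(checkpoint)
        if not passes_evaluations(checkpoint):
            # Terminate the run itself, rather than only blocking deployment.
            return False, produced
    return True, produced
```

Note that, per points 1 and 2, a run that passes every gate in such a scheme has not thereby been shown to be safe; the gates can only catch what the evaluations happen to detect.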

Risk assessments, auditing, monitoring

Evaluations, by their nature, can only be conducted post-hoc, once an AI model has been (at least partially) developed. Given this limitation, they should be coupled with risk assessments, which make it possible to estimate the risks of proposed new AI projects ex-ante.

The Taskforce should be able to audit companies. This would involve reviewing their development procedures, R&D practices, and plans for upcoming training runs, ensuring that they align with safety standards and best practices in AI development.

The Taskforce should also develop the capability to monitor compute allocation and ongoing training runs. This information can be requested from both cloud computing providers and AI developers. The Taskforce’s monitoring entity should also have the statutory ability to request a halt on training runs, and to conduct inspections or audits on the company conducting them.

Companies wishing to operate in the UK should be required to pre-register their runs above a certain size, to streamline the monitoring process. This approach ensures that all such runs are on the government’s radar from the outset, promoting transparency and adherence to regulations.

Oversight and prohibition thresholds

An important part of the Taskforce’s role in regulation and monitoring involves establishing oversight and prohibition thresholds on training run size.

Two thresholds, calculated in terms of total computation used for an AI model’s training run (measured in FLOP), should be established: a lower “Frontier Threshold”, and a higher “Horizon Threshold”. These thresholds will help classify AI training runs and apply adequate measures to each:

  1. Training runs that meet or exceed the “Frontier Threshold” must adhere to mandatory oversight requirements from the Taskforce, including closer monitoring, stringent reporting, and regular audits.

  2. Training runs that meet or exceed the “Horizon Threshold” should be prohibited by default.

Here are more details on how such thresholds could work.

The prohibition on runs above the Horizon Threshold might be relaxed or lifted in the future, given sufficient evidence and credible plans for how to control and predict AI systems’ capabilities. Conversely, both thresholds could be lowered in the future to account for algorithmic progress.

It’s important to note that risks and concerns related to models below the Frontier Threshold will not fall under the remit of the Foundation Models Taskforce. Instead, these issues will remain under the purview of other regulatory bodies and institutions. This will ensure that the oversight requirements put in place by the Foundation Models Taskforce only apply to the frontier of AI systems, and will not affect the vast majority of AI research and development.
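The two-threshold scheme described in this section can be sketched as a simple classification rule. This is only an illustration under stated assumptions: the names `Tier`, `Thresholds`, and `classify_run` are hypothetical, and the numeric values in `EXAMPLE_THRESHOLDS` are arbitrary placeholders, since no specific FLOP values are proposed here.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    OUT_OF_REMIT = "below the Frontier Threshold: outside the Taskforce's remit"
    OVERSIGHT = "at or above the Frontier Threshold: mandatory oversight"
    PROHIBITED = "at or above the Horizon Threshold: prohibited by default"


@dataclass(frozen=True)
class Thresholds:
    """Both thresholds are total training compute, measured in FLOP."""
    frontier_flop: float
    horizon_flop: float

    def __post_init__(self) -> None:
        # The Frontier Threshold is the lower of the two.
        if not 0 < self.frontier_flop < self.horizon_flop:
            raise ValueError("expected 0 < frontier_flop < horizon_flop")


def classify_run(total_training_flop: float, thresholds: Thresholds) -> Tier:
    """Classify a training run; 'meet or exceed' is implemented as >=."""
    if total_training_flop >= thresholds.horizon_flop:
        return Tier.PROHIBITED
    if total_training_flop >= thresholds.frontier_flop:
        return Tier.OVERSIGHT
    return Tier.OUT_OF_REMIT


# Placeholder values for illustration only; not proposed thresholds.
EXAMPLE_THRESHOLDS = Thresholds(frontier_flop=1e24, horizon_flop=1e26)
```

Keeping the thresholds as parameters rather than constants reflects the point that either value could be changed later, for example lowered to account for algorithmic progress.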

Thanks to Connor Axiotes, Jason Hausenloy, Eva Behrens, and everyone else who provided feedback!