Outsiders should focus on specs/​constitutions (among other things)

I think that the external AI safety community should prioritise model specs/​constitutions over the next 12 months. It shouldn’t be our top priority,[1] but it’s pretty important[2] and neglected. In this post, I will argue that it’s tractable, even if you aren’t a lab employee:

  1. The spec/constitution is a natural-language document, so you don’t need to know any ML or engineering.

  2. You don’t need to know about the internal codebase of the lab, or other proprietary details about how they train the models. All you need to know is the current spec/​constitution — but that is public and will probably remain public.

  3. Insiders might enjoy more R&D uplift than outsiders (i.e. they have access to unreleased models, have higher rate limits, and don’t need to pay API costs). So outsiders should focus on work where models provide less uplift, e.g. macrostrategy / conceptual reasoning / threat modelling. Work on the spec/constitution involves exactly these kinds of tasks.

  4. Integrating outsiders’ suggestions into the spec/constitution is very easy: it amounts to copy-pasting a short passage into a markdown file.

    1. This is in contrast to integrating a new safety technique, which involves porting it from open-source infrastructure to the lab’s closed-source infrastructure.

    2. It’s pretty costly to train a new model on the new spec/​constitution — but this obstacle applies equally strongly to amendments proposed by the insiders.

  5. The spec/constitution describes how the model should behave across a wide range of domains, while avoiding a wide range of threats. So if you have expertise in any of these domains or threat models, you can probably contribute.

  6. There’s a precedent for outsiders contributing to the spec/constitution, e.g. here are the acknowledgements for the Claude constitution:
    External commenters who gave detailed feedback or discussion on the document include: Jim Baker, Owen Cotton-Barratt, Mariano-Florentino Cuéllar, Justin Curl, Tom Davidson, Lukas Finnveden, Brian Green, Ryan Greenblatt, janus, Joshua Joseph, Daniel Kokotajlo, Will MacAskill, Father Brendan McGuire, Antra Tessera, Bishop Paul Tighe, Jordi Weinstock, and Jonathan Zittrain.

  7. You can influence the spec/constitution by: writing a draft passage, adding an explanation of why the amendment would help, sharing it with more senior AI safety people for feedback, and then sending it to a lab insider who works on the spec/constitution.

  8. Some aspects of the spec/constitution seem inherently ill-suited to lab employees, e.g. power concentration.

Recommendations:

  1. It might be hard to persuade the labs that their overall judgement is incorrect (e.g. on the tradeoff between alignment and corrigibility). Instead, you should focus on topics the lab hasn’t considered or formed a judgement about.

  2. You should write draft passages. But to avoid constitutional poisoning, don’t write “Claude” in the drafts. You could use a different French name; see here for an example.

  3. Think about threat models that aren’t addressed in the current constitution. Then think about how to inoculate the model against them.

  1. ^

    Some other priority tiers include:

    P0: Evaluate near-term risks; Communicate with lab leadership & policymakers
    P1: Forecasting and threat modelling; Stress-testing safety cases (e.g. model organisms)
    P2: Capacity building; Communicate to public; Developing safety techniques; Basic science (deep learning; LLM generalisation)
    P3: Secure future funding; Secure future model access; Elicit capabilities on target domains (e.g. macrostrategy, alignment research)

    I think specs/​constitutions should be a P2 or P3.

  2. ^

    See AI character is a big deal by Will MacAskill and Tom Davidson.