Appendices to the live agendas

Lists cut from our main post, in a token gesture toward readability.

We list past reviews of alignment work, ideas which seem to be dead, the cool but neglected neuroscience / biology approach, various orgs which don’t seem to follow any one agenda, and a bunch of things which don’t fit elsewhere.

Appendix: Prior enumerations

Appendix: Graveyard

Appendix: Biology for AI alignment

Lots of agendas here, but it’s not clear that anyone besides Byrnes and Thiergart is actively turning the crank. Seems like it would need a billion dollars.

Human enhancement

  • One-sentence summary: maybe we can give people new sensory modalities, much higher bandwidth for conceptual information, much better idea generation, direct interfaces with DL systems or with sensors, or transfer learning, and maybe this would help. The old superbaby dream goes here, I suppose.

  • Theory of change: maybe this makes us better at alignment research

Merging

  • One-sentence summary: maybe we can form networked societies of DL systems and brains

  • Theory of change: maybe this lets us preserve some human values through bargaining or voting or weird politics.

  • Some names: Cyborgism, Millidge, Dupuis

As alignment aid

  • One-sentence summary: maybe we can get really high-quality alignment labels from brain data; maybe we can steer models by training humans to do activation engineering fast and intuitively; maybe we can crack the true human reward function / social instincts and adapt some of them for AGI.

  • Theory of change: as you’d guess

  • Some names: Byrnes, Cvitkovic, Foresight’s BCI. Also (list from Byrnes): Eli Sennesh, Adam Safron, Seth Herd, Nathan Helm-Burger, Jon Garcia, Patrick Butlin

Appendix: Research support orgs

One slightly confusing class of org is described by the sample {CAIF, FLI}. These are often run by active researchers with serious alignment experience, but they usually don’t follow an obvious agenda of their own, instead delegating a basket of strategies to grantees and doing field-building work like NeurIPS workshops and summer schools.

CAIF (Cooperative AI Foundation)

  • One-sentence summary: support researchers making differential progress in cooperative AI (eg precommitment mechanisms that can’t be used to make threats)

  • Some names: Lewis Hammond

  • Estimated # FTEs: 3

  • Some outputs in 2023: NeurIPS contest, summer school

  • Funded by: Polaris Ventures

  • Critiques: ?

  • Trustworthy command, closure, opsec, common good, alignment mindset: ?

  • Resources: £2,423,943

AISC (AI Safety Camp)

  • One-sentence summary: an entry point for new researchers to test fit and meet collaborators. More recently focussed on a capabilities pause. Still going!

  • Some names: Remmelt Ellen, Linda Linsefors

  • Estimated # FTEs: 2

  • Some outputs in 2023: tag

  • Funded by: ?

  • Critiques: ?

  • Trustworthy command, closure, opsec, common good, alignment mindset: ?

  • Resources: ~$200,000

See also:

Appendix: Meta, mysteries, more