Speedrunning everyone through the bad alignment bingo: $5k bounty for a LW conversational agent

There’s a wave of people, of varying degrees of knowledge and influence, currently waking up to the ideas of AI existential risk. They seem to be ticking off literally every box on the bad-alignment-takes bingo card.

I think there is value in educating those people. I’m aware of the argument that education at scale doesn’t matter, coordination is too difficult, and all that matters is solving alignment, which takes care of the rest.

There’s something to that, but I disagree that education at scale doesn’t help. It can make the progress of frontrunners marginally more safety-oriented, steer company cultures, move the Overton window, change the zeitgeist, and buy a bit of time. You likely didn’t stumble on these ideas all on your own, so arguing against the value of outreach or education is also arguing against your own ability to do anything.

It’s also a matter of ROI, and there is some very low-hanging fruit here. The simplest thing would be to write a long FAQ that goes through every common objection. No, people won’t read the whole Sequences or Arbital on their own, but they might go through a FAQ.

But we can do better than a FAQ. It’s now fairly straightforward, with tools like langchain (https://github.com/hwchase17/langchain), to turn a set of documents into a body of knowledge for a conversational agent. This is done by building an index of embeddings that a language model can search to bring context to an answer. This doesn’t preclude fine-tuning, but it makes it unnecessary.
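To make that concrete, here’s a minimal sketch of the indexing step, assuming langchain and an OpenAI API key are available; the exact class names shift between langchain versions, so treat this as illustrative rather than definitive:

```python
# Minimal embedding-index sketch; class names may differ across langchain versions.
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS

def build_index(docs):
    # Split long posts into chunks small enough to fit into a prompt.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(docs)
    # Embed every chunk and store the vectors in a local FAISS index
    # that can later be searched by semantic similarity.
    return FAISS.from_documents(chunks, OpenAIEmbeddings())
```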

So a straightforward project is to index LessWrong, index Arbital, index the Alignment Forum, and maybe good alignment papers, blog posts, and books as well.
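For the corpus itself, something like the following would do, under the assumption that each source has already been scraped to markdown files on disk (the directory layout here is purely illustrative):

```python
# Assumes each corpus was already scraped to markdown files; paths are illustrative.
# DirectoryLoader may require the `unstructured` package to be installed.
from langchain.document_loaders import DirectoryLoader

docs = []
for path in ["corpus/lesswrong", "corpus/arbital", "corpus/alignment-forum"]:
    docs.extend(DirectoryLoader(path, glob="**/*.md").load())

index = build_index(docs)     # build_index from the sketch above
index.save_local("lw_index")  # persist, so the bot doesn't re-embed on every restart
```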

Then hook that up to the ChatGPT API (a sketch follows this list), and prompt it to:

  1. list search queries for relevant material to answer the question

  2. compose an answer that reflects the content and opinion of the data

  3. answer with infinite patience
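One possible wiring, assuming the index built above: langchain’s ConversationalRetrievalChain roughly covers steps 1 and 2, rewriting each question into a search query against the index and grounding the answer in the retrieved passages; step 3 is just the system prompt and a chat loop. The model name and parameters are illustrative:

```python
# A sketch of the chat loop; `index` is the FAISS store built earlier.
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
chain = ConversationalRetrievalChain.from_llm(llm, retriever=index.as_retriever())

history = []  # (question, answer) pairs so follow-ups keep their context
while True:
    question = input("you> ")
    result = chain({"question": question, "chat_history": history})
    history.append((question, result["answer"]))
    print("bot>", result["answer"])
```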

Some jailbreak prompts may be needed to stop ChatGPT’s conditioning from regurgitating AI-risk-appeasing propaganda through the API, but there are a bunch of those out there. Or use the APIs of other models as they become open source or commercially available.

Will this save humanity? No. Will this turn the course of safety research? Also no. Is this using AI to advance alignment? Well, yes, a little bit; don’t dismiss very small starts.

Is this worth spending a weekend hacking on this project instead of posting on Twitter? Absolutely.

Will this actually make things worse? No, you’re overthinking this.

I’ll pay $5k to the best version built by the end of March (if any is built). It’s a modest bounty, but it’s really not all that much work, and it’s fun work. And of course, if anyone wants to add their own contribution to the bounty, please do.