AI-Plans.com—a contributable compendium


Hello, we’re working on https://ai-plans.com.

Ideas behind the site:

Right now, alignment plans are spread all over the place, and it’s difficult for a layperson, or anyone unfamiliar with the field, to get an idea of what the current plans are for making AGI (or even the models we have right now) safe, and what the problems with those plans are.

Having a place where all AI alignment plans, and criticisms of those plans, can be added and viewed in an easy-to-read way would be helpful. It opens up possibilities such as: seeing the most common problems with plans, seeing which kinds of plans have the fewest problems, pushing for regulations against the poorest plans, etc.

Judging the quality of a plan is hard, and there are many ways to get it wrong. Judging the quality of a criticism might be less complicated, and there are perhaps fewer ways for it to go wrong.

That is why I believe this site is useful.

Hasn’t this already been done? aisafety.info, aiideas.com, etc.

aisafety.info is excellent, and the folks there are building a conversational agent for AI safety, which could be incredibly useful. However, it’s not yet easy to use for learning about specific alignment plans, and it is more of a general-purpose place to learn about AI, AI risk, AI safety, etc.

The purpose of ai-plans.com is to be an easy-to-read platform that sorts the good plans from the bad and shows the problems with each one.

Someone mentioned a site to me called ai-ideas.com, or something similar. However, it wasn’t working the last time I checked, and as far as I know nothing has changed.

Aims

Stage 1)

A contributable compendium containing most, if not all, of the alignment plans and their criticisms.


Estimated time left for this to be done: 1 week (we’re currently adding ~5 plans a day).


What’s left to do:

  • Functionality for responding to criticisms:

The idea is that a plan’s author can select one or more criticisms they think they have a solution to, then submit a new version of the plan with the selected criticism(s) quoted in it.
Any criticisms they didn’t select will be automatically carried over to the new version of the plan (the idea being that unselected criticisms are ones the author doesn’t have a solution to, so they should still apply).

  • Adding more plans and criticisms

  • A filter for spam and duplicates

  • Creating a template/guide on how to post plans

  • Improving the UX (font, colours, design, etc.)
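The plan-revision mechanism described in the first item above can be sketched as follows. This is a minimal illustration, not the site’s actual code: the `Plan` record, its fields and the `revise_plan` function are hypothetical names, and criticisms are represented as plain strings for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Hypothetical plan record: title, version number, body text and open criticisms."""
    title: str
    version: int
    body: str
    criticisms: list[str] = field(default_factory=list)


def revise_plan(old: Plan, new_body: str, addressed: set[str]) -> Plan:
    """Create the next version of a plan.

    Criticisms the author selected as addressed are quoted in the new body.
    Every unselected criticism is carried over unchanged, since the author
    has not claimed a solution to it.
    """
    unknown = addressed - set(old.criticisms)
    if unknown:
        raise ValueError(f"not criticisms of this plan: {sorted(unknown)}")
    quoted = "\n".join(f"> {c}" for c in old.criticisms if c in addressed)
    carried_over = [c for c in old.criticisms if c not in addressed]
    body = f"Addresses:\n{quoted}\n\n{new_body}" if quoted else new_body
    return Plan(old.title, old.version + 1, body, carried_over)


v1 = Plan("Plan A", 1, "Original text", ["C1: reward hacking", "C2: no evaluation"])
v2 = revise_plan(v1, "Updated text", {"C1: reward hacking"})
print(v2.criticisms)  # ['C2: no evaluation']
```

Carrying criticisms over by default means a revision can only shed a criticism the author explicitly claims to have answered.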

What we’re missing for this stage:

  • A lawyer to make sure we’re complying with GDPR and to help write a cookie notice

  • Moderators to help filter spam and make sure plans are submitted correctly

What would be helpful, but not essential for this stage:

  • More copywriters to speed up the process of adding alignment plans

  • More red-teamers/quality testers

Stage 2)

In addition to everything from Stage 1, there is now a scoring system for criticisms and a ranking system for plans, where plans are ranked from top to bottom based on the total scores of their criticisms.


The scoring system is an essential part of the site.

Its aim is to give the most points to the most accurate criticisms, and to use those points to rank plans from the fewest total criticism points (at the top) to the most (at the bottom).
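As a minimal sketch of the ranking rule above, assuming each plan’s criticism scores are already known (the `rank_plans` name and the input format are hypothetical), plans can be sorted by ascending total criticism points:

```python
def rank_plans(criticism_points: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (plan, total criticism points), fewest points first (top of the list)."""
    totals = {plan: sum(points) for plan, points in criticism_points.items()}
    # Ties are broken alphabetically so the ordering is deterministic.
    return sorted(totals.items(), key=lambda item: (item[1], item[0]))


ranking = rank_plans({
    "Plan A": [5, 3],
    "Plan B": [1],
    "Plan C": [],  # no criticisms yet, so it ranks first
})
for position, (plan, total) in enumerate(ranking, start=1):
    print(position, plan, total)
```

Note that a plan with no criticisms at all ranks at the top, which is one reason the scoring needs to reward finding accurate criticisms.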

Users will be able to upvote or downvote criticisms.

Users will also have ‘karma’ that affects how much weight their votes on criticisms carry, somewhat similar to the LessWrong and AlignmentForum system. However, we’re considering making karma past a certain ‘age’ spendable, rather than having it add to the weight of the user’s vote, to avoid the first-vote problem of LessWrong.

Users who accumulate more points on their criticisms will have a higher karma.
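The post doesn’t fix a formula for vote weight or karma, so the following is only an illustrative sketch under two labelled assumptions: a vote counts for 1 + log10(1 + karma), and a user’s karma is the summed score of the criticisms they wrote. All function names are hypothetical, and the spendable-karma idea is not modelled.

```python
import math
from collections import defaultdict


def vote_weight(karma: float) -> float:
    """Assumed weighting: 1 for new users, growing logarithmically with karma."""
    return 1.0 + math.log10(1.0 + max(karma, 0.0))


def score_criticisms(votes: list[tuple[str, str, int]],
                     karma: dict[str, float]) -> dict[str, float]:
    """votes holds (voter, criticism_id, +1 or -1); returns the weighted score per criticism."""
    scores: dict[str, float] = defaultdict(float)
    for voter, criticism_id, direction in votes:
        if direction not in (1, -1):
            raise ValueError("a vote must be +1 or -1")
        scores[criticism_id] += direction * vote_weight(karma.get(voter, 0.0))
    return dict(scores)


def karma_from_criticisms(scores: dict[str, float],
                          authors: dict[str, str]) -> dict[str, float]:
    """Assumed rule: a user's karma is the total score of the criticisms they authored."""
    karma: dict[str, float] = defaultdict(float)
    for criticism_id, score in scores.items():
        karma[authors[criticism_id]] += score
    return dict(karma)


scores = score_criticisms(
    [("alice", "c1", 1), ("bob", "c1", 1), ("bob", "c2", -1)],
    karma={"alice": 99.0, "bob": 0.0},
)
new_karma = karma_from_criticisms(scores, authors={"c1": "carol", "c2": "dave"})
print(scores, new_karma)
```

A logarithm is one way to let experienced users count for more without letting a single high-karma account dominate; a threshold table would work as well.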

Plans will have a ‘bounty’ inversely proportional to the total number of criticism points they have.
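A strictly inverse-proportional bounty (base / points) is undefined for a plan with zero criticism points. The sketch below therefore assumes a smoothed variant, base / (1 + points), with a hypothetical base of 100 and negative totals clamped to zero; both choices are illustrative, not the site’s settled design.

```python
def plan_bounty(total_criticism_points: float, base: float = 100.0) -> float:
    """Hypothetical bounty: base / (1 + points), so fewer criticism points means a larger bounty.

    Adding 1 to the denominator keeps the bounty finite for uncriticised plans.
    Negative totals (possible if criticisms are downvoted) are clamped to zero.
    """
    points = max(total_criticism_points, 0.0)
    return base / (1.0 + points)


for points in (0, 1, 3, 9):
    print(points, plan_bounty(points))
```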

I think it’s important to get this right the first time, so that a toxic or unhelpful culture doesn’t build up on the site and ruin its reputation.

To prevent circular point gains (e.g. a group of users upvoting each other regardless of how good the plans are, or being biased towards each other), we could look at the correlation of interactions (net votes) between users.

A perfect or near-perfect correlation could be cause for action, on suspicion of dishonest activity.
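One possible reading of “correlation of interactions” is the Pearson correlation between two users’ votes on the criticisms they have both voted on. The sketch below flags pairs at or above a threshold; the 0.95 threshold, the minimum overlap of 5 criticisms and all names are hypothetical. Pairs where either user’s votes never vary are skipped, because Pearson correlation is undefined for them, so a real system would need an extra check for those.

```python
import math
from itertools import combinations
from typing import Optional


def pearson(xs: list[float], ys: list[float]) -> Optional[float]:
    """Pearson correlation coefficient; None if either series has zero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def flag_correlated_pairs(votes: dict[str, dict[str, int]],
                          threshold: float = 0.95,
                          min_shared: int = 5) -> list[tuple[str, str, float]]:
    """votes maps user -> {criticism_id: +1 or -1}; returns suspiciously correlated pairs."""
    flagged = []
    for a, b in combinations(sorted(votes), 2):
        shared = sorted(votes[a].keys() & votes[b].keys())
        if len(shared) < min_shared:
            continue  # too little overlap to judge
        r = pearson([votes[a][c] for c in shared], [votes[b][c] for c in shared])
        if r is not None and r >= threshold:
            flagged.append((a, b, r))
    return flagged


same = {"c1": 1, "c2": 1, "c3": -1, "c4": 1, "c5": -1}
flagged = flag_correlated_pairs({
    "alice": same,
    "bob": dict(same),
    "carol": {"c1": -1, "c2": 1, "c3": 1, "c4": -1, "c5": -1},
})
print(flagged)
```

A flag here would only be a prompt for a moderator to look closer, since honest users with similar views can also vote alike.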

Alternatively, the correlation of interactions between users could be used as a weight: if, say, users X and Y have a correlation of interactions with each other above some threshold, the weight of X’s and Y’s votes on each other could be reduced. This could be really complicated to code, though.
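The down-weighting alternative can be sketched with a penalty multiplier applied to votes between two users. Assumed for illustration only: a precomputed correlation per user pair (keyed by the alphabetically sorted pair), a threshold of 0.8, and a linear fall-off that reaches zero at perfect correlation. The names are hypothetical.

```python
def pair_penalty(r: float, threshold: float = 0.8) -> float:
    """Multiplier for votes between two users whose interactions correlate at r.

    1.0 at or below the threshold, falling linearly to 0.0 at perfect correlation.
    """
    if r <= threshold:
        return 1.0
    return max(0.0, (1.0 - r) / (1.0 - threshold))


def adjusted_vote(direction: int, base_weight: float, voter: str, author: str,
                  correlations: dict[tuple[str, str], float],
                  threshold: float = 0.8) -> float:
    """Weight a vote by `voter` on a criticism written by `author`."""
    key = tuple(sorted((voter, author)))
    r = correlations.get(key, 0.0)  # pairs with no history are not penalised
    return direction * base_weight * pair_penalty(r, threshold)


correlations = {("alice", "bob"): 0.9}
print(adjusted_vote(1, 2.0, "bob", "alice", correlations))    # penalised by half
print(adjusted_vote(1, 2.0, "carol", "alice", correlations))  # unaffected
```

A gradual penalty avoids a hard cut-off that colluding users could learn to stay just below.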

Users will not be able to vote on their own criticisms. There will be no option to vote on plans directly; plans will be scored only through their criticisms.
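The two voting rules above amount to a simple validation step before a vote is recorded. In this minimal sketch, `cast_vote`, `VoteRejected` and the criticism-to-author mapping are hypothetical; anything that isn’t a known criticism (including a plan ID) is rejected.

```python
class VoteRejected(Exception):
    """Raised when a vote breaks one of the site's voting rules."""


def cast_vote(voter: str, target_id: str, direction: int,
              criticism_authors: dict[str, str]) -> tuple[str, str, int]:
    """Validate a vote; criticism_authors maps criticism_id -> author."""
    if target_id not in criticism_authors:
        # Plans are not in this mapping, so they cannot be voted on directly.
        raise VoteRejected("votes can only be cast on criticisms, not on plans")
    if criticism_authors[target_id] == voter:
        raise VoteRejected("users cannot vote on their own criticisms")
    if direction not in (1, -1):
        raise VoteRejected("a vote must be +1 (up) or -1 (down)")
    return (voter, target_id, direction)


authors = {"c1": "alice"}
print(cast_vote("bob", "c1", 1, authors))
```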

Estimated time left for this to be done: 15 days (could be a week if things go smoothly).

What’s left to do:

  • The scoring/karma/bounty system

  • Testing/red teaming the system to see how it can be misused/broken/perverted

  • Working on solutions to any vulnerabilities found (e.g. circular voting)

What we’re missing for this stage:

  • Depending on how heavy user traffic gets, we may need help with server maintenance and cloud infrastructure (I may be able to handle much of this myself, since my background is in DevOps)

  • We will likely need to change hosts at some point

What would be helpful, but not essential for this stage:

  • A mathematician to help design a more efficient scoring system

  • A dedicated red teamer/QA to find ways to subvert/pervert/break/misuse the points system (currently I’m doing a lot of this myself)

  • Funding so that I can pay Connor, the site’s fantastic developer, bring more people on board, and spend more time on this myself

Stage 3)

At this stage, we add cash prizes for:

  • The user with the most karma that month.

  • The user whose karma increased the most that month.

  • The author/authors of the plan with the highest bounty
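The first two prizes differ: one goes to the highest karma total, the other to the largest gain during the month. A minimal sketch, assuming hypothetical karma snapshots taken at the start and end of each month and a known bounty per plan (ties are not handled specially; `max` simply returns the first maximal entry):

```python
def monthly_winners(start_karma: dict[str, float],
                    end_karma: dict[str, float],
                    plan_bounties: dict[str, float],
                    plan_authors: dict[str, list[str]]) -> dict[str, object]:
    """Pick the three prize winners from hypothetical month-start/month-end snapshots."""
    most_karma = max(end_karma, key=lambda user: end_karma[user])
    # Users who joined mid-month start from zero karma.
    gains = {user: k - start_karma.get(user, 0.0) for user, k in end_karma.items()}
    biggest_gain = max(gains, key=lambda user: gains[user])
    top_plan = max(plan_bounties, key=lambda plan: plan_bounties[plan])
    return {
        "most_karma": most_karma,
        "most_karma_gained": biggest_gain,
        "top_plan_authors": plan_authors[top_plan],
    }


winners = monthly_winners(
    start_karma={"alice": 500.0, "bob": 10.0},
    end_karma={"alice": 520.0, "bob": 200.0},
    plan_bounties={"Plan A": 25.0, "Plan B": 50.0},
    plan_authors={"Plan A": ["carol"], "Plan B": ["dave", "erin"]},
)
print(winners)
```

Here an established user keeps the most-karma prize while a newer, active user can still win the karma-gained prize.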

Estimated time left for this to be done: 2 months

What’s left to do:

  • Setting up a secure way to store the prize funds

  • Fundraising for the prizes

  • Setting up a secure way to distribute the prizes

What we’re missing for this stage:

  • A lawyer to help set up the terms and conditions for the prizes and make sure we’re not making any false promises

  • Developers/DevOps to handle the load from the high user count we will likely have by this stage

  • The funds for the prizes

  • Funds to pay the team we will likely need at this stage

  • Red teamers/QA/Pen testers to find vulnerabilities

  • More developers to maintain the site and help fix problems that come up

What would be helpful, but not essential for this stage:

  • A social media manager


Please let me know if you’re interested in joining/contributing, and please feel free to contribute directly by adding plans and/or criticisms!

I’d also love to hear about any ways the site could be improved that haven’t been mentioned here, as well as any reasons it’s a bad idea or ways it could go wrong.

Thank you!