I’m interested in talking with anyone who is looking at the EU EA Hotel idea mentioned in the post. Also I’m working with Rob Miles’s community on a project to improve the pipeline, an interactive FAQ system called Stampy.
The goals of the project are to:
Offer a one-stop-shop for high-quality answers to common questions about AI alignment.
Let people answer questions in a way which scales, freeing up researcher time while allowing more people to learn from a reliable source.
Make external resources easier to find by connecting links to them to a search engine which gets smarter the more it’s used.
Provide a form of legitimate peripheral participation for the AI Safety community, as an on-boarding path with a flexible level of commitment.
Encourage people to think, read, and talk about AI alignment while answering questions, creating a community of co-learners who can give each other feedback and social reinforcement.
Provide a way for budding researchers to prove their understanding of the topic and ability to produce good work.
Collect data about the kinds of questions people actually ask and how they respond, so we can better focus resources on answering them.
Track reactions on messages so we can learn which answers need work.
Identify missing external content to create.
We’re still working on it, but would welcome feedback on the site’s usability, as well as early adopters who want to help write and answer questions. You can join the public Discord or message me for an invite to the semi-private patron one.
Oh, my bad, it was a 7-day invite by Discord default; I’ve made it permanent now.
Cool, booked a call for later today.
These are reasonable points, but I’m curious whether you would accept a high-quality run of shorter (but still considerable) length for a payout of <steps>/1000 of $20,000, at approximately the lower bound of run length which seems likely to be valuable. Producing 600 pages of text is an extremely big commitment for uncertain gains, especially with the potential to run out of early slots and no guarantee of being included in the later 100. Giving people the option to do even modestly smaller chunks may mean much greater uptake and more high-quality work to choose from.
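To make the pro-rata proposal concrete, here is a minimal sketch of the linear scaling it implies. `prorated_bounty` is a hypothetical helper, and the 600-step run is only an illustrative input; the actual bounty terms are whatever MIRI sets.

```python
def prorated_bounty(steps, full_steps=1000, full_payout=20_000):
    """Hypothetical pro-rata payout: a run of `steps` steps earns
    steps/full_steps of the full bounty, assuming linear scaling."""
    return steps / full_steps * full_payout

# A 600-step run under this (assumed) linear scheme:
payout = prorated_bounty(600)  # 12000.0
```

The point is only that the payout formula is trivial to administer; the real question is whether shorter runs are worth accepting at all.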
I think the MVP way to do this would be a Discord server with non-public channels for individual runs and using the threads feature to give feedback to each other. If anyone would like to do that and is looking for collaborators, drop by the Visible Thoughts Discord and let us know.
Strong upvote. The argument from training diversity seems plausible, but the key point is that, when trying to point large amounts of effort at writing content, having it delivered in smaller chunks than a novel would let many more people risk putting in time and learn whether they can contribute, and would ultimately raise quality and volume substantially. It would also make it much easier to build a collaborative project around this, as people could submit their work for community review without each review demanding an extremely long time and a large amount of effort.
I’d also propose that the bounty be updated relatively soon, with high visibility, to allow smaller submissions. MIRI could maintain backward compatibility simply by accepting smaller submissions without rejecting longer ones.
If the concern is the hassle of handing out lots of smaller bounties, MIRI could accept batches of small runs and let some trusted middle-man handle the details of the distribution.
I’ve got a slightly terrifying hail mary “solve alignment with this one weird trick”-style paradigm I’ve been mulling over for the past few years which seems like it has the potential to solve corrigibility and a few other major problems (notably value loading without Goodharting, using an alternative to CEV which seems drastically easier to specify). There are a handful of challenging things needed to make it work, but they look to me maybe more achievable than the other proposals I’ve read which seem like they could scale to superintelligence.
Realistically I am not going to publish it anytime soon given my track record, but I’d be happy to have a call with anyone who’d like to poke my models and try and turn it into something. I’ve had mildly positive responses from explaining it to Stuart Armstrong and Rob Miles, and everyone else I’ve talked to about it at least thought it was creative and interesting.
I also like the idea of collaboration, and of figuring out a way to share gains from the bounty which rewards people for helping each other out, and have set up a Discord for real-time collaboration. I’m also committing to not making any profit from this, though I am open to building systems which allow organizers other than me to be compensated.
I’m setting up a place for writers and organizers to find each other, collaborate, and discuss this; please join the Discord. More details in this comment.
I’ve set up a Discord server for discussing collaborations and thinking about mechanism design for sharing out credit (the current top idea is borrowing Rob Miles’s Discord eigenkarma system with modifications, but this is liable to change). Please join if you’re considering becoming a run author (no commitment to being part of this effort).
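For readers unfamiliar with the idea, here is a minimal sketch of what an eigenkarma-style scheme computes. This is not Rob Miles’s actual system; it assumes the standard PageRank-style trust walk, and the `eigenkarma` function, the damping value, and the example endorsement matrix are all illustrative.

```python
def eigenkarma(endorsements, damping=0.85, iters=100):
    """PageRank-style karma: endorsements[i][j] is how strongly user i
    endorses user j. Endorsements from high-karma users count for more,
    which is the 'eigen' part (scores are a stationary distribution)."""
    n = len(endorsements)
    # Normalize each user's outgoing endorsements into a distribution;
    # users who endorse no one spread their weight uniformly.
    rows = []
    for row in endorsements:
        total = sum(row)
        rows.append([w / total for w in row] if total else [1 / n] * n)
    scores = [1 / n] * n
    for _ in range(iters):
        scores = [(1 - damping) / n
                  + damping * sum(scores[i] * rows[i][j] for i in range(n))
                  for j in range(n)]
    return scores

# Users 0 and 1 endorse user 2; user 2 endorses user 0.
scores = eigenkarma([[0, 0, 1],
                     [0, 0, 1],
                     [1, 0, 0]])
```

In this example user 2 ends up with the most karma (endorsed by two people), user 0 next (endorsed by the high-karma user 2), and user 1 least. The appeal for credit-sharing is that karma is hard to farm with sockpuppets: endorsements from accounts nobody trusts carry almost no weight.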
I don’t need the money and won’t be skimming off any funds for my contributions to the project, but I am very open to people turning up with a bunch of great ideas, making everything run smoother, and taking a management fee as compensation. So please also join if you’re interested in becoming a project leader or organizational assistant.
The simple and dumb system referred to is humans, relative to a superintelligence, as I understood it.
I’d suggest talking to AI Safety Support, they offer free calls with people who want to work in the field. Rohin’s advice for alignment researchers is also worth looking at, it talks a fair amount about PhDs.
For that specific topic, maybe https://www.lesswrong.com/posts/LpM3EAakwYdS6aRKf/what-multipolar-failure-looks-like-and-robust-agent-agnostic is relevant?
Open to a better name for this. The reason I went with this (rather than Alignment Proposals, Success Stories, or just Success Models) is that I liked capturing it as the mirror of threat models, and including AI feels like a natural scoping: unlike threat models, which apply widely, the other x-risks don’t have clear win conditions. I also would like to include this in the AI box in the portal since it feels like a super important tag, and including AI in the name makes that more likely.
Mixture of Experts, pretty sure.
It is possible, you just paste the image apparently, thanks Yoav Ravid for the tip.
Yep, that was me adding some new ones without the parameter (though I think I didn’t remove it from any which already had it), did not know that was needed, fixed now (and fixed on portal page).
Is it not possible to use images in tags? Or am I just using the wrong syntax?
I think this should be under “Other” in the AI category. Is it possible for regular users to categorize tags?
I think this should be under AI, possibly Engineering, but not certain of the subcategory.
I think this should be in the AI category, likely under Alignment Theory.