Yup, there is a working prototype and a programmer who would like to work on it full time if there was funding, but it’s not been progressing much for the past year or so because no one has had the free bandwidth to work on it.
https://aisafety.world/tiles/ has a bunch.
That’s fair, I’ve added a note to the bottom of the post to clarify my intended meaning. I am not arguing for it in a well-backed up way, just stating the output of my models from being fairly close to the situation and having watched a different successful mediation.
Forced or badly done mediation does indeed seem terrible. Entering into a conversation facilitated by someone skilled, with an intent to genuinely understand the harms caused and make sure you correct the underlying patterns, seems much less bad than the actual way the situation played out.
I was asked to comment by Ben earlier, but have been juggling more directly impactful projects and retreats. I have been somewhat close to parts of the unfolding situation, including spending some time in person with Alice, Chloe, and (separately) the Nonlinear team, and communicating online on-and-off with most parties.
I can confirm some of the patterns Alice complained about, specifically not reliably remembering or following through on financial and roles agreements, and Emerson being difficult to talk to about some things. I do not feel notably harmed by these, and was able to work them out with Drew and Kat without much difficulty, but it does back up my perception that there were real grievances which would have been harmful to someone in a less stable position. I also think they’ve done some excellent work, and would like to see that continue, ideally with clear and well-known steps to mitigate the kinds of harms which set this in motion.
I have consistently attempted to shift Nonlinear away from what appears to me a wholly counterproductive adversarial emotional stance, with limited results. I understand that they feel defected against, especially Emerson, but they were in the position of power and failed to make sure those they were working with did not come out harmed, and the responses to the initial implosion continued to generate harm and distraction for the community. I am unsettled by the threat of legal action towards Lightcone and the focus on controlling the narrative rather than repairing damage.
Emerson: You once said one of the main large failure modes you were concerned about becoming was Stalin’s mistake: breaking the networks of information around you so you were unaware things were going so badly wrong. My read is you’ve been doing this in a way which is a bit more subtle than the gulags, by the intensity of your personality shaping the fragments of mind around you to not give you evidence that in fact you made some large mistakes here. I felt the effects of this indirectly, as well as directly. I hope you can halt, melt, and catch fire, and return to the effort as someone who does not make this magnitude of unforced error.
You can’t just push someone who is deeply good out of a movement with the kind of co-protective nature ours has, in the way you merely shouldn’t in some parts of the world. If there’s intense conflict, call in a mediator and try to heal the damage.
Edit: To clarify, this is not intended as a blanket endorsement of mediation, or of avoiding other forms of handling conflict. I do think that going into a process where the parties genuinely try and understand each other’s worlds much earlier would have been much less costly for everyone involved as well as the wider community in this case, but I can imagine mediation is often mishandled or forced in ways which are also counterproductive.
How can I better recruit attention and resources to this topic?
Consider finding an event organizer/ops person and running regular retreats on the topic. This will give you exposure to people in a semi-informal setting, and help you find a few people with clear thinking who you might want to form a research group with, and who can help structure future retreats.
I’ve had great success with a similar approach.
We’re getting about 20k uniques/month across the different URLs, and expect that to get much higher once we make a push for attention, which will happen when Rob Miles passes us for quality to launch on LW and then in videos.
AI safety is funding constrained, so we win more timelines if there are a bunch of people successfully investing to give.
If you’re able to spend time in the UK, the EA Hotel offers free food and accommodation for up to two years in low-cost shared living. Relatedly, there should really be one of these in the US and mainland Europe.
feel that I have a bad map of the AI Alignment/Safety community
This is true of many people, and why I built the map of AI safety :)
Next step is to rebuild aisafety.com into a homepage which ties all of this together, and offer AI Safety Info’s database via an API for other websites (like aisafety.com, and hopefully lesswrong) to embed.
Why did the Alignment community not prepare tools and plans for convincing the wider infosphere about AI safety years in advance?
I’ve been organizing the volunteer team who built AI Safety Info for the past two and a half years, alongside building a whole raft of other tools like AI Safety Training and AI Safety World.
But, yes, the movement as a whole has dropped the ball pretty hard on basic prep. The real answer is that things are not done by default, and this subculture has relatively few do-ers compared to thinkers. And the thinkers had very little faith in the wider info-sphere, sometimes actively discouraging most do-ers from trying broad outreach.
Early corporations, like the East India Company, might be a decent reference class?
I’m pretty sure that at some level what sorts of things your brain spits out into your consciousness and how useful that information is in the given situation, is something that you can’t fundamentally change. I expect this to be a hard-coded algorithm
Tune Your Cognitive Strategies purports to offer a technique which can improve that class of algorithm significantly.
Edit: Oh, no, you were meaning a different thing, and this probably goes into the inputs to the algorithm category?
Your probabilities are not independent, and your estimates mostly flow from a world model which seems to me to be flatly and clearly wrong.
The plainest examples seem to be assigning
despite current models learning vastly faster than humans (training time of LLMs is not a human lifetime, and covers vastly more data) and the current nearing AGI and inference being dramatically cheaper and plummeting with algorithmic improvements. There is a general factor of progress, where progress leads to more progress, which you seem to be missing in the positive factors. For the negative, derailment that delays enough to push us out that far needs to be extreme, on the order of a full-out nuclear exchange, given more reasonable models of progress.
I’ll leave you with Yud’s preemptive reply:
Taking a bunch of numbers and multiplying them together causes errors to stack, especially when those errors are correlated.
Nice! Glad to see more funding options entering the space, and excited to see the S-process rolled out to more grantmakers. Added you to the map of AI existential safety:
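To make the stacking concrete, here is a rough sketch with entirely made-up numbers (ten conjunctive steps, a hypothetical true probability of 0.7 each, and estimates off by 0.1). None of these figures come from the post being replied to; they only illustrate why errors in the same direction compound while errors in mixed directions mostly wash out.

```python
from math import prod

# Hypothetical setup: 10 steps that must all happen, each truly 70% likely.
N_STEPS = 10
true_probs = [0.7] * N_STEPS

# Correlated error: one pessimistic world model pushes every estimate down by 0.1.
correlated_estimates = [0.6] * N_STEPS

# Uncorrelated-looking error: same size of miss, but alternating direction.
mixed_estimates = [0.6, 0.8] * (N_STEPS // 2)

true_joint = prod(true_probs)
correlated_joint = prod(correlated_estimates)
mixed_joint = prod(mixed_estimates)

# How far off is each final answer, as a multiplicative factor?
correlated_factor = true_joint / correlated_joint
mixed_factor = true_joint / mixed_joint

print(f"true joint probability:      {true_joint:.4f}")
print(f"all estimates 0.1 too low:   {correlated_joint:.4f} (off by {correlated_factor:.2f}x)")
print(f"estimates off both ways:     {mixed_joint:.4f} (off by {mixed_factor:.2f}x)")
```

With these toy inputs, a 0.1 miss per step in the same direction leaves the final number off by more than a factor of four, while the same-sized misses in alternating directions leave it off by only about ten percent.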
Cool! Feel free to add it with the form
that was me for context:
core claim seems reasonable and worth testing, though I’m not very hopeful that it will reliably scale through the sharp left turn
my guess is the intuitions don’t hold in the new domain, and radical superintelligence requires intuitions that you can’t develop on relatively weak systems, but it’s a source of data for our intuition models which might help with other stuff so seems reasonable to attempt.
Meta’s previous LLM, OPT-175B, seemed good by benchmarks but was widely agreed to be much, much worse than GPT-3 (not even necessarily better than GPT-Neo-20b). It’s an informed guess, not a random dunk, and does leave open the possibility that they’ve turned it around and have a great model this time rather than something which goodharts the benchmarks.
This is a Heuristic That Almost Always Works, and it’s the one most likely to cut off our chances of solving alignment. Almost all clever schemes are doomed, but if we as a community let that meme stop us from assessing the object level question of how (and whether!) each clever scheme is doomed then we are guaranteed not to find one.
Security mindset means look for flaws, not assume all plans are so doomed you don’t need to look.
If this is, in fact, a utility function which if followed would lead to a good future, that is concrete progress and lays out a new set of true names as a win condition. Not a solution, since we can’t train AIs with arbitrary goals, but it’s progress in the same way that quantilizers were progress on mild optimization.