I was initially confused about why you were adding lots of extra parts to the analogy, which complicate the process of analogizing by overloading the human prompt window, but then I realized that the analogy was specifically trying to get across the idea that there are loads of specific engineering things to work out as well as general principles (which seems true and useful, but should be captured early on in a tl;dr, with the blog post mostly for people who want a deep dive into rocketry history).
I think this was a fair attempt, but it missed the costs to easy understanding of loading up large amounts of context (names, dates, numbers, and loads of specifics) without making it clear what these were for.
If only I had an enemy bigger than my apathy I could have won
I’m glad to have found an enemy considerably bigger than my apathy. I don’t expect to win in most branches, but in some we survive and I hope to make my future selves proud of my efforts here in the present.
And damn if it isn’t an interesting challenge, surrounded by so many great people working for the good of everyone.
Yep, I noticed this and edited in:
Edit: Most (but not all! despite apparently identical settings!) of the redirects are not currently functioning. Will look into and fix.
Edit: Fixed, thanks to the helpful people at alignment.dev, a place for programmers who want to help with alignment related projects.
Fun trivia: Arbital was internally called Project Xanadu pre-naming, as an attempt to keep awareness of these failure modes in mind.
Anti-squatted AI x-risk domains index
Stampy has some of this, over at “What are some good resources on AI alignment?”
We’re working on a “How can I help?” tree of questions and answers, which will include more info on who to talk to, but for now I’ll suggest AI Safety Support and 80k.
High g factor seems like the key thing we’re attempting to select for, along with other generally helpful traits like conscientiousness, altruism, personability, and relevant domain-specific skills.
Rationality-catalyzing tools could be very beneficial if successful. If your internal yum-meter points towards that being the most engaging thing for you, it seems like a reasonable path (and will be a good way to grow even if the moonshot part does not land).
Being a highly infectious memeplex actively trades off against high average competence. We don’t need a million foot soldiers who chant the right tribal words without deep understanding; we need the best and brightest in rooms where there’s no one to bring the dynamic down.
There are efforts to evangelise to specific target groups, like ML researchers or people at the International Mathematical Olympiad. These are encouraged, though they could be scaled better.
This is a great idea! As an MVP we could well make a link to a recommended Stampy path (this is an available feature on Stampy already; you can copy the URL at any point to send people to your exact position), once we have content. I’d imagine the most in-demand ones would be:
What are the basics of AI safety?
I’m not convinced, is this actually a thing?
How do I help?
What is the field and ecosystem?
Do you have any other suggestions?
And having a website which lists these paths, then enriches them, would be awesome. Stampy’s content is available via a public-facing API, and one other team is already interested in using us as a backend. I’d be keen for future projects to also use Stampy’s wiki as a backend for anything which can be framed as a question/answer pair, to increase content reusability and save on duplication of effort, but more frontends could be great!
There’s a related Stampy answer, based on Critch’s post. It requires them to be willing to watch a video, but seems likely to be effective.
A commonly heard argument goes: yes, a superintelligent AI might be far smarter than Einstein, but it’s still just one program, sitting in a supercomputer somewhere. That could be bad if an enemy government controls it and asks it to help invent superweapons – but then the problem is the enemy government, not the AI per se. Is there any reason to be afraid of the AI itself? Suppose the AI did appear to be hostile, suppose it even wanted to take over the world: why should we think it has any chance of doing so?
There are numerous carefully thought-out AGI-related scenarios which could result in the accidental extinction of humanity. But rather than focussing on any of these individually, it might be more helpful to think in general terms.
“Transistors can fire about 10 million times faster than human brain cells, so it’s possible we’ll eventually have digital minds operating 10 million times faster than us, meaning from a decision-making perspective we’d look to them like stationary objects, like plants or rocks… To give you a sense, here’s what humans look like when slowed down by only around 100x.”
Watch that, and now try to imagine advanced AI technology running for a single year around the world, making decisions and taking actions 10 million times faster than we can. That year for us becomes 10 million subjective years for the AI, in which “...there are these nearly-stationary plant-like or rock-like ‘human’ objects around that could easily be taken apart for, say, biofuel or carbon atoms, if you could just get started building a human-disassembler. Visualizing things this way, you can start to see all the ways that a digital civilization can develop very quickly into a situation where there are no humans left alive, just as human civilization doesn’t show much regard for plants or wildlife or insects.”
And even putting aside these issues of speed and subjective time, the difference in (intelligence-based) power-to-manipulate-the-world between a self-improving superintelligent AGI and humanity could be far more extreme than the difference in such power between humanity and insects.
“AI Could Defeat All Of Us Combined” is a more in-depth argument by the CEO of Open Philanthropy.
That’s the static version; see Stampy for a live one, which might have been improved since this post.
Agreed that for a post-intelligence-explosion AI, alignment is effectively binary. I do agree with the sharp left turn etc. positions, and don’t expect patches and cobbled-together solutions to hold up to the stratosphere.
Weakly aligned—Guided towards the kinds of things we want in ways which don’t have strong guarantees. A central example is InstructGPT, but this also includes most interpretability work (unless dramatically more effective than the current generation), and what I understand to be Paul’s main approaches.
Weakly superintelligent—Superintelligent in some domains, but has not yet undergone recursive self improvement.
These are probably non-standard terms; I’m very happy to be pointed at existing literature with different ones which I can adopt.
I am confident Eliezer would roll his eyes; I have read a great deal of his work and his recent debates. I respectfully disagree with his claim that you can’t get useful cognitive work on alignment out of systems which have not yet FOOMed and taken a sharp left turn, based on my understanding of intelligence as babble and prune. I don’t expect us to get enough cognitive work out of these systems in time, but it seems like a path which has non-zero hope.
It is plausible that AIs unavoidably FOOM before the point that they can contribute, but this seems less and less likely as capabilities advance and we notice we’re not dead.
Replying to the unstated implication that ML-based alignment is not useful: alignment is not a binary variable. Even if neural networks can’t be aligned in a way which robustly scales to arbitrary levels of capability, weakly aligned, weakly superintelligent systems could still be useful tools as parts of research assistants (see Ought and the Alignment Research Center’s work) which allow us to develop a cleaner seed AI with much better verifiability properties.
All AGI safety questions welcome (especially basic ones) [monthly thread]
Stampy feedback thread
See also the feedback form for some specific questions we’re keen to hear answers to.
Stampy has a list of some of them (and welcomes additions or corrections on the wiki entry!).
We (Rob Miles and Stampy’s team) might, in a few months, run an event where people can present their ideas for solving alignment with a much lower bar for entry than usual, once we’ve got a strong base of volunteer editors to absorb and filter the incoming ideas. Provisionally called “Big Wheel Of Cheese Day”. If we do this, we’d likely host the database on Stampy.
What was that supplement? Seems like a useful thing to have known if reproducible.
You could skip ahead by reading the handbook directly.
I’ve partially re-ordered the first one, to make it start from a sensible place and cover key topics first.