Eliezer’s Lost Alignment Articles / The Arbital Sequence

Note: this is a static copy of this wiki page. We are also publishing it as a post to ensure visibility.
Circa 2015-2017, a lot of high-quality content was written on Arbital by Eliezer Yudkowsky, Nate Soares, Paul Christiano, and others. Perhaps because the platform didn’t take off, most of this content has not been as widely read as its quality warrants. Fortunately, it has now been imported into LessWrong.
Most of the content written was either about AI alignment or math[1]. The Bayes Guide and Logarithm Guide are likely some of the best mathematical educational material online. Among the AI alignment content are detailed and evocative explanations of alignment ideas: some well known, such as instrumental convergence and corrigibility; some lesser known, like epistemic/instrumental efficiency; and some misunderstood, like pivotal act.

The Sequence

The articles collected here were originally published as wiki pages with no set reading order. The LessWrong team first selected about twenty pages which seemed most engaging and valuable to us, and then ordered them[2][3] based on a mix of our own taste and feedback from test readers we paid to review our choices.

Tier 1

These pages are a good reading experience.
Current terminology would call this “misgeneralization”. Do alignment properties that hold in one context (e.g. training, while less smart) generalize to another context (deployment, much smarter)?
It’s a hard problem to build an agent which, in an intuitive sense, reasons internally as if from the developer’s external perspective – seeing itself as incomplete, as something that requires external correction, etc. This is not default behavior for an agent.
“Love” and “fun” aren’t ontologically basic components of reality. When we figure out what they’re made of, we should probably go on valuing them anyway.
If you tell a smart consequentialist mind “no murder” but it is actually trying, it will just find the next best thing that you didn’t think to disallow.
“Mild optimization” is where, if you ask your advanced AGI to paint one car pink, it just paints one car pink and then stops, rather than tiling the galaxies with pink-painted cars, because it’s not optimizing that hard. It’s okay with just painting one car pink; it isn’t driven to max out the twentieth decimal place of its car-painting score.
The property such that if you tell your AGI that you installed the wrong values in it, it lets you do something about that. An unnatural property to build into an agent.
Bayes Rule Guide

An interactive guide to Bayes’ theorem, i.e., the law of probability governing the strength of evidence – the rule saying how much to revise our probabilities (change our minds) when we learn a new fact or observe new evidence.
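The kind of calculation the guide teaches can be sketched with a standard (hypothetical) medical-test example; all of the numbers below are made up for illustration:

```python
# Hypothetical numbers, chosen only to illustrate a Bayes' rule update.
prior = 0.01           # P(condition): 1% base rate
sensitivity = 0.9      # P(positive test | condition)
false_positive = 0.05  # P(positive test | no condition)

# P(positive) by the law of total probability
p_positive = sensitivity * prior + false_positive * (1 - prior)

# Bayes' rule: P(condition | positive test)
posterior = sensitivity * prior / p_positive
print(round(posterior, 3))  # about 0.154
```

Despite the strongly positive test, the posterior is only about 15%, because the low prior dominates – exactly the style of reasoning the guide builds intuition for.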
An FAQ aimed at giving a very rapid introduction to key standard economic concepts for professionals in AI/ML who have become concerned with the potential economic impacts of their work.
Tier 2
These pages are high effort and high quality, but are less accessible and/or of less general interest than the Tier 1 pages.
The list starts with a few math pages before returning to AI alignment topics.
A dialogue between Ashley, a computer scientist who’s never heard of Solomonoff’s theory of inductive inference, and Blaine, who thinks it is the best thing since sliced bread.
Vinge’s Principle says that you (usually) can’t predict exactly what an entity smarter than you will do, because if you knew exactly what a smart agent would do, you would be at least that smart yourself. “Vingean uncertainty” is the epistemic state we enter into when we consider an agent too smart for us to predict its exact actions.
Agents which have been subject to sufficiently strong optimization pressures will tend to appear, from a human perspective, as if they obey some bounded form of the Bayesian coherence axioms for probabilistic beliefs and decision theory.
One possible scheme in AI alignment is to give the AI a state of moral uncertainty implying that we know more than the AI does about its own utility function, as the AI’s meta-utility function defines its ideal target. Then we could tell the AI, “You should let us shut you down because we know something about your ideal target that you don’t, and we estimate that we can optimize your ideal target better without you.”
It seems likely that for advanced agents, the agent’s representation of the world will change in unforeseen ways as it becomes smarter. The ontology identification problem is to create a preference framework for the agent that optimizes the same external facts, even as the agent modifies its representation of the world.
The edge instantiation problem is a hypothesized patch-resistant problem for safe value loading in advanced agent scenarios where, for most utility functions we might try to formalize or teach, the maximum of the agent’s utility function will end up lying at an edge of the solution space that is a ‘weird extreme’ from our perspective.
Goodhart’s Curse is a neologism for the combination of the Optimizer’s Curse and Goodhart’s Law, particularly as applied to the value alignment problem for Artificial Intelligences.
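The Optimizer’s Curse half of this combination has a simple statistical core, which can be demonstrated in a few lines of Python (a minimal sketch, not from the article; the distributions and parameters are arbitrary):

```python
# Optimizer's Curse sketch: if you pick the option with the highest *noisy*
# estimate of value, the chosen option's estimate systematically
# overestimates its true value.
import random

random.seed(0)
n_options, n_trials = 20, 2000
total_gap = 0.0
for _ in range(n_trials):
    true_values = [random.gauss(0, 1) for _ in range(n_options)]
    # Each estimate = true value + independent noise.
    estimates = [v + random.gauss(0, 1) for v in true_values]
    best = max(range(n_options), key=lambda i: estimates[i])
    total_gap += estimates[best] - true_values[best]

avg_bias = total_gap / n_trials
print(avg_bias > 0)  # prints True: the winner's estimate exceeds its true value on average
```

Goodhart’s Curse is the observation that this selection bias bites hardest exactly where a proxy measure diverges from what we actually value – an optimizer pushes toward the points where its value estimate errs upward the most.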
‘Executable philosophy’ is Eliezer Yudkowsky’s term for discourse about subjects usually considered in the realm of philosophy, meant to be used for designing an Artificial Intelligence.
“An AGI design should be widely separated in the design space from any design that would constitute a hyperexistential risk.” A hyperexistential risk is a “fate worse than death”.
In modern AI and especially in value alignment theory, there’s a sharp divide between “problems we know how to solve using unlimited computing power”, and “problems we can’t state how to solve using computers larger than the universe”.
Much of the current literature about value alignment centers on purported reasons to expect that certain problems will require solution, or be difficult, or be more difficult than some people seem to expect. The subject of this page is that practice, considered as a policy or methodology.
One counterargument to the Orthogonality Thesis asserts that agents with terminal preferences for goals such as resource acquisition will always be much better at those goals than agents which merely acquire resources on the way to doing something else, like making paperclips. This page is a reply to that argument.
A page explaining, in part, how the rest of the pages here came to be.
Lastly, we’re sure this sequence isn’t perfect, so any feedback (what you liked, disliked, etc.) is appreciated – feel free to leave comments on this page.

[1] Mathematicians were an initial target market for Arbital.

[2] The ordering here is “Top Hits”, subject to the constraint that if you start reading at the top, you won’t be missing any major prerequisites as you read along.

The pages linked here are only some of the AI alignment articles, and the selection/ordering has not been endorsed by Eliezer or MIRI. The rest of the imported Arbital content can be found via links from the pages above and also from the LessWrong Concepts page (use this link to highlight imported Arbital pages).