Alignment

This fifth book is about alignment, the problem of aligning the thoughts and goals of artificial intelligences with those of humans.

This relates to the art of human rationality in two ways.

First, the laws of reasoning and decision-making apply similarly to all agents, whether they are human or artificial. Many of the deepest insights about rationality on LessWrong have come directly from users’ attempts to grapple with the problem of understanding and aligning artificial intelligences.

Second, the design of AI is a topic in which a rational agent in the present day will take a natural interest. This technology offers great leverage at this point in history, the period in which we are actively developing it. In the words of I.J. Good, “the first ultraintelligent machine is the last invention that man need ever make.”

If you are not well-versed in the discussion around how to align powerful AI systems, treat the essays in this book as a collection of letters between scientists in a field you are not part of. While some terms will not be fully explained, the authors will attempt to speak plainly and make honest efforts to convey the structure of their arguments, and when it comes time to convey the most abstract and philosophical ideas, hand-drawn cartoons will be their mode of communication.

Arguments about fast takeoff

Specification gaming examples in AI

The Rocket Alignment Problem

Embedded Agents

Paul’s research agenda FAQ

Challenges to Christiano’s capability amplification proposal

Robustness to Scale

Coherence arguments do not entail goal-directed behavior