Trust
Rule Thinkers In, Not Out
Scott Alexander
Gears vs Behavior
John S. Wentworth
Book Review: The Secret Of Our Success
Scott Alexander
Reason isn’t magic
Ben Hoffman
“Other people are wrong” vs “I am right”
Buck Shlegeris
In My Culture
Duncan Sabien
Chris Olah’s views on AGI safety
Evan Hubinger
Understanding “Deep Double Descent”
Evan Hubinger
How to Ignore Your Emotions (while also thinking you’re awesome at emotions)
Hazard
Paper-Reading for Gears
John S. Wentworth
Book summary: Unlocking the Emotional Brain
Kaj Sotala
Noticing Frame Differences
Raymond Arnold
Propagating Facts into Aesthetics
Raymond Arnold
Do you fear the rock or the hard place?
Ruben Bloom
Mental Mountains
Scott Alexander
Steelmanning Divination
Vaniver
Modularity
Book Review: Design Principles of Biological Circuits
John S. Wentworth
Reframing Superintelligence: Comprehensive AI Services as General Intelligence
Rohin M. Shah
Building up to an Internal Family Systems model
Kaj Sotala
Being the (Pareto) Best in the World
John S. Wentworth
The Schelling Choice is “Rabbit”, not “Stag”
Raymond Arnold
Literature Review: Distributed Teams
Elizabeth Van Nostrand
Gears-Level Models are Capital Investments
John S. Wentworth
Evolution of Modularity
John S. Wentworth
You Have About Five Words
Raymond Arnold
Coherent decisions imply consistent utilities
Eliezer Yudkowsky
Alignment Research Field Guide
Abram Demski
Forum participation as a research strategy
Wei Dai
The Credit Assignment Problem
Abram Demski
Selection vs Control
Abram Demski
Incentives
Asymmetric Justice
Zvi Mowshowitz
The Copenhagen Interpretation of Ethics
Jai Dhyani
Unconscious Economics
Jacob Lagerros
Power Buys You Distance From The Crime
Elizabeth Van Nostrand
Seeking Power is Often Convergently Instrumental in MDPs
Alexander Turner & Logan Smith
Yes Requires the Possibility of No
Scott Garrabrant
Mistakes with Conservation of Expected Evidence
Abram Demski
Heads I Win, Tails?—Never Heard of Her; Or, Selective Reporting and the Tragedy of the Green Rationalists
Zack M. Davis
Excerpts from a larger discussion about simulacra
Ben Hoffman
Moloch Hasn’t Won
Zvi Mowshowitz
Integrity and accountability are core parts of rationality
Oliver Habryka
The Real Rules Have No Exceptions
Said Achmiz
Simple Rules of Law
Zvi Mowshowitz
The Amish, and Strategic Norms around Technology
Raymond Arnold
Risks from Learned Optimization: Introduction
Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, & Scott Garrabrant
Gradient hacking
Evan Hubinger
Failure
The Parable of Predict-O-Matic
Abram Demski
Blackmail
Zvi Mowshowitz
Bioinfohazards
Megan Crawford, Finan Adamson, & Jeffrey Ladish
What failure looks like
Paul Christiano
AI Safety “Success Stories”
Wei Dai
Reframing Impact
Alexander Turner
The strategy-stealing assumption
Paul Christiano
Is Rationalist Self-Improvement Real?
Jacob Falkovich
The Curse Of The Counterfactual
P.J. Eby
human psycholinguists: a critical appraisal
Nostalgebraist
Why wasn’t science invented in China?
Ruben Bloom
Make more land
Jeff Kaufman
Rest Days vs Zombie Days
Lauren Lee
Here is a google sheet.