Statistical Mechanics

Tarbell Fellowship

AI Control

Adverse Selection

Provable AI Alignment (ProvAIA)

Weak-To-Strong Generalization (W2SG)

Sëbus: A Book About Happiness

On Wholesomeness

Deliberative Algorithms as Scaffolding

Counterfactuals and Updatelessness

Quantitative cruxes and evidence in Alignment

Distributed Strategic Epistemology

Delegative Decision Theory

Formalising Catastrophic Goodhart

From Big Ideas To Real-World Results

Rapid Coordination Field Manual

The Ethicophysics

Exploring the Digital Wilderness

Game Theory without Argmax

AI Manipulation Is Already Here

The Value Change Problem (sequence)

Large Language Model Psychology

Monthly Algorithmic Problems in Mech Interp

Crowdsourced knowledge base

Waterloo Rationality Meetups Showcase

Consciousness Discourse

An Opinionated Guide to Computability and Complexity

Narrative Theory

Developmental Interpretability

Meta-rationality

Catastrophic Risks From AI

Distilling Singular Learning Theory

Towards Causal Foundations of Safe AGI

CAIS Philosophy Fellowship Midpoint Deliverables

Machine Learning For Scientific Discovery

Replacing fear

The Nuts and Bolts of Naturalism

Interpreting a Maze-Solving Network

From Atoms To Agents

Interpreting Othello-GPT

Singularity now: is GPT-4 trying to take over the world?

I’m pretty sure AI is as dumb as We Are

Untitled Novel

A Data-Driven Path to Effortless Weightloss: Potatoes, Potassium, Drugs, Chocolate, and much much more

The Shallow Reality of ‘Deep Learning Theory’

Leveling Up: advice & resources for junior alignment researchers

On Becoming a Great Alignment Researcher (Efficiently)

Simulators

Simulator seminar sequence

Bias in Evaluating AGI X-Risks

Developments toward Uncontrollable AI

Why Not Try Build Safe AGI?

Alignment Stream of Thought

Some comments on the CAIS paradigm

(Lawrence’s) Reflections on Research

Entropy from first principles

[Redwood Research] Causal Scrubbing

Generalised models

Experiments in instrumental convergence

Research Journals

Hypothesis Subspace

Thoughts in Philosophy of Science of AI Alignment

“Why Not Just...”

Law-Following AI

My AI Risk Model

Shard Theory

AGI-assisted Alignment

Alignment For Foxes

Selection Theorems: Modularity

Breaking Down Goal-Directed Behaviour

A Tour of AI Timelines

Math Upskilling Notes

Networking: A Game Manual

Basic Foundations for Agent Models

Pragmatic AI Safety

Insights from Dath Ilan

An Inside View of AI Alignment

AI Races and Macrostrategy

Treacherous Turn

The Inside View (Podcast)

Winding My Way Through Alignment

Interpretability Research for the Most Important Century

Neural Networks, More than you wanted to Show

Concept Extrapolation

Calculus in Game and Decision Theory

Civilization & Cooperation

Trends in Machine Learning

Intuitive Introduction to Functional Decision Theory

Intro to Brain-Like-AGI Safety

Mechanics of Tradecraft

Independent AI Research

Agency: What it is and why it matters

Thoughts on Corrigibility

Epistemic Cookbook for Alignment

Transformative AI and Compute

AI Safety Subprojects

The Coordination Frontier

D&D.Sci

The Most Important Century

Framing Practicum

Rationality in Research

AI Defense in Depth: A Layman’s Guide

Modeling Transformative AI Risk (MTAIR)

Practical Guide to Anthropics

The Causes of Power-seeking and Instrumental Convergence

2021 Less Wrong Darwin Game

Finite Factored Sets

Comprehensive Information Gatherings

Using Credence Calibration for Everything

Anthropic Decision Theory

Reviews for the Alignment Forum

Notes on Virtues

Participating in a Covid-19 Vaccination Trial

Predictions & Self-awareness

Pointing at Normativity

Counterfactual Planning

AI Alignment Unwrapped

AI Timelines

Pseudorandomness Contest

Bayeswatch

Cryonics Signup Guide

NLP and other Self-Improvement

Takeoff and Takeover in the Past and Future

Forecasting Newsletter

Sunzi’s《Methods of War》

COVID-19 Updates and Analysis

Deconfusing Goal-Directedness

The Grueling Subject

2020 Less Wrong Darwin Game

Quantitative Finance

Factored Cognition

Zen and Rationality

Privacy Practices

Staying Sane While Taking Ideas Seriously

Naturalized Induction

What You Can and Can’t Learn from Games

Short Stories

Toying With Goal-Directedness

Against Rationalization II

Consequences of Logical Induction

Through the Haskell Jungle

Lessons from Isaac

Filk

Subagents and impact measures

The LessWrong Review

If I were a well-intentioned AI...

Moral uncertainty

Medical Paradigms

Understanding Machine Learning

Antimemetics

Gears of Aging

Map and Territory Cross-Posts

Phenomenological AI Alignment

Changing your Mind With Memory Reconsolidation

base-line to enlightenment—the physical route to better

Partial Agency

Concept Safety

AI Alignment Writing Day 2019

Novum Organum

Logical Counterfactuals and Proposition graphs

AI Alignment Writing Day 2018

Daily Insights

Model Comparison

Reframing Impact

Alternate Alignment Ideas

Concepts in formal epistemology

So You Want To Colonize The Universe

Mechanism Design

Decision Analysis

Priming

Positivism and Self Deception

Kickstarter for Coordinated Action

Prediction-Driven Collaborative Reasoning Systems

Assorted Maths

Multiagent Models of Mind

Open Threads

Keith Stanovich: What Intelligence Tests Miss

Filtered Evidence, Filtered Arguments

CDT=EDT?

Fixed Points

Metaethics

Quantum Physics

Ethical Injunctions

Alignment Newsletter

Share Models, Not Beliefs

Voting Theory Primer for Rationalists

Becoming Stronger

Hufflepuff Cynicism

Tensions in Truthseeking

Murphy’s Quest

Project Hufflepuff

Instrumental Rationality

Philosophy Corner

Rational Ritual

The Darwin Game

Drawing Less Wrong