Why deceptive alignment matters for AGI safety

Marius Hobbhahn

Sep 15, 2022, 1:38 PM

68 points

13 comments 13 min read LW link

Path dependence in ML inductive biases

Vivek Hebbar and evhub

Sep 10, 2022, 1:38 AM

68 points

13 comments 10 min read LW link

Self-Control Secrets of the Puritan Masters

David Hugh-Jones

Sep 26, 2022, 9:04 AM

67 points

3 comments 5 min read LW link

(wyclif.substack.com)

Quintin’s alignment papers roundup—week 2

Quintin Pope

Sep 19, 2022, 1:41 PM

67 points

2 comments 10 min read LW link

LOVE in a simbox is all you need

jacob_cannell

Sep 28, 2022, 6:25 PM

66 points

73 comments 44 min read LW link 1 review

Where I currently disagree with Ryan Greenblatt’s version of the ELK approach

So8res

Sep 29, 2022, 9:18 PM

65 points

7 comments 5 min read LW link

Book review: “The Heart of the Brain: The Hypothalamus and Its Hormones”

Steven Byrnes

Sep 27, 2022, 1:20 PM

65 points

3 comments 18 min read LW link

A game of mattering

KatjaGrace

Sep 23, 2022, 2:30 AM

64 points

7 comments 5 min read LW link

(worldspiritsockpuppet.com)

Clarifying the Agent-Like Structure Problem

johnswentworth

Sep 29, 2022, 9:28 PM

63 points

17 comments 6 min read LW link

[Closed] Prize and fast track to alignment research at ALTER

Vanessa Kosoy

Sep 17, 2022, 4:58 PM

63 points

8 comments 3 min read LW link

Private alignment research sharing and coordination

porby

Sep 4, 2022, 12:01 AM

62 points

13 comments 5 min read LW link

Infra-Exercises, Part 1

Diffractor, Jack Parker and Connall Garrod

Sep 1, 2022, 5:06 AM

62 points

10 comments 1 min read LW link

Review of Examine.com’s vitamin write-ups

Elizabeth and Martin Bernstorff

Sep 26, 2022, 11:40 PM

60 points

1 comment 5 min read LW link

(acesounderglass.com)

Gradient Hacker Design Principles From Biology

johnswentworth

Sep 1, 2022, 7:03 PM

60 points

13 comments 3 min read LW link

Fake qualities of mind

Kaj_Sotala

Sep 22, 2022, 4:40 PM

59 points

2 comments 2 min read LW link

(kajsotala.fi)

Argument against 20% GDP growth from AI within 10 years [Linkpost]

aog

Sep 12, 2022, 4:08 AM

59 points

20 comments 5 min read LW link

(twitter.com)

Replacement for PONR concept

Daniel Kokotajlo

Sep 2, 2022, 12:09 AM

59 points

6 comments 2 min read LW link

Levelling Up in AI Safety Research Engineering

Gabe M

Sep 2, 2022, 4:59 AM

58 points

9 comments 17 min read LW link

QAPR 3: interpretability-guided training of neural nets

Quintin Pope

Sep 28, 2022, 4:02 PM

58 points

2 comments 10 min read LW link

Deep Q-Networks Explained

Jay Bailey

Sep 13, 2022, 12:01 PM

58 points

8 comments 20 min read LW link

Two reasons we might be closer to solving alignment than it seems

KatWoods and AmberDawn

Sep 24, 2022, 8:00 PM

57 points

9 comments 4 min read LW link

Why was progress so slow in the past?

jasoncrawford

Sep 1, 2022, 8:26 PM

54 points

31 comments 6 min read LW link

(rootsofprogress.org)

Methodological Therapy: An Agenda For Tackling Research Bottlenecks

adamShimi, Lucas Teixeira and remember

Sep 22, 2022, 6:41 PM

54 points

6 comments 9 min read LW link

We may be able to see sharp left turns coming

Ethan Perez and Neel Nanda

Sep 3, 2022, 2:55 AM

54 points

29 comments 1 min read LW link

First we shape our social graph; then it shapes us

Henrik Karlsson

Sep 7, 2022, 3:50 PM

53 points

6 comments 8 min read LW link

(escapingflatland.substack.com)

Many therapy schools work with inner multiplicity (not just IFS)

David Althaus and Ewelina Tur

Sep 17, 2022, 10:27 AM

52 points

16 comments 18 min read LW link

Coordinate-Free Interpretability Theory

johnswentworth

Sep 14, 2022, 11:33 PM

52 points

17 comments 5 min read LW link

ACT-1: Transformer for Actions

Daniel Kokotajlo

Sep 14, 2022, 7:09 PM

52 points

4 comments 1 min read LW link

(www.adept.ai)

When does technical work to reduce AGI conflict make a difference?: Introduction

JesseClifton, Sammy Martin and Anthony DiGiovanni

Sep 14, 2022, 7:38 PM

52 points

3 comments 6 min read LW link

When would AGIs engage in conflict?

JesseClifton, Sammy Martin and Anthony DiGiovanni

Sep 14, 2022, 7:38 PM

52 points

5 comments 13 min read LW link

EA & LW Forums Weekly Summary (28 Aug − 3 Sep 22’)

Zoe Williams

Sep 6, 2022, 11:06 AM

51 points

2 comments 14 min read LW link

My Thoughts on the ML Safety Course

zeshen

Sep 27, 2022, 1:15 PM

50 points

3 comments 17 min read LW link

Some notes on solving hard problems

Joe Rocca

Sep 19, 2022, 12:58 PM

50 points

8 comments 29 min read LW link

Dan Luu on Futurist Predictions

RobertM

Sep 14, 2022, 3:01 AM

50 points

9 comments 5 min read LW link

(danluu.com)

Soft skills for meetups

mingyuan

Sep 27, 2022, 5:26 PM

49 points

3 comments 5 min read LW link

Brief Notes on Transformers

Adam Jermyn

Sep 26, 2022, 2:46 PM

48 points

3 comments 2 min read LW link

Understanding and avoiding value drift

TurnTrout

Sep 9, 2022, 4:16 AM

48 points

14 comments 6 min read LW link

Covid 9/29/22: The Jones Act Waver

Zvi

Sep 29, 2022, 6:20 PM

47 points

10 comments 24 min read LW link

(thezvi.wordpress.com)

A Library and Tutorial for Factored Cognition with Language Models

stuhlmueller, justin_dan and goodgravy

Sep 28, 2022, 6:15 PM

47 points

0 comments 1 min read LW link

Scraping training data for your mind

Henrik Karlsson

Sep 21, 2022, 4:27 PM

47 points

4 comments 8 min read LW link

(escapingflatland.substack.com)

Estimating the Current and Future Number of AI Safety Researchers

Stephen McAleese

Sep 28, 2022, 9:11 PM

47 points

14 comments 9 min read LW link

(forum.effectivealtruism.org)

Prize idea: Transmit MIRI and Eliezer’s worldviews

elifland

Sep 19, 2022, 9:21 PM

47 points

18 comments 2 min read LW link

[An email with a bunch of links I sent an experienced ML researcher interested in learning about Alignment / x-safety.]

David Scott Krueger (formerly: capybaralet)

Sep 8, 2022, 10:28 PM

47 points

1 comment 5 min read LW link

Pretending not to Notice

jefftk

Sep 19, 2022, 2:30 AM

46 points

12 comments 2 min read LW link

(www.jefftk.com)

AI Safety field-building projects I’d like to see

Orpheus16

Sep 11, 2022, 11:43 PM

46 points

8 comments 6 min read LW link

AI Risk Intro 1: Advanced AI Might Be Very Bad

CallumMcDougall and L Rudolf L

Sep 11, 2022, 10:57 AM

46 points

13 comments 30 min read LW link

Alignment via prosocial brain algorithms

Cameron Berg

Sep 12, 2022, 1:48 PM

45 points

30 comments 6 min read LW link

It matters when the first sharp left turn happens

Adam Jermyn

Sep 29, 2022, 8:12 PM

45 points

9 comments 4 min read LW link

Samotsvety’s AI risk forecasts

elifland

Sep 9, 2022, 4:01 AM

44 points

0 comments 4 min read LW link

Searching for Modularity in Large Language Models

NickyP and Stephen Fowler

Sep 8, 2022, 2:25 AM

44 points

3 comments 14 min read LW link