LessWrong Archive, page 2
Speaking to Congressional staffers about AI risk
Orpheus16 and hath · Dec 4, 2023, 11:08 PM · 312 points · 25 comments · 15 min read · LW link · 1 review

Open Thread – Winter 2023/2024
habryka · Dec 4, 2023, 10:59 PM · 35 points · 160 comments · 1 min read · LW link

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI
WillPetillo · Dec 4, 2023, 10:58 PM · 37 points · 0 comments · 35 min read · LW link

2023 Alignment Research Updates from FAR AI
AdamGleave and EuanMcLean · Dec 4, 2023, 10:32 PM · 18 points · 0 comments · 8 min read · LW link (far.ai)

What’s new at FAR AI
AdamGleave and EuanMcLean · Dec 4, 2023, 9:18 PM · 41 points · 0 comments · 5 min read · LW link (far.ai)

n of m ring signatures
DanielFilan · Dec 4, 2023, 8:00 PM · 51 points · 7 comments · 1 min read · LW link (danielfilan.com)

Mechanistic interpretability through clustering
Alistair Fraser · Dec 4, 2023, 6:49 PM · 1 point · 0 comments · 1 min read · LW link

Agents which are EU-maximizing as a group are not EU-maximizing individually
Mlxa · Dec 4, 2023, 6:49 PM · 3 points · 2 comments · 2 min read · LW link

Planning in LLMs: Insights from AlphaGo
jco · Dec 4, 2023, 6:48 PM · 8 points · 10 comments · 11 min read · LW link

Non-classic stories about scheming (Section 2.3.2 of “Scheming AIs”)
Joe Carlsmith · Dec 4, 2023, 6:44 PM · 9 points · 0 comments · 20 min read · LW link

6. The Mutable Values Problem in Value Learning and CEV
RogerDearnaley · Dec 4, 2023, 6:31 PM · 12 points · 0 comments · 49 min read · LW link

Updates to Open Phil’s career development and transition funding program
abergal and Bastian Stern · Dec 4, 2023, 6:10 PM · 28 points · 0 comments · 2 min read · LW link

[Valence series] 1. Introduction
Steven Byrnes · Dec 4, 2023, 3:40 PM · 99 points · 16 comments · 16 min read · LW link · 2 reviews
South Bay Meetup 12/9
David Friedman · Dec 4, 2023, 7:32 AM · 2 points · 0 comments · 1 min read · LW link

Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation
Paul Bricman · Dec 4, 2023, 7:31 AM · 12 points · 6 comments · 16 min read · LW link (arxiv.org)

A call for a quantitative report card for AI bioterrorism threat models
Juno · Dec 4, 2023, 6:35 AM · 12 points · 0 comments · 10 min read · LW link

FTL travel summary
Isaac King · Dec 4, 2023, 5:17 AM · 1 point · 3 comments · 3 min read · LW link

Disappointing Table Refinishing
jefftk · Dec 4, 2023, 2:50 AM · 14 points · 3 comments · 1 min read · LW link (www.jefftk.com)

the micro-fulfillment cambrian explosion
bhauth · Dec 4, 2023, 1:15 AM · 54 points · 5 comments · 4 min read · LW link (www.bhauth.com)

Nietzsche’s Morality in Plain English
Arjun Panickssery · Dec 4, 2023, 12:57 AM · 92 points · 14 comments · 4 min read · LW link · 1 review (arjunpanickssery.substack.com)

Meditations on Mot
Richard_Ngo · Dec 4, 2023, 12:19 AM · 56 points · 11 comments · 8 min read · LW link (www.mindthefuture.info)

The Witness
Richard_Ngo · Dec 3, 2023, 10:27 PM · 105 points · 5 comments · 14 min read · LW link (www.narrativeark.xyz)

Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of “Scheming AIs”)
Joe Carlsmith · Dec 3, 2023, 6:32 PM · 9 points · 0 comments · 17 min read · LW link

[Question] How do you do post mortems?
matto · Dec 3, 2023, 2:46 PM · 9 points · 2 comments · 1 min read · LW link

The benefits and risks of optimism (about AI safety)
Karl von Wendt · Dec 3, 2023, 12:45 PM · −7 points · 6 comments · 5 min read · LW link

Book Review: 1948 by Benny Morris
Yair Halberstadt · Dec 3, 2023, 10:29 AM · 41 points · 9 comments · 12 min read · LW link

Quick takes on “AI is easy to control”
So8res · Dec 2, 2023, 10:31 PM · 26 points · 49 comments · 4 min read · LW link

The goal-guarding hypothesis (Section 2.3.1.1 of “Scheming AIs”)
Joe Carlsmith · Dec 2, 2023, 3:20 PM · 8 points · 1 comment · 15 min read · LW link
The Method of Loci: With some brief remarks, including transformers and evaluating AIs
Bill Benzon · Dec 2, 2023, 14:36 UTC · 6 points · 0 comments · 3 min read · LW link

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition
Adrià Moret · Dec 2, 2023, 14:07 UTC · 26 points · 31 comments · 42 min read · LW link

Out-of-distribution Bioattacks
jefftk · Dec 2, 2023, 12:20 UTC · 66 points · 15 comments · 2 min read · LW link (www.jefftk.com)

After Alignment — Dialogue between RogerDearnaley and Seth Herd
RogerDearnaley and Seth Herd · Dec 2, 2023, 6:03 UTC · 15 points · 2 comments · 25 min read · LW link

List of strategies for mitigating deceptive alignment
joshc · Dec 2, 2023, 5:56 UTC · 38 points · 2 comments · 6 min read · LW link

[Question] What is known about invariants in self-modifying systems?
mishka · Dec 2, 2023, 5:04 UTC · 9 points · 2 comments · 1 min read · LW link

2023 Unofficial LessWrong Census/Survey
Screwtape · Dec 2, 2023, 4:41 UTC · 169 points · 81 comments · 1 min read · LW link

Protecting against sudden capability jumps during training
Nikola Jurkovic · Dec 2, 2023, 4:22 UTC · 15 points · 2 comments · 2 min read · LW link

South Bay Pre-Holiday Gathering
IS · Dec 2, 2023, 3:21 UTC · 10 points · 2 comments · 1 min read · LW link

MATS Summer 2023 Retrospective
utilistrutil, Juan Gil, Ryan Kidd, Christian Smith, McKennaFitzgerald and LauraVaughan · Dec 1, 2023, 23:29 UTC · 77 points · 34 comments · 26 min read · LW link

Complex systems research as a field (and its relevance to AI Alignment)
Nora_Ammann and habryka · Dec 1, 2023, 22:10 UTC · 65 points · 11 comments · 19 min read · LW link

[Question] Could there be “natural impact regularization” or “impact regularization by default”?
tailcalled · Dec 1, 2023, 22:01 UTC · 24 points · 6 comments · 1 min read · LW link

Benchmarking Bowtie2 Threading
jefftk · Dec 1, 2023, 20:20 UTC · 9 points · 0 comments · 1 min read · LW link (www.jefftk.com)

Please Bet On My Quantified Self Decision Markets
niplav · Dec 1, 2023, 20:07 UTC · 36 points · 6 comments · 6 min read · LW link

Specification Gaming: How AI Can Turn Your Wishes Against You [RA Video]
Writer · Dec 1, 2023, 19:30 UTC · 19 points · 0 comments · 5 min read · LW link (youtu.be)

Carving up problems at their joints
Jakub Smékal · Dec 1, 2023, 18:48 UTC · 1 point · 0 comments · 2 min read · LW link (jakubsmekal.com)

Queuing theory: Benefits of operating at 60% capacity
ampdot · Dec 1, 2023, 18:48 UTC · 43 points · 4 comments · 1 min read · LW link (less.works)

Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002)
ampdot · Dec 1, 2023, 18:48 UTC · 14 points · 0 comments · 1 min read · LW link (airtable.com)

Kolmogorov Complexity Lays Bare the Soul
jakej · Dec 1, 2023, 18:29 UTC · 5 points · 8 comments · 2 min read · LW link

Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes · Dec 1, 2023, 17:30 UTC · 197 points · 63 comments · 14 min read · LW link · 1 review

Why Did NEPA Peak in 2016?
Maxwell Tabarrok · Dec 1, 2023, 16:18 UTC · 10 points · 0 comments · 3 min read · LW link (maximumprogress.substack.com)

Worlds where I wouldn’t worry about AI risk
adekcz · Dec 1, 2023, 16:06 UTC · 2 points · 0 comments · 4 min read · LW link