[Question] How do you feel about LessWrong these days? [Open feedback thread]

Bird Concept

5 Dec 2023 20:54 UTC

108 points

286 comments 1 min read LW link

Critique-a-Thon of AI Alignment Plans

Iknownothing

5 Dec 2023 20:50 UTC

12 points

3 comments 1 min read LW link

Arguments for/against scheming that focus on the path SGD takes (Section 3 of “Scheming AIs”)

Joe Carlsmith

5 Dec 2023 18:48 UTC

10 points

0 comments 23 min read LW link

In defence of Helen Toner, Adam D’Angelo, and Tasha McCauley (OpenAI post)

peterr

5 Dec 2023 18:40 UTC

6 points

2 comments 1 min read LW link

(pastebin.com)

Studying The Alien Mind

Quentin FEUILLADE--MONTIXI and NicholasKees

5 Dec 2023 17:27 UTC

80 points

10 comments 15 min read LW link

Deep Forgetting & Unlearning for Safely-Scoped LLMs

scasper

5 Dec 2023 16:48 UTC

127 points

30 comments 13 min read LW link

On ‘Responsible Scaling Policies’ (RSPs)

Zvi

5 Dec 2023 16:10 UTC

49 points

3 comments 37 min read LW link

(thezvi.wordpress.com)

We’re all in this together

Tamsin Leake

5 Dec 2023 13:57 UTC

69 points

65 comments 2 min read LW link

A Socratic dialogue with my student

lsusr

5 Dec 2023 9:31 UTC

36 points

14 comments 6 min read LW link

Neural uncertainty estimation review article (for alignment)

Charlie Steiner

5 Dec 2023 8:01 UTC

74 points

3 comments 11 min read LW link

Analyzing the Historical Rate of Catastrophes

jsteinhardt

5 Dec 2023 6:30 UTC

16 points

0 comments 16 min read LW link

(bounded-regret.ghost.io)

Some open-source dictionaries and dictionary learning infrastructure

Sam Marks

5 Dec 2023 6:05 UTC

46 points

7 comments 5 min read LW link

The LessWrong 2022 Review

habryka

5 Dec 2023 4:00 UTC

115 points

43 comments 4 min read LW link

Bands And Low-stakes Dances

jefftk

5 Dec 2023 3:50 UTC

20 points

0 comments 1 min read LW link

(www.jefftk.com)

Accelerating science through evolvable institutions

jasoncrawford

4 Dec 2023 23:21 UTC

19 points

9 comments 6 min read LW link

(rootsofprogress.org)

Speaking to Congressional staffers about AI risk

Orpheus16 and hath

4 Dec 2023 23:08 UTC

312 points

25 comments 15 min read LW link 1 review

Open Thread – Winter 2023/2024

habryka

4 Dec 2023 22:59 UTC

35 points

160 comments 1 min read LW link

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI

WillPetillo

4 Dec 2023 22:58 UTC

37 points

0 comments 35 min read LW link

2023 Alignment Research Updates from FAR AI

AdamGleave and EuanMcLean

4 Dec 2023 22:32 UTC

18 points

0 comments 8 min read LW link

(far.ai)

What’s new at FAR AI

AdamGleave and EuanMcLean

4 Dec 2023 21:18 UTC

41 points

0 comments 5 min read LW link

(far.ai)

n of m ring signatures

DanielFilan

4 Dec 2023 20:00 UTC

51 points

7 comments 1 min read LW link

(danielfilan.com)

Mechanistic interpretability through clustering

Alistair Fraser

4 Dec 2023 18:49 UTC

1 point

0 comments 1 min read LW link

Agents which are EU-maximizing as a group are not EU-maximizing individually

Mlxa

4 Dec 2023 18:49 UTC

3 points

2 comments 2 min read LW link

Planning in LLMs: Insights from AlphaGo

jco

4 Dec 2023 18:48 UTC

8 points

10 comments 11 min read LW link

Non-classic stories about scheming (Section 2.3.2 of “Scheming AIs”)

Joe Carlsmith

4 Dec 2023 18:44 UTC

9 points

0 comments 20 min read LW link

6. The Mutable Values Problem in Value Learning and CEV

RogerDearnaley

4 Dec 2023 18:31 UTC

12 points

0 comments 49 min read LW link

Updates to Open Phil’s career development and transition funding program

abergal and Bastian Stern

4 Dec 2023 18:10 UTC

28 points

0 comments 2 min read LW link

[Valence series] 1. Introduction

Steven Byrnes

4 Dec 2023 15:40 UTC

99 points

16 comments 16 min read LW link 2 reviews

South Bay Meetup 12/9

David Friedman

4 Dec 2023 7:32 UTC

2 points

0 comments 1 min read LW link

Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation

Paul Bricman

4 Dec 2023 7:31 UTC

12 points

6 comments 16 min read LW link

(arxiv.org)

A call for a quantitative report card for AI bioterrorism threat models

Juno

4 Dec 2023 6:35 UTC

12 points

0 comments 10 min read LW link

FTL travel summary

Isaac King

4 Dec 2023 5:17 UTC

1 point

3 comments 3 min read LW link

Disappointing Table Refinishing

jefftk

4 Dec 2023 2:50 UTC

14 points

3 comments 1 min read LW link

(www.jefftk.com)

the micro-fulfillment cambrian explosion

bhauth

4 Dec 2023 1:15 UTC

54 points

5 comments 4 min read LW link

(www.bhauth.com)

Nietzsche’s Morality in Plain English

Arjun Panickssery

4 Dec 2023 0:57 UTC

93 points

14 comments 4 min read LW link 1 review

(arjunpanickssery.substack.com)

Meditations on Mot

Richard_Ngo

4 Dec 2023 0:19 UTC

56 points

11 comments 8 min read LW link

(www.mindthefuture.info)

The Witness

Richard_Ngo

3 Dec 2023 22:27 UTC

106 points

5 comments 14 min read LW link

(www.narrativeark.xyz)

Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of “Scheming AIs”)

Joe Carlsmith

3 Dec 2023 18:32 UTC

9 points

0 comments 17 min read LW link

[Question] How do you do post mortems?

matto

3 Dec 2023 14:46 UTC

9 points

2 comments 1 min read LW link

The benefits and risks of optimism (about AI safety)

Karl von Wendt

3 Dec 2023 12:45 UTC

−7 points

6 comments 5 min read LW link

Book Review: 1948 by Benny Morris

Yair Halberstadt

3 Dec 2023 10:29 UTC

41 points

9 comments 12 min read LW link

Quick takes on “AI is easy to control”

So8res

2 Dec 2023 22:31 UTC

26 points

49 comments 4 min read LW link

The goal-guarding hypothesis (Section 2.3.1.1 of “Scheming AIs”)

Joe Carlsmith

2 Dec 2023 15:20 UTC

8 points

1 comment 15 min read LW link

The Method of Loci: With some brief remarks, including transformers and evaluating AIs

Bill Benzon

2 Dec 2023 14:36 UTC

6 points

0 comments 3 min read LW link

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition

Adrià Moret

2 Dec 2023 14:07 UTC

26 points

31 comments 42 min read LW link

Out-of-distribution Bioattacks

jefftk

2 Dec 2023 12:20 UTC

66 points

15 comments 2 min read LW link

(www.jefftk.com)

After Alignment — Dialogue between RogerDearnaley and Seth Herd

RogerDearnaley and Seth Herd

2 Dec 2023 6:03 UTC

15 points

2 comments 25 min read LW link

List of strategies for mitigating deceptive alignment

joshc

2 Dec 2023 5:56 UTC

40 points

2 comments 6 min read LW link

[Question] What is known about invariants in self-modifying systems?

mishka

2 Dec 2023 5:04 UTC

9 points

2 comments 1 min read LW link

2023 Unofficial LessWrong Census/Survey

Screwtape

2 Dec 2023 4:41 UTC

169 points

81 comments 1 min read LW link