Announcing Athena—Women in AI Alignment Research

Claire Short

Nov 7, 2023, 9:46 PM

80 points

2 comments, 3 min read

Thomas Kwa’s research journal

Thomas Kwa and Adrià Garriga-alonso

Nov 23, 2023, 5:11 AM

79 points

1 comment, 6 min read

Spaciousness In Partner Dance: A Naturalism Demo

LoganStrohl

Nov 19, 2023, 7:00 AM

78 points

6 comments, 19 min read, 1 review

Reactions to the Executive Order

Zvi

Nov 1, 2023, 8:40 PM

77 points

4 comments, 29 min read

(thezvi.wordpress.com)

Lying Alignment Chart

Zack_M_Davis

Nov 29, 2023, 4:15 PM

77 points

17 comments, 1 min read

Anthropic Fall 2023 Debate Progress Update

Ansh Radhakrishnan

Nov 28, 2023, 5:37 AM

76 points

9 comments, 12 min read

Interpretability with Sparse Autoencoders (Colab exercises)

CallumMcDougall

Nov 29, 2023, 12:56 PM

76 points

9 comments, 4 min read

Are language models good at making predictions?

dynomight

Nov 6, 2023, 1:10 PM

76 points

14 comments, 4 min read

(dynomight.net)

On the UK Summit

Zvi

Nov 7, 2023, 1:10 PM

74 points

6 comments, 30 min read

(thezvi.wordpress.com)

Announcing New Beginner-friendly Book on AI Safety and Risk

Darren McKee

Nov 25, 2023, 3:57 PM

74 points

3 comments

Dialogue on the Claim: “OpenAI’s Firing of Sam Altman (And Shortly-Subsequent Events) On Net Reduced Existential Risk From AGI”

johnswentworth and Ruby

Nov 21, 2023, 5:39 PM

73 points

84 comments, 11 min read

Testbed evals: evaluating AI safety even when it can’t be directly measured

joshc

Nov 15, 2023, 7:00 PM

71 points

2 comments, 4 min read

A to Z of things

KatjaGrace

Nov 17, 2023, 5:20 AM

71 points

8 comments, 1 min read, 1 review

(worldspiritsockpuppet.com)

Reinforcement Via Giving People Cookies

Screwtape

Nov 15, 2023, 4:34 AM

70 points

9 comments, 6 min read

Game Theory without Argmax [Part 1]

Cleo Nardo

Nov 11, 2023, 3:59 PM

70 points

18 comments, 19 min read

Why not electric trains and excavators?

bhauth

Nov 21, 2023, 12:07 AM

68 points

39 comments, 5 min read

(www.bhauth.com)

Alignment can improve generalisation through more robustly doing what a human wants—CoinRun example

Stuart_Armstrong

Nov 21, 2023, 11:41 AM

67 points

9 comments, 3 min read

AI #39: The Week of OpenAI

Zvi

Nov 23, 2023, 3:10 PM

67 points

8 comments, 28 min read

(thezvi.wordpress.com)

Black Box Biology

GeneSmith

Nov 29, 2023, 2:27 AM

65 points

30 comments, 2 min read

How to Control an LLM’s Behavior (why my P(DOOM) went down)

RogerDearnaley

Nov 28, 2023, 7:56 PM

65 points

30 comments, 11 min read

“Epistemic range of motion” and LessWrong moderation

habryka and Gabriel Alfour

Nov 27, 2023, 9:58 PM

65 points

3 comments, 12 min read

A free to enter, 240 character, open-source iterated prisoner’s dilemma tournament

Isaac King

Nov 9, 2023, 8:24 AM

64 points

19 comments, 1 min read

(manifold.markets)

Thoughts on open source AI

Sam Marks

Nov 3, 2023, 3:35 PM

62 points

17 comments, 10 min read

Paper out now on creatine and cognitive performance

Fabienne

Nov 26, 2023, 10:58 AM

61 points

2 comments, 1 min read

Raemon’s Deliberate (“Purposeful?”) Practice Club

Raemon, Elizabeth, lynettebye and Alex_Altair

Nov 14, 2023, 6:24 PM

61 points

11 comments, 22 min read

Vote on worthwhile OpenAI topics to discuss

Ben Pace and Bird Concept

Nov 21, 2023, 12:03 AM

61 points

55 comments, 1 min read

New paper shows truthfulness & instruction-following don’t generalize by default

joshc

Nov 19, 2023, 7:27 PM

60 points

0 comments, 4 min read

On OpenAI Dev Day

Zvi

Nov 9, 2023, 4:10 PM

60 points

0 comments, 15 min read

(thezvi.wordpress.com)

Sam Altman, Greg Brockman and others from OpenAI join Microsoft

Ozyrus

Nov 20, 2023, 8:23 AM

58 points

15 comments, 1 min read

(twitter.com)

Genetic fitness is a measure of selection strength, not the selection target

Kaj_Sotala

Nov 4, 2023, 7:02 PM

58 points

44 comments, 18 min read

AI Alignment Research Engineer Accelerator (ARENA): call for applicants

CallumMcDougall

Nov 7, 2023, 9:43 AM

56 points

0 comments

It’s OK to be biased towards humans

dr_s

Nov 11, 2023, 11:59 AM

54 points

69 comments, 6 min read

Theories of Change for AI Auditing

Lee Sharkey, beren and Marius Hobbhahn

Nov 13, 2023, 7:33 PM

54 points

0 comments, 18 min read

(www.apolloresearch.ai)

They are made of repeating patterns

quetzal_rainbow

Nov 13, 2023, 6:17 PM

53 points

4 comments, 2 min read

AMA: Earning to Give

jefftk

Nov 7, 2023, 4:20 PM

53 points

8 comments, 1 min read

(www.jefftk.com)

Zvi’s Manifold Markets House Rules

Zvi

Nov 13, 2023, 12:28 AM

53 points

6 comments, 3 min read

Open Phil releases RFPs on LLM Benchmarks and Forecasting

LawrenceC

Nov 11, 2023, 3:01 AM

53 points

0 comments, 2 min read

(www.openphilanthropy.org)

AI #37: Moving Too Fast

Zvi

Nov 9, 2023, 5:50 PM

53 points

5 comments, 76 min read

(thezvi.wordpress.com)

OpenAI Staff (including Sutskever) Threaten to Quit Unless Board Resigns

Seth Herd

Nov 20, 2023, 2:20 PM

52 points

28 comments, 1 min read

(www.wired.com)

The Stochastic Parrot Hypothesis is debatable for the last generation of LLMs

Quentin FEUILLADE--MONTIXI and Pierre Peigné

Nov 7, 2023, 4:12 PM

52 points

21 comments, 6 min read

In Defense of Parselmouths

Screwtape

Nov 15, 2023, 11:02 PM

51 points

11 comments, 10 min read, 1 review

Polysemantic Attention Head in a 4-Layer Transformer

Jett Janiak, cmathw and StefanHex

Nov 9, 2023, 4:16 PM

51 points

0 comments, 6 min read

On Tapping Out

Screwtape

Nov 17, 2023, 3:23 AM

51 points

14 comments, 8 min read, 1 review

The Assumed Intent Bias

silentbob

Nov 5, 2023, 4:28 PM

51 points

13 comments, 6 min read

Altman firing retaliation incoming?

trevor

Nov 19, 2023, 12:10 AM

50 points

23 comments, 5 min read

Apply to the Conceptual Boundaries Workshop for AI Safety

Chipmonk

Nov 27, 2023, 9:04 PM

50 points

0 comments, 3 min read

On Overhangs and Technological Change

Roko

Nov 5, 2023, 10:58 PM

50 points

19 comments, 2 min read

GPT-2030 and Catastrophic Drives: Four Vignettes

jsteinhardt

Nov 10, 2023, 7:30 AM

50 points

5 comments, 10 min read

(bounded-regret.ghost.io)

Job listing: Communications Generalist / Project Manager

Gretta Duleba

Nov 6, 2023, 8:21 PM

49 points

7 comments, 1 min read

Tall Tales at Different Scales: Evaluating Scaling Trends For Deception In Language Models

Felix Hofstätter, Francis Rhys Ward, HarrietW, LAThomson, Ollie J, Patrik Bartak and Sam F. Brown

Nov 8, 2023, 11:37 AM

49 points

0 comments, 18 min read
