Timaeus is hiring!

Jesse Hoogland, Stan van Wingerden, Alexander Gietelink Oldenziel and Daniel Murfet

Jul 12, 2024, 11:42 PM

67 points

6 comments · 2 min read · LW link

Friendship is transactional, unconditional friendship is insurance

Ruby

Jul 17, 2024, 10:52 PM

67 points

24 comments · 2 min read · LW link

SAEs (usually) Transfer Between Base and Chat Models

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

Jul 18, 2024, 10:29 AM

67 points

0 comments · 10 min read · LW link

Open Source Automated Interpretability for Sparse Autoencoder Features

kh4dien, SrGonao, jacob_drori and Nora Belrose

Jul 30, 2024, 9:11 PM

67 points

1 comment · 13 min read · LW link

(blog.eleuther.ai)

Advice to junior AI governance researchers

Orpheus16

Jul 8, 2024, 7:19 PM

66 points

1 comment · 5 min read · LW link

Static Analysis As A Lifestyle

adamShimi

Jul 3, 2024, 6:29 PM

65 points

11 comments · 3 min read · LW link

(epistemologicalfascinations.substack.com)

[Interim research report] Activation plateaus & sensitive directions in GPT2

StefanHex and jake_mendel

Jul 5, 2024, 5:05 PM

65 points

2 comments · 5 min read · LW link

A “Bitter Lesson” Approach to Aligning AGI and ASI

RogerDearnaley

Jul 6, 2024, 1:23 AM

63 points

41 comments · 24 min read · LW link

Ice: The Penultimate Frontier

Roko

Jul 13, 2024, 11:44 PM

63 points

56 comments · 1 min read · LW link

(transhumanaxiology.substack.com)

RTFB: California’s AB 3211

Zvi

Jul 30, 2024, 1:10 PM

62 points

2 comments · 11 min read · LW link

(thezvi.wordpress.com)

Consider the humble rock (or: why the dumb thing kills you)

pleiotroth

Jul 4, 2024, 1:54 PM

62 points

11 comments · 4 min read · LW link

Linkpost: Surely you can be serious

kave

Jul 18, 2024, 10:18 PM

62 points

8 comments · 1 min read · LW link

(www.experimental-history.com)

A framework for thinking about AI power-seeking

Joe Carlsmith

Jul 24, 2024, 10:41 PM

62 points

15 comments · 16 min read · LW link

BatchTopK: A Simple Improvement for TopK-SAEs

Bart Bussmann, Patrick Leask and Neel Nanda

Jul 20, 2024, 2:20 AM

61 points

0 comments · 4 min read · LW link

Inspired by: Failures in Kindness

X4vier

Jul 27, 2024, 1:21 AM

60 points

2 comments · 3 min read · LW link

Feature Targeted LLC Estimation Distinguishes SAE Features from Random Directions

Lidor Banuel Dabbah and Aviel Boag

Jul 19, 2024, 8:32 PM

59 points

6 comments · 16 min read · LW link

Towards shutdownable agents via stochastic choice

EJT, alexr, christosi and LAThomson

Jul 8, 2024, 10:14 AM

59 points

11 comments · 23 min read · LW link

(arxiv.org)

Pacing Outside the Box: RNNs Learn to Plan in Sokoban

Adrià Garriga-alonso, taufeeque, AdamGleave and ChengCheng

Jul 25, 2024, 10:00 PM

59 points

8 comments · 2 min read · LW link

(arxiv.org)

[EAForum xpost] A breakdown of OpenAI’s revenue

dschwarz and Lawrence Phillips

Jul 10, 2024, 6:09 PM

57 points

5 comments · 1 min read · LW link

(forum.effectivealtruism.org)

AI Alignment Research Engineer Accelerator (ARENA): Call for applicants v4.0

James Fox, Chloe Li, JamesH, Gracie Green and CallumMcDougall

Jul 6, 2024, 11:34 AM

57 points

7 comments · 6 min read · LW link

Coalitional agency

Richard_Ngo

Jul 22, 2024, 12:09 AM

56 points

6 comments · 6 min read · LW link

Unlocking Solutions—By Understanding Coordination Problems

James Stephen Brown

Jul 27, 2024, 4:52 AM

56 points

4 comments · 5 min read · LW link

(nonzerosum.games)

How the AI safety technical landscape has changed in the last year, according to some practitioners

tlevin

Jul 26, 2024, 7:06 PM

55 points

6 comments · 2 min read · LW link

Unlearning via RMU is mostly shallow

Andy Arditi and bilalchughtai

Jul 23, 2024, 4:07 PM

54 points

4 comments · 6 min read · LW link

Causal Graphs of GPT-2-Small’s Residual Stream

David Udell

Jul 9, 2024, 10:06 PM

53 points

7 comments · 7 min read · LW link

Breaking Circuit Breakers

mikes and tbenthompson

Jul 14, 2024, 6:57 PM

53 points

13 comments · 1 min read · LW link

(confirmlabs.org)

AI #71: Farewell to Chevron

Zvi

Jul 4, 2024, 1:40 PM

53 points

9 comments · 36 min read · LW link

(thezvi.wordpress.com)

Sherlockian Abduction Master List

Cole Wyeth

Jul 11, 2024, 8:27 PM

52 points

66 comments · 36 min read · LW link

Llama Llama-3-405B?

Zvi

Jul 24, 2024, 7:40 PM

51 points

9 comments · 30 min read · LW link

(thezvi.wordpress.com)

Consent across power differentials

Ramana Kumar

Jul 9, 2024, 11:42 AM

50 points

12 comments · 3 min read · LW link

DM Parenting

Shoshannah Tekofsky

Jul 16, 2024, 8:50 AM

50 points

4 comments · 5 min read · LW link

(kidquest.substack.com)

How do we know that “good research” is good? (aka “direct evaluation” vs “eigen-evaluation”)

Ruby

Jul 19, 2024, 12:31 AM

49 points

21 comments · 6 min read · LW link

JumpReLU SAEs + Early Access to Gemma 2 SAEs

Senthooran Rajamanoharan, Tom Lieberum, nps29, Arthur Conmy, Vikrant Varma, János Kramár and Neel Nanda

Jul 19, 2024, 4:10 PM

49 points

10 comments · 1 min read · LW link

(storage.googleapis.com)

On scalable oversight with weak LLMs judging strong LLMs

zac_kenton, Noah Siegel, janos, Jonah Brown-Cohen, Samuel Albanie, David Lindner and Rohin Shah

Jul 8, 2024, 8:59 AM

49 points

18 comments · 7 min read · LW link

(arxiv.org)

Why the Best Writers Endure Isolation

Declan Molony

Jul 16, 2024, 5:58 AM

49 points

6 comments · 2 min read · LW link

Misnaming and Other Issues with OpenAI’s “Human Level” Superintelligence Hierarchy

Davidmanheim

Jul 15, 2024, 5:50 AM

49 points

2 comments · 3 min read · LW link

Caring about excellence

owencb

Jul 22, 2024, 2:24 PM

47 points

4 comments · LW link

Robin Hanson AI X-Risk Debate — Highlights and Analysis

Liron

Jul 12, 2024, 9:31 PM

46 points

7 comments · 45 min read · LW link

(www.youtube.com)

Games for AI Control

charlie_griffin and Buck

Jul 11, 2024, 6:40 PM

45 points

0 comments · 5 min read · LW link

AI #72: Denying the Future

Zvi

Jul 11, 2024, 3:00 PM

45 points

8 comments · 41 min read · LW link

(thezvi.wordpress.com)

We ran an AI safety conference in Tokyo. It went really well. Come next year!

Blaine

Jul 17, 2024, 6:55 AM

45 points

1 comment · 6 min read · LW link

Why Georgism Lost Its Popularity

Zero Contradictions

Jul 20, 2024, 3:08 PM

45 points

54 comments · 1 min read · LW link

(zerocontradictions.net)

Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural

Rubi J. Hudson

Jul 16, 2024, 10:44 PM

44 points

27 comments · 5 min read · LW link

Open Sourcing Metaculus

ChristianWilliams

Jul 2, 2024, 10:30 PM

44 points

0 comments · LW link

(www.metaculus.com)

Trust as a bottleneck to growing teams quickly

benkuhn

Jul 13, 2024, 6:00 PM

44 points

3 comments · 5 min read · LW link

(www.benkuhn.net)

New Executive Team & Board — PIBBSS

Nora_Ammann

Jul 1, 2024, 7:30 PM

43 points

1 comment · 1 min read · LW link

Understanding Positional Features in Layer 0 SAEs

bilalchughtai and Yeu-Tong Lau

Jul 29, 2024, 9:36 AM

43 points

0 comments · 5 min read · LW link

List of Collective Intelligence Projects

Chipmonk

Jul 2, 2024, 2:10 PM

42 points

9 comments · 2 min read · LW link

(chrislakin.blog)

Paper Summary: The Effects of Communicating Uncertainty on Public Trust in Facts and Numbers

Jeffrey Heninger

Jul 9, 2024, 4:50 PM

42 points

2 comments · 2 min read · LW link

(blog.aiimpacts.org)

(Approximately) Deterministic Natural Latents

johnswentworth and David Lorell

Jul 19, 2024, 11:02 PM

42 points

1 comment · 4 min read · LW link
