Exercise: Planmaking, Surprise Anticipation, and “Baba is You”

Raemon

Feb 24, 2024, 8:33 PM

67 points

31 comments 6 min read LW link

Most experts believe COVID-19 was probably not a lab leak

DanielFilan

Feb 2, 2024, 7:28 PM

66 points

89 comments 2 min read LW link

(gcrinstitute.org)

Self-Awareness: Taxonomy and eval suite proposal

Daniel Kokotajlo

Feb 17, 2024, 1:47 AM

65 points

2 comments 11 min read LW link

On the Debate Between Jezos and Leahy

Zvi

Feb 6, 2024, 2:40 PM

64 points

6 comments 63 min read LW link

(thezvi.wordpress.com)

Managing risks while trying to do good

Wei Dai

Feb 1, 2024, 6:08 PM

63 points

26 comments LW link

On coincidences and Bayesian reasoning, as applied to the origins of COVID-19

viking_math

Feb 19, 2024, 1:14 AM

62 points

28 comments 14 min read LW link

Balancing Games

jefftk

Feb 24, 2024, 2:40 PM

62 points

18 comments 1 min read LW link

(www.jefftk.com)

Offering AI safety support calls for ML professionals

Vael Gates

Feb 15, 2024, 11:48 PM

61 points

1 comment LW link

[Question] What’s the theory of impact for activation vectors?

Chris_Leong

Feb 11, 2024, 7:34 AM

61 points

12 comments 1 min read LW link

Noticing Panic

Cole Wyeth

Feb 5, 2024, 3:45 AM

59 points

8 comments 3 min read LW link

Acting Wholesomely

owencb

Feb 26, 2024, 9:49 PM

59 points

64 comments LW link

The Sense Of Physical Necessity: A Naturalism Demo (Introduction)

LoganStrohl

Feb 24, 2024, 2:56 AM

59 points

1 comment 6 min read LW link

Voting Results for the 2022 Review

Ben Pace

Feb 2, 2024, 8:34 PM

57 points

3 comments 73 min read LW link

Dual Wielding Kindle Scribes

mesaoptimizer

Feb 21, 2024, 5:17 PM

57 points

18 comments 6 min read LW link

Skepticism About DeepMind’s “Grandmaster-Level” Chess Without Search

Arjun Panickssery

Feb 12, 2024, 12:56 AM

57 points

13 comments 3 min read LW link

Evaluating Stability of Unreflective Alignment

james.lucassen

Feb 1, 2024, 10:15 PM

57 points

12 comments 18 min read LW link

(jlucassen.com)

Phallocentricity in GPT-J’s bizarre stratified ontology

mwatkins

Feb 17, 2024, 12:16 AM

56 points

37 comments 9 min read LW link

Conditional prediction markets are evidential, not causal

philh

Feb 7, 2024, 9:52 PM

55 points

10 comments 2 min read LW link

How do you actually obtain and report a likelihood function for scientific research?

Peter Berggren

Feb 11, 2024, 5:42 PM

55 points

4 comments 1 min read LW link

Cooperating with aliens and AGIs: An ECL explainer

Chi Nguyen, _will_ and Orpheus16

Feb 24, 2024, 10:58 PM

55 points

8 comments LW link

Why I no longer identify as transhumanist

Kaj_Sotala

Feb 3, 2024, 12:00 PM

55 points

33 comments 3 min read LW link

(kajsotala.fi)

Safe Stasis Fallacy

Davidmanheim

Feb 5, 2024, 10:54 AM

54 points

2 comments LW link

The Shutdown Problem: Incomplete Preferences as a Solution

EJT

Feb 23, 2024, 4:01 PM

53 points

33 comments 42 min read LW link

AI #50: The Most Dangerous Thing

Zvi

Feb 8, 2024, 2:30 PM

53 points

4 comments 24 min read LW link

(thezvi.wordpress.com)

[Question] Can we get an AI to “do our alignment homework for us”?

Chris_Leong

Feb 26, 2024, 7:56 AM

53 points

33 comments 1 min read LW link

Complexity of value but not disvalue implies more focus on s-risk. Moral uncertainty and preference utilitarianism also do.

Chi Nguyen

Feb 23, 2024, 6:10 AM

52 points

18 comments LW link

Toy models of AI control for concentrated catastrophe prevention

Fabien Roger and Buck

Feb 6, 2024, 1:38 AM

51 points

2 comments 7 min read LW link

AI #52: Oops

Zvi

Feb 22, 2024, 9:50 PM

50 points

9 comments 29 min read LW link

(thezvi.wordpress.com)

Transfer learning and generalization-qua-capability in Babbage and Davinci (or, why division is better than Spanish)

RP and agg

Feb 9, 2024, 7:00 AM

50 points

6 comments 3 min read LW link

Notes on control evaluations for safety cases

ryan_greenblatt, Buck and Fabien Roger

Feb 28, 2024, 4:15 PM

49 points

0 comments 32 min read LW link

Critiques of the AI control agenda

Jozdien

Feb 14, 2024, 7:25 PM

48 points

14 comments 9 min read LW link

Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities

porby

Feb 2, 2024, 5:49 AM

47 points

1 comment 4 min read LW link

(arxiv.org)

What does davidad want from «boundaries»?

Chris Lakin and davidad

Feb 6, 2024, 5:45 PM

47 points

1 comment 5 min read LW link

I’d also take $7 trillion

bhauth

Feb 19, 2024, 3:31 AM

47 points

12 comments 10 min read LW link

(www.bhauth.com)

Value learning in the absence of ground truth

Joel_Saarinen

Feb 5, 2024, 6:56 PM

47 points

8 comments 45 min read LW link

Sora What

Zvi

Feb 22, 2024, 6:10 PM

47 points

3 comments 9 min read LW link

(thezvi.wordpress.com)

Fluent dreaming for language models (AI interpretability method)

tbenthompson, mikes and Zygi Straznickas

Feb 6, 2024, 6:02 AM

46 points

5 comments 1 min read LW link

(arxiv.org)

On the Proposed California SB 1047

Zvi

Feb 12, 2024, 4:40 PM

46 points

18 comments 12 min read LW link

(thezvi.wordpress.com)

Thoughts on “The Offense-Defense Balance Rarely Changes”

Cullen

Feb 12, 2024, 3:26 AM

46 points

4 comments LW link

[Question] Where is the Town Square?

Gretta Duleba

Feb 13, 2024, 3:53 AM

46 points

8 comments 1 min read LW link

The Gemini Incident Continues

Zvi

Feb 27, 2024, 4:00 PM

45 points

6 comments 48 min read LW link

(thezvi.wordpress.com)

A starting point for making sense of task structure (in machine learning)

Kaarel, RP and jake_mendel

Feb 24, 2024, 1:51 AM

45 points

2 comments 12 min read LW link

Why does generalization work?

Martín Soto

Feb 20, 2024, 5:51 PM

43 points

16 comments 4 min read LW link

Job Listing: Managing Editor / Writer

Gretta Duleba

Feb 21, 2024, 11:41 PM

43 points

2 comments 1 min read LW link

Examining Language Model Performance with Reconstructed Activations using Sparse Autoencoders

Evan Anders and Joseph Bloom

Feb 27, 2024, 2:43 AM

43 points

16 comments 15 min read LW link

Protocol evaluations: good analogies vs control

Fabien Roger

Feb 19, 2024, 6:00 PM

42 points

10 comments 11 min read LW link

Evidential Cooperation in Large Worlds: Potential Objections & FAQ

Chi Nguyen and _will_

Feb 28, 2024, 6:58 PM

42 points

5 comments LW link

Deep and obvious points in the gap between your thoughts and your pictures of thought

KatjaGrace

Feb 23, 2024, 7:30 AM

42 points

6 comments 1 min read LW link

(worldspiritsockpuppet.com)

How I internalized my achievements to better deal with negative feelings

Raymond Koopmanschap

Feb 27, 2024, 3:10 PM

42 points

7 comments 6 min read LW link

Wholesomeness and Effective Altruism

owencb

Feb 28, 2024, 8:28 PM

42 points

3 comments LW link
