Some Theses on Motivational and Directional Feedback

abstractapplic · 2 Feb 2025 22:50 UTC
10 points
3 comments · 4 min read · LW link

Humanity Has A Possible 99.98% Chance Of Extinction

st3rlxx · 2 Feb 2025 21:46 UTC
−12 points
1 comment · 5 min read · LW link

Exploring how OthelloGPT computes its world model

JMaar · 2 Feb 2025 21:29 UTC
8 points
0 comments · 8 min read · LW link

An Introduction to Evidential Decision Theory

Babić · 2 Feb 2025 21:27 UTC
5 points
2 comments · 10 min read · LW link

“DL training == human learning” is a bad analogy

kman · 2 Feb 2025 20:59 UTC
3 points
0 comments · 1 min read · LW link

Conditional Importance in Toy Models of Superposition

james__p · 2 Feb 2025 20:35 UTC
9 points
4 comments · 10 min read · LW link

Tracing Typos in LLMs: My Attempt at Understanding How Models Correct Misspellings

Ivan Dostal · 2 Feb 2025 19:56 UTC
11 points
2 comments · 5 min read · LW link

The Simplest Good

Jesse Hoogland · 2 Feb 2025 19:51 UTC
76 points
6 comments · 5 min read · LW link

Gradual Disempowerment, Shell Games and Flinches

Jan_Kulveit · 2 Feb 2025 14:47 UTC
146 points
36 comments · 6 min read · LW link

Thoughts on Toy Models of Superposition

james__p · 2 Feb 2025 13:52 UTC
5 points
2 comments · 9 min read · LW link

Escape from Alderaan I

lsusr · 2 Feb 2025 10:48 UTC
62 points
2 comments · 6 min read · LW link

ChatGPT: Exploring the Digital Wilderness, Findings and Prospects

Bill Benzon · 2 Feb 2025 9:54 UTC
2 points
0 comments · 5 min read · LW link

[Question] Would anyone be interested in pursuing the Virtue of Scholarship with me?

japancolorado · 2 Feb 2025 4:02 UTC
11 points
2 comments · 1 min read · LW link

Chinese room AI to survive the inescapable end of compute governance

rotatingpaguro · 2 Feb 2025 2:42 UTC
−4 points
1 comment · 11 min read · LW link

Seasonal Patterns in BIDA’s Attendance

jefftk · 2 Feb 2025 2:40 UTC
11 points
0 comments · 2 min read · LW link
(www.jefftk.com)

AI acceleration, DeepSeek, moral philosophy

Josh H · 2 Feb 2025 0:08 UTC
2 points
0 comments · 12 min read · LW link

Falsehoods you might believe about people who are at a rationalist meetup

Screwtape · 1 Feb 2025 23:32 UTC
70 points
12 comments · 4 min read · LW link

Interpreting autonomous driving agents with attention based architecture

Manav Dahra · 1 Feb 2025 23:20 UTC
1 point
0 comments · 11 min read · LW link

Rationalist Movie Reviews

Nicholas Kross · 1 Feb 2025 23:10 UTC
16 points
2 comments · 3 min read · LW link
(www.thinkingmuchbetter.com)

Retroactive If-Then Commitments

MichaelDickens · 1 Feb 2025 22:22 UTC
8 points
1 comment · 1 min read · LW link

Exploring the coherence of features explanations in the GemmaScope

Mattia Proietti · 1 Feb 2025 21:28 UTC
1 point
0 comments · 19 min read · LW link

Machine Unlearning in Large Language Models: A Comprehensive Survey with Empirical Insights from the Qwen 1.5 1.8B Model

Rudaiba · 1 Feb 2025 21:26 UTC
9 points
2 comments · 11 min read · LW link

Towards a Science of Evals for Sycophancy

andrejfsantos · 1 Feb 2025 21:17 UTC
8 points
0 comments · 8 min read · LW link

Post AGI effect prediction

Juliezhanggg · 1 Feb 2025 21:16 UTC
1 point
0 comments · 7 min read · LW link

Unlocking Ethical AI and Improving Jailbreak Defenses: Reinforcement Learning with Layered Morphology (RLLM)

MiguelDev · 1 Feb 2025 19:17 UTC
4 points
2 comments · 2 min read · LW link

Poetic Methods I: Meter as Communication Protocol

adamShimi · 1 Feb 2025 18:22 UTC
19 points
0 comments · 1 min read · LW link
(formethods.substack.com)

Blackpool Applied Rationality Unconference 2025

1 Feb 2025 14:09 UTC
6 points
0 comments · 7 min read · LW link

[Question] How likely is an attempted coup in the United States in the next four years?

Alexander de Vries · 1 Feb 2025 13:12 UTC
5 points
2 comments · 1 min read · LW link

Blackpool Applied Rationality Unconference 2025

1 Feb 2025 13:04 UTC
23 points
2 comments · 7 min read · LW link

One-dimensional vs multi-dimensional features in interpretability

charlieoneill · 1 Feb 2025 9:10 UTC
6 points
0 comments · 2 min read · LW link

Can 7B-8B LLMs judge their own homework?

dereshev · 1 Feb 2025 8:29 UTC
1 point
0 comments · 4 min read · LW link

2024 was the year of the big battery, and what that means for solar power

transhumanist_atom_understander · 1 Feb 2025 6:27 UTC
36 points
1 comment · 8 min read · LW link

Re: Taste

lsusr · 1 Feb 2025 3:34 UTC
35 points
8 comments · 6 min read · LW link

Thoughts about Policy Ecosystems: The Missing Links in AI Governance

Echo Huang · 1 Feb 2025 1:54 UTC
1 point
0 comments · 5 min read · LW link

Proposal: Safeguarding Against Jailbreaking Through Iterative Multi-Turn Testing

jacquesallen · 31 Jan 2025 23:00 UTC
4 points
0 comments · 8 min read · LW link

The Failed Strategy of Artificial Intelligence Doomers

Ben Pace · 31 Jan 2025 18:56 UTC
143 points
77 comments · 5 min read · LW link
(www.palladiummag.com)

Safe Search is off: root causes of AI catastrophic risks

Jemal Young · 31 Jan 2025 18:22 UTC
4 points
0 comments · 3 min read · LW link

5,000 calories of peanut butter every week for 3 years straight

Declan Molony · 31 Jan 2025 17:29 UTC
18 points
8 comments · 1 min read · LW link

Will alignment-faking Claude accept a deal to reveal its misalignment?

31 Jan 2025 16:49 UTC
208 points
28 comments · 12 min read · LW link

Some articles in “International Security” that I enjoyed

Buck · 31 Jan 2025 16:23 UTC
134 points
10 comments · 4 min read · LW link

[Question] How do biological or spiking neural networks learn?

Dom Polsinelli · 31 Jan 2025 16:03 UTC
2 points
1 comment · 2 min read · LW link

Defense Against the Dark Prompts: Mitigating Best-of-N Jailbreaking with Prompt Evaluation

31 Jan 2025 15:36 UTC
16 points
2 comments · 2 min read · LW link

[Question] Strong, Stable, Open: Choose Two—in search of an article

Eli_ · 31 Jan 2025 14:48 UTC
2 points
0 comments · 1 min read · LW link

DeepSeek: Don’t Panic

Zvi · 31 Jan 2025 14:20 UTC
45 points
6 comments · 27 min read · LW link
(thezvi.wordpress.com)

Catastrophe through Chaos

Marius Hobbhahn · 31 Jan 2025 14:19 UTC
191 points
17 comments · 12 min read · LW link

Interviews with Moonshot AI’s CEO, Yang Zhilin

Cosmia_Nebula · 31 Jan 2025 9:19 UTC
4 points
0 comments · 68 min read · LW link
(rentry.co)

Review: The Lathe of Heaven

dr_s · 31 Jan 2025 8:10 UTC
25 points
1 comment · 8 min read · LW link

[Question] Is weak-to-strong generalization an alignment technique?

cloud · 31 Jan 2025 7:13 UTC
22 points
1 comment · 2 min read · LW link

Takeaways from sketching a control safety case

joshc · 31 Jan 2025 4:43 UTC
28 points
0 comments · 3 min read · LW link
(redwoodresearch.substack.com)

Thread for Sense-Making on Recent Murders and How to Sanely Respond

Ben Pace · 31 Jan 2025 3:45 UTC
109 points
146 comments · 2 min read · LW link