Meta-Technicalities: Safeguarding Values in Formal Systems

LTM, Apr 30, 2025, 11:43 PM
2 points
0 comments, 3 min read, LW link
(routecause.substack.com)

Obstacles in ARC’s agenda: Finding explanations

David Matolcsi, Apr 30, 2025, 11:03 PM
122 points
10 comments, 17 min read, LW link

GPT-4o Responds to Negative Feedback

Zvi, Apr 30, 2025, 8:20 PM
45 points
2 comments, 18 min read, LW link
(thezvi.wordpress.com)

State of play of AI progress (and related brakes on an intelligence explosion) [Linkpost]

Noosphere89, Apr 30, 2025, 7:58 PM
7 points
0 comments, 5 min read, LW link
(www.interconnects.ai)

Don’t accuse your interlocutor of being insufficiently truth-seeking

TFD, Apr 30, 2025, 7:38 PM
30 points
15 comments, 2 min read, LW link
(www.thefloatingdroid.com)

How can we solve diffuse threats like research sabotage with AI control?

Vivek Hebbar, Apr 30, 2025, 7:23 PM
52 points
1 comment, 8 min read, LW link

[Question] Can Narrowing One’s Reference Class Undermine the Doomsday Argument?

Iannoose n., Apr 30, 2025, 6:24 PM
2 points
1 comment, 1 min read, LW link

[Question] Does there exist an interactive reasoning map tool that lets users visually lay out claims, assign probabilities and confidence levels, and dynamically adjust their beliefs based on weighted influences between connected assertions?

Zack Friedman, Apr 30, 2025, 6:22 PM
5 points
4 comments, 1 min read, LW link

Distilling the Internal Model Principle part II

JoseFaustino, Apr 30, 2025, 5:56 PM
15 points
0 comments, 19 min read, LW link

Research Priorities for Hardware-Enabled Mechanisms (HEMs)

aog, Apr 30, 2025, 5:43 PM
17 points
2 comments, 15 min read, LW link
(www.longview.org)

Video and transcript of talk on automating alignment research

Joe Carlsmith, Apr 30, 2025, 5:43 PM
21 points
0 comments, 24 min read, LW link
(joecarlsmith.com)

Can we safely automate alignment research?

Joe Carlsmith, Apr 30, 2025, 5:37 PM
54 points
29 comments, 48 min read, LW link
(joecarlsmith.com)

Investigating task-specific prompts and sparse autoencoders for activation monitoring

Henk Tillman, Apr 30, 2025, 5:09 PM
23 points
0 comments, 1 min read, LW link
(arxiv.org)

European Links (30.04.25)

Martin Sustrik, Apr 30, 2025, 3:40 PM
15 points
1 comment, 8 min read, LW link
(250bpm.substack.com)

Scaling Laws for Scalable Oversight

Apr 30, 2025, 12:13 PM
30 points
0 comments, 9 min read, LW link

Early Chinese Language Media Coverage of the AI 2027 Report: A Qualitative Analysis

Apr 30, 2025, 11:06 AM
211 points
11 comments, 11 min read, LW link

[Paper] Automated Feature Labeling with Token-Space Gradient Descent

Wuschel Schulz, Apr 30, 2025, 10:22 AM
4 points
0 comments, 4 min read, LW link

A single principle related to many Alignment subproblems?

Q Home, Apr 30, 2025, 9:49 AM
37 points
33 comments, 17 min read, LW link

What if Brain Computer Interfaces went exponential?

Stephen Martin, Apr 30, 2025, 5:07 AM
−1 points
0 comments, 12 min read, LW link

Interpreting the METR Time Horizons Post

snewman, Apr 30, 2025, 3:03 AM
66 points
12 comments, 10 min read, LW link
(amistrongeryet.substack.com)

Should we expect the future to be good?

Neil Crawford, Apr 30, 2025, 12:36 AM
15 points
0 comments, 14 min read, LW link

Judging types of consequentialism by influence and normativity

Cole Wyeth, Apr 29, 2025, 11:25 PM
20 points
1 comment, 2 min read, LW link

Bandwidth Rules Everything Around Me: Oliver Habryka on OpenPhil and GoodVentures

Elizabeth, Apr 29, 2025, 8:40 PM
79 points
15 comments, 1 min read, LW link
(acesounderglass.com)

The Grand Encyclopedia of Eponymous Laws

rogersbacon, Apr 29, 2025, 7:30 PM
27 points
5 comments, 16 min read, LW link
(www.secretorum.life)

Misrepresentation as a Barrier for Interp (Part I)

Apr 29, 2025, 5:07 PM
113 points
11 comments, 7 min read, LW link

AISN #53: An Open Letter Attempts to Block OpenAI Restructuring

Apr 29, 2025, 4:13 PM
5 points
0 comments, 4 min read, LW link

What could Alphafold 4 look like?

Abhishaike Mahajan, Apr 29, 2025, 3:45 PM
8 points
0 comments, 1 min read, LW link

Sealed Computation: Towards Low-Friction Proof of Locality

Paul Bricman, Apr 29, 2025, 3:26 PM
4 points
0 comments, 10 min read, LW link
(noemaresearch.com)

Dating Roundup #4: An App for That

Zvi, Apr 29, 2025, 1:10 PM
17 points
5 comments, 16 min read, LW link
(thezvi.wordpress.com)

Talk on letters to AI (London)

ukc10014, Apr 29, 2025, 9:50 AM
3 points
0 comments, 1 min read, LW link

Memory Decoding Journal Club: “Motor learning selectively strengthens cortical and striatal synapses of motor engram neurons”

Devin Ward, Apr 29, 2025, 2:26 AM
1 point
0 comments, 1 min read, LW link

D&D.Sci Tax Day: Adventurers and Assessments Evaluation & Ruleset

aphyer, Apr 29, 2025, 2:00 AM
28 points
10 comments, 5 min read, LW link

How to Build a Third Place on Focusmate

Parker Conley, Apr 28, 2025, 11:46 PM
96 points
10 comments, 5 min read, LW link
(parconley.com)

Methods of defense against AGI manipulation

MarkelKori, Apr 28, 2025, 9:03 PM
1 point
0 comments, 2 min read, LW link

China’s Petition System: It Looks Like Democracy — But It Isn’t

Hu Yichao, Apr 28, 2025, 8:56 PM
0 points
4 comments, 2 min read, LW link

Fundamentals of Safe AI (Phase 1) – Applications Open for the Global Cohort

rajsecrets, Apr 28, 2025, 8:52 PM
9 points
0 comments, 2 min read, LW link

Proceedings of ILIAD: Lessons and Progress

Apr 28, 2025, 7:04 PM
77 points
5 comments, 8 min read, LW link

GPT-4o Is An Absurd Sycophant

Zvi, Apr 28, 2025, 7:00 PM
80 points
7 comments, 19 min read, LW link
(thezvi.wordpress.com)

[Question] What are the best standardised, repeatable bets?

kave, Apr 28, 2025, 6:45 PM
31 points
10 comments, 1 min read, LW link

7+ tractable directions in AI control

Apr 28, 2025, 5:12 PM
86 points
1 comment, 13 min read, LW link

“A victory for the natural order”

Mati_Roy, Apr 28, 2025, 3:33 PM
11 points
3 comments, 1 min read, LW link
(preservinghope.substack.com)

Why giving workers stocks isn’t enough — and what co-ops get right

B Jacobs, Apr 28, 2025, 2:19 PM
6 points
9 comments, 2 min read, LW link
(bobjacobs.substack.com)

Keltham on Becoming more Truth-Oriented

Towards_Keeperhood, Apr 28, 2025, 12:58 PM
15 points
2 comments, 19 min read, LW link

Therapist in the Weights: Risks of Hyper-Introspection in Future AI Systems

Davidmanheim, Apr 28, 2025, 6:42 AM
15 points
1 comment, 5 min read, LW link

In Darkness They Assembled

Charlie Sanders, Apr 28, 2025, 3:44 AM
2 points
0 comments, 3 min read, LW link

Seeking advice on careers in AI Safety

nem, Apr 27, 2025, 11:59 PM
8 points
2 comments, 1 min read, LW link

Thin Alignment Can’t Solve Thick Problems

Daan Henselmans, Apr 27, 2025, 10:42 PM
11 points
2 comments, 9 min read, LW link

The Way You Go Depends A Good Deal On Where You Want To Get: FEP minimizes surprise about actions using preferences about the future as *evidence*

Christopher King, Apr 27, 2025, 9:55 PM
9 points
5 comments, 5 min read, LW link

How people use LLMs

Elizabeth, Apr 27, 2025, 9:48 PM
78 points
6 comments, 1 min read, LW link
(www.gleech.org)

Luna Lovegood and the Chamber of Secrets, Part 6

Apr 27, 2025, 8:26 PM
3 points
0 comments, 2 min read, LW link