The Case for Predictive Models

Rubi J. Hudson Apr 3, 2024, 6:22 PM

43 points

7 comments 8 min read LW link

Concrete empirical research projects in mechanistic anomaly detection

Erik Jenner, Viktor Rehnberg and Oliver Daniels

Apr 3, 2024, 11:07 PM

43 points

3 comments 10 min read LW link

List your AI X-Risk cruxes!

Aryeh Englander Apr 28, 2024, 6:26 PM

42 points

7 comments 2 min read LW link

Forget Everything (Statistical Mechanics Part 1)

J Bostock Apr 22, 2024, 1:33 PM

42 points

7 comments 3 min read LW link

Gated Attention Blocks: Preliminary Progress toward Removing Attention Head Superposition

cmathw, Dennis Akar and Lee Sharkey

Apr 8, 2024, 11:14 AM

42 points

4 comments 15 min read LW link

Notes on Dwarkesh Patel’s Podcast with Sholto Douglas and Trenton Bricken

Zvi Apr 1, 2024, 7:10 PM

41 points

1 comment 16 min read LW link

(thezvi.wordpress.com)

Scaling of AI training runs will slow down after GPT-5

Maxime Riché Apr 26, 2024, 4:05 PM

40 points

5 comments 3 min read LW link

Conflict in Posthuman Literature

Martín Soto Apr 6, 2024, 10:26 PM

40 points

1 comment 2 min read LW link

(twitter.com)

What’s up with all the non-Mormons? Weirdly specific universalities across LLMs

mwatkins Apr 19, 2024, 1:43 PM

40 points

13 comments 27 min read LW link

Dequantifying first-order theories

jessicata Apr 23, 2024, 7:04 PM

40 points

9 comments 8 min read LW link

(unstableontology.com)

AI Regulation is Unsafe

Maxwell Tabarrok Apr 22, 2024, 4:37 PM

40 points

41 comments 4 min read LW link

(www.maximum-progress.com)

Losing Faith In Contrarianism

Bentham's Bulldog Apr 25, 2024, 8:53 PM

39 points

44 comments 5 min read LW link

On what research policymakers actually need

MondSemmel Apr 23, 2024, 7:50 PM

38 points

0 comments 3 min read LW link

(www.slowboring.com)

Inducing Unprompted Misalignment in LLMs

Sam Svenningsen, evhub and Henry Sleight

Apr 19, 2024, 8:00 PM

38 points

7 comments 16 min read LW link

[Fiction] A Confession

Arjun Panickssery Apr 18, 2024, 4:28 PM

38 points

2 comments 5 min read LW link

(arjunpanickssery.substack.com)

Tinker

Richard_Ngo Apr 16, 2024, 6:26 PM

38 points

0 comments 1 min read LW link

(press.asimov.com)

Thousands of malicious actors on the future of AI misuse

Zershaaneh Qureshi, Corin Katzke and Convergence Analysis

Apr 1, 2024, 10:08 AM

37 points

0 comments 1 min read LW link

Medical Roundup #2

Zvi Apr 9, 2024, 1:40 PM

37 points

18 comments 16 min read LW link

(thezvi.wordpress.com)

Effectively Handling Disagreements—Introducing a New Workshop

Camille Berger Apr 15, 2024, 4:33 PM

37 points

2 comments 7 min read LW link

A High Decoupling Failure

Maxwell Tabarrok Apr 14, 2024, 7:46 PM

37 points

5 comments 3 min read LW link

(www.maximum-progress.com)

[Question] Is there software to practice reading expressions?

lsusr Apr 23, 2024, 9:53 PM

37 points

11 comments 1 min read LW link

WSJ: Inside Amazon’s Secret Operation to Gather Intel on Rivals

trevor Apr 23, 2024, 9:33 PM

37 points

5 comments 5 min read LW link

(www.wsj.com)

The Evolution of Humans Was Net-Negative for Human Values

Zack_M_Davis Apr 1, 2024, 4:01 PM

37 points

1 comment 2 min read LW link

Claude 3 Opus can operate as a Turing machine

Gunnar_Zarncke Apr 17, 2024, 8:41 AM

36 points

2 comments 1 min read LW link

(twitter.com)

Childhood and Education Roundup #5

Zvi Apr 17, 2024, 1:00 PM

36 points

3 comments 25 min read LW link

(thezvi.wordpress.com)

LessWrong: After Dark, a new side of LessWrong

So8res Apr 1, 2024, 10:44 PM

36 points

6 comments 1 min read LW link

How I select alignment research projects

Ethan Perez, Henry Sleight and Mikita Balesni

Apr 10, 2024, 4:33 AM

36 points

4 comments 24 min read LW link

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)

Diffractor Apr 18, 2024, 8:39 AM

34 points

2 comments 19 min read LW link

hydrogen tube transport

bhauth Apr 18, 2024, 10:47 PM

34 points

12 comments 5 min read LW link

(www.bhauth.com)

A quick experiment on LMs’ inductive biases in performing search

Alex Mallen Apr 14, 2024, 3:41 AM

32 points

2 comments 4 min read LW link

Protestants Trading Acausally

Martin Sustrik Apr 1, 2024, 2:46 PM

31 points

4 comments 1 min read LW link

Falling fertility explanations and Israel

Yair Halberstadt Apr 3, 2024, 3:27 AM

31 points

4 comments 2 min read LW link

Thoughts on Zero Points

depressurize Apr 23, 2024, 2:22 AM

31 points

1 comment 4 min read LW link

(sexandchicago.substack.com)

Good Bings copy, great Bings steal

dr_s Apr 21, 2024, 9:52 AM

31 points

6 comments 9 min read LW link

Quick evidence review of bulking & cutting

jp Apr 4, 2024, 9:43 PM

31 points

5 comments 4 min read LW link

UDT1.01: Plannable and Unplanned Observations (3/10)

Diffractor Apr 12, 2024, 5:24 AM

31 points

0 comments 7 min read LW link

New report: A review of the empirical evidence for existential risk from AI via misaligned power-seeking

Harlan and rosehadshar

Apr 4, 2024, 11:41 PM

31 points

5 comments 1 min read LW link

(blog.aiimpacts.org)

Announcing SPAR Summer 2024!

laurenmarie12 Apr 16, 2024, 8:30 AM

30 points

2 comments 1 min read LW link

AI #59: Model Updates

Zvi Apr 11, 2024, 2:20 PM

30 points

2 comments 63 min read LW link

(thezvi.wordpress.com)

Big-endian is better than little-endian

Menotim Apr 29, 2024, 2:30 AM

30 points

17 comments 3 min read LW link

The Poker Theory of Poker Night

omark Apr 7, 2024, 9:47 AM

29 points

13 comments 9 min read LW link

(www.codeandbugs.com)

End-to-end hacking with language models

tchauvin Apr 5, 2024, 3:06 PM

29 points

0 comments 8 min read LW link

Experiments with an alternative method to promote sparsity in sparse autoencoders

Eoin Farrell Apr 15, 2024, 6:21 PM

29 points

7 comments 12 min read LW link

Experience Report—ML4Good AI Safety Bootcamp

Kieron Kretschmar Apr 11, 2024, 6:03 PM

29 points

0 comments 4 min read LW link

Please Understand

samhealy Apr 1, 2024, 12:33 PM

28 points

11 comments 6 min read LW link

[Question] Is LLM Translation Without Rosetta Stone possible?

cubefox Apr 11, 2024, 12:36 AM

28 points

15 comments 1 min read LW link

{Book Summary} The Art of Gathering

Tristan Williams Apr 16, 2024, 10:48 AM

28 points

0 comments 13 min read LW link

Structured Transparency: a framework for addressing use/mis-use trade-offs when sharing information

habryka Apr 11, 2024, 6:35 PM

28 points

0 comments 2 min read LW link

(arxiv.org)

Ackshually, many worlds is wrong

tailcalled Apr 11, 2024, 8:23 PM

27 points

42 comments 4 min read LW link

On the 2nd CWT with Jonathan Haidt

Zvi Apr 5, 2024, 5:30 PM

27 points

3 comments 33 min read LW link

(thezvi.wordpress.com)