The Mask Comes Off: At What Price?

Zvi · 21 Oct 2024 23:50 UTC

72 points

16 comments · 8 min read · LW link

(thezvi.wordpress.com)

Distinguishing ways AI can be “concentrated”

Matthew Barnett · 21 Oct 2024 22:21 UTC

34 points

2 comments · 4 min read · LW link

Jailbreaking ChatGPT and Claude using Web API Context Injection

Jaehyuk Lim · 21 Oct 2024 21:34 UTC

4 points

0 comments · 3 min read · LW link

Pausing for what?

MountainPath · 21 Oct 2024 20:12 UTC

0 points

1 comment · 1 min read · LW link

What is autonomy? Why boundaries are necessary.

Chris Lakin · 21 Oct 2024 17:56 UTC

8 points

1 comment · 1 min read · LW link

(chrislakin.blog)

Could randomly choosing people to serve as representatives lead to better government?

John Huang · 21 Oct 2024 17:10 UTC

77 points

13 comments · 10 min read · LW link

There aren’t enough smart people in biology doing something boring

Abhishaike Mahajan · 21 Oct 2024 15:52 UTC

28 points

13 comments · 10 min read · LW link

Automation collapse

Geoffrey Irving, Tomek Korbak and Benjamin Hilton

21 Oct 2024 14:50 UTC

72 points

9 comments · 7 min read · LW link

What AI companies should do: Some rough ideas

Zach Stein-Perlman · 21 Oct 2024 14:00 UTC

33 points

10 comments · 5 min read · LW link

[Question] What should OpenAI do that it hasn’t already done, to stop their vacancies from being advertised on the 80k Job Board?

WitheringWeights · 21 Oct 2024 13:57 UTC

23 points

0 comments · 1 min read · LW link

A Rocket–Interpretability Analogy

plex · 21 Oct 2024 13:55 UTC

166 points

33 comments · 1 min read · LW link · 1 review

Tokyo AI Safety 2025: Call For Papers

Blaine · 21 Oct 2024 8:43 UTC

24 points

0 comments · 3 min read · LW link

(www.tais2025.cc)

OpenAI defected, but we can take honest actions

Remmelt · 21 Oct 2024 8:41 UTC

17 points

16 comments · 2 min read · LW link

Slightly More Than You Wanted To Know: Pregnancy Length Effects

JustisMills · 21 Oct 2024 1:26 UTC

63 points

4 comments · 5 min read · LW link

(justismills.substack.com)

Information vs Assurance

johnswentworth · 20 Oct 2024 23:16 UTC

192 points

19 comments · 2 min read · LW link · 1 review

Liquid vs Illiquid Careers

vaishnav92 · 20 Oct 2024 23:03 UTC

35 points

7 comments · 7 min read · LW link

(vaishnavsunil.substack.com)

AI Can be “Gradient Aware” Without Doing Gradient hacking.

Sodium · 20 Oct 2024 21:02 UTC

24 points

1 comment · 2 min read · LW link

A brief theory of why we think things are good or bad

David Johnston · 20 Oct 2024 20:31 UTC

7 points

10 comments · 4 min read · LW link

Thinking in 2D

sarahconstantin · 20 Oct 2024 19:30 UTC

27 points

0 comments · 8 min read · LW link

(sarahconstantin.substack.com)

Podcast discussing Hanson’s Cultural Drift Argument

vaishnav92 and regan.arntz.gray

20 Oct 2024 17:58 UTC

3 points

0 comments · 1 min read · LW link

(moralmayhem.substack.com)

Advice on Communicating Concisely

EvolutionByDesign · 20 Oct 2024 16:45 UTC

3 points

9 comments · 1 min read · LW link

Ambiguities or the issues we face with AI in medicine

Thehumanproject.ai · 20 Oct 2024 16:45 UTC

2 points

0 comments · 5 min read · LW link

The Personal Implications of AGI Realism

xizneb · 20 Oct 2024 16:43 UTC

7 points

8 comments · 5 min read · LW link

Safety tax functions

owencb · 20 Oct 2024 14:08 UTC

31 points

0 comments · 6 min read · LW link

(strangecities.substack.com)

Exploring the Platonic Representation Hypothesis Beyond In-Distribution Data

Ram Bharadwaj · 20 Oct 2024 8:40 UTC

14 points

2 comments · 1 min read · LW link

Electoral Systems

RedFishBlueFish · 20 Oct 2024 3:25 UTC

1 point

0 comments · 14 min read · LW link

Overcoming Bias Anthology

Arjun Panickssery · 20 Oct 2024 2:01 UTC

173 points

14 comments · 2 min read · LW link

(overcoming-bias-anthology.com)

D/acc AI Security Salon

Allison Duettmann · 19 Oct 2024 22:17 UTC

19 points

0 comments · 1 min read · LW link

Who Should Have Been Killed, and Contains Neato? Who Else Could It Be, but that Villain Magneto!

Ace Delgado · 19 Oct 2024 20:39 UTC

−16 points

0 comments · 1 min read · LW link

If far-UV is so great, why isn’t it everywhere?

Austin Chen · 19 Oct 2024 18:56 UTC

72 points

24 comments · 9 min read · LW link

(strainhardening.substack.com)

What if AGI was already accidentally created in 2019? [Fictional story]

Alice Wanderland · 19 Oct 2024 9:17 UTC

−3 points

2 comments · 15 min read · LW link

(aliceandbobinwanderland.substack.com)

[Question] What actual bad outcome has “ethics-based” RLHF AI Alignment already prevented?

Roko · 19 Oct 2024 6:11 UTC

7 points

16 comments · 1 min read · LW link

[Question] What’s a good book for a technically-minded 11-year old?

Martin Sustrik · 19 Oct 2024 6:05 UTC

10 points

32 comments · 1 min read · LW link

Methodology: Contagious Beliefs

James Stephen Brown · 19 Oct 2024 3:58 UTC

3 points

0 comments · 7 min read · LW link

AI Prejudices: Practical Implications

PeterMcCluskey · 19 Oct 2024 2:19 UTC

12 points

0 comments · 5 min read · LW link

(bayesianinvestor.com)

Start an Upper-Room UV Installation Company?

jefftk · 19 Oct 2024 2:00 UTC

44 points

9 comments · 1 min read · LW link

(www.jefftk.com)

How I’d like alignment to get done (as of 2024-10-18)

TristanTrim · 18 Oct 2024 23:39 UTC

11 points

4 comments · 4 min read · LW link

Sabotage Evaluations for Frontier Models

David Duvenaud, Joe Benton, Sam Bowman, evhub, mishajw, Eric Christiansen, HoldenKarnofsky, Ethan Perez and Buck

18 Oct 2024 22:33 UTC

95 points

56 comments · 6 min read · LW link

(assets.anthropic.com)

D&D Sci Coliseum: Arena of Data

aphyer · 18 Oct 2024 22:02 UTC

42 points

23 comments · 4 min read · LW link

the Daydication technique

chaosmage · 18 Oct 2024 21:47 UTC

42 points

0 comments · 2 min read · LW link

[Linkpost] Hawkish nationalism vs international AI power and benefit sharing

jakub_krys and Naci Cankaya

18 Oct 2024 18:13 UTC

7 points

5 comments · 1 min read · LW link

(nacicankaya.substack.com)

LLM Psychometrics and Prompt-Induced Psychopathy

Korbinian K. · 18 Oct 2024 18:11 UTC

12 points

2 comments · 10 min read · LW link

A short project on Mamba: grokking & interpretability

Alejandro Tlaie · 18 Oct 2024 16:59 UTC

21 points

0 comments · 6 min read · LW link

LLMs can learn about themselves by introspection

Felix J Binder and Owain_Evans

18 Oct 2024 16:12 UTC

111 points

38 comments · 9 min read · LW link

[Question] Are there more than 12 paths to Superintelligence?

p4rziv4l · 18 Oct 2024 16:05 UTC

−3 points

0 comments · 1 min read · LW link

Low Probability Estimation in Language Models

Gabriel Wu · 18 Oct 2024 15:50 UTC

50 points

0 comments · 10 min read · LW link

(www.alignment.org)

The Mysterious Trump Buyers on Polymarket

Annapurna · 18 Oct 2024 13:26 UTC

46 points

11 comments · 2 min read · LW link · 1 review

(jorgevelez.substack.com)

On Intentionality, or: Towards a More Inclusive Concept of Lying

Cornelius Dybdahl · 18 Oct 2024 10:37 UTC

8 points

0 comments · 4 min read · LW link

NAO Updates, Fall 2024

jefftk · 18 Oct 2024 0:00 UTC

32 points

2 comments · 4 min read · LW link

(naobservatory.org)

You’re Playing a Rough Game

jefftk · 17 Oct 2024 19:20 UTC

25 points

2 comments · 2 min read · LW link

(www.jefftk.com)