All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 202320242025 2026

All Jan Feb Mar Apr May Jun Jul Aug SepOctNov Dec

All 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 181920 21 22 23 24 25 26 27 28 29 30 31

How I’d like alignment to get done (as of 2024-10-18)

TristanTrim18 Oct 2024 23:39 UTC

11 points

4 comments4 min readLW link

Sabotage Evaluations for Frontier Models

David Duvenaud, Joe Benton, Sam Bowman, evhub, mishajw, Eric Christiansen, HoldenKarnofsky, Ethan Perez and Buck

18 Oct 2024 22:33 UTC

95 points

56 comments6 min readLW link

(assets.anthropic.com)

D&D Sci Coliseum: Arena of Data

aphyer18 Oct 2024 22:02 UTC

42 points

23 comments4 min readLW link

the Daydication technique

chaosmage18 Oct 2024 21:47 UTC

32 points

0 comments2 min readLW link

[Linkpost] Hawkish nationalism vs international AI power and benefit sharing

jakub_krys and Naci Cankaya

18 Oct 2024 18:13 UTC

7 points

5 comments1 min readLW link

(nacicankaya.substack.com)

LLM Psychometrics and Prompt-Induced Psychopathy

Korbinian K.18 Oct 2024 18:11 UTC

12 points

2 comments10 min readLW link

A short project on Mamba: grokking & interpretability

Alejandro Tlaie18 Oct 2024 16:59 UTC

21 points

0 comments6 min readLW link

LLMs can learn about themselves by introspection

Felix J Binder and Owain_Evans

18 Oct 2024 16:12 UTC

111 points

38 comments9 min readLW link

[Question] Are there more than 12 paths to Superintelligence?

p4rziv4l18 Oct 2024 16:05 UTC

−3 points

0 comments1 min readLW link

Low Probability Estimation in Language Models

Gabriel Wu18 Oct 2024 15:50 UTC

50 points

0 comments10 min readLW link

(www.alignment.org)

The Mysterious Trump Buyers on Polymarket

Annapurna18 Oct 2024 13:26 UTC

46 points

11 comments2 min readLW link 1 review

(jorgevelez.substack.com)

On Intentionality, or: Towards a More Inclusive Concept of Lying

Cornelius Dybdahl18 Oct 2024 10:37 UTC

8 points

0 comments4 min readLW link

NAO Updates, Fall 2024

jefftk18 Oct 2024 0:00 UTC

32 points

2 comments4 min readLW link

(naobservatory.org)

You’re Playing a Rough Game

jefftk17 Oct 2024 19:20 UTC

25 points

2 comments2 min readLW link

(www.jefftk.com)

P=NP

OnePolynomial17 Oct 2024 17:56 UTC

−25 points

0 comments8 min readLW link

Factoring P(doom) into a bayesian network

Joseph Gardi17 Oct 2024 17:55 UTC

1 point

0 comments1 min readLW link

understanding bureaucracy

dhruvmethi17 Oct 2024 17:55 UTC

1 point

2 comments8 min readLW link

AI #86: Just Think of the Potential

Zvi17 Oct 2024 15:10 UTC

58 points

8 comments57 min readLW link

(thezvi.wordpress.com)

Concrete benefits of making predictions

Jonny Spicer and Sage Future

17 Oct 2024 14:23 UTC

36 points

5 comments6 min readLW link

(fatebook.io)

Arithmetic is an underrated world-modeling technology

dynomight17 Oct 2024 14:00 UTC

174 points

38 comments6 min readLW link 1 review

(dynomight.net)

The Computational Complexity of Circuit Discovery for Inner Interpretability

Bogdan Ionut Cirstea17 Oct 2024 13:18 UTC

11 points

2 comments1 min readLW link

(arxiv.org)

[Question] is there a big dictionary somewhere with all your jargon and acronyms and whatnot?

KvmanThinking17 Oct 2024 11:30 UTC

4 points

7 comments1 min readLW link

[Question] Is there a known method to find others who came across the same potential infohazard without spoiling it to the public?

hive17 Oct 2024 10:47 UTC

4 points

6 comments1 min readLW link

It is time to start war gaming for AGI

yanni kyriacos17 Oct 2024 5:14 UTC

4 points

1 comment1 min readLW link

[Question] Reinforcement Learning: Essential Step Towards AGI or Irrelevant?

Double17 Oct 2024 3:37 UTC

1 point

0 comments1 min readLW link

[Question] EndeavorOTC legit?

FinalFormal217 Oct 2024 1:33 UTC

3 points

0 comments1 min readLW link

The Cognitive Bootcamp Agreement

Raemon16 Oct 2024 23:24 UTC

36 points

1 comment8 min readLW link 1 review

Bitter lessons about lucid dreaming

avturchin16 Oct 2024 21:27 UTC

84 points

63 comments2 min readLW link

Towards Quantitative AI Risk Management

Henry Papadatos and simeon_c

16 Oct 2024 19:26 UTC

28 points

1 comment6 min readLW link

Why Academia is Mostly Not Truth-Seeking

Zero Contradictions16 Oct 2024 19:14 UTC

−7 points

6 comments1 min readLW link

(thewaywardaxolotl.blogspot.com)

Launching Adjacent News

Lucas Kohorst16 Oct 2024 17:58 UTC

24 points

0 comments4 min readLW link

[Question] Interest in Leetcode, but for Rationality?

Gregory 16 Oct 2024 17:54 UTC

76 points

20 comments2 min readLW link

Request for advice: Research for Conversational Game Theory for LLMs

Rome Viharo16 Oct 2024 17:53 UTC

10 points

0 comments1 min readLW link

Why humans won’t control superhuman AIs.

Spiritus Dei16 Oct 2024 16:48 UTC

−11 points

1 comment6 min readLW link

Against empathy-by-default

Steven Byrnes16 Oct 2024 16:38 UTC

62 points

25 comments9 min readLW link

cancer rates after gene therapy

bhauth16 Oct 2024 15:32 UTC

54 points

2 comments3 min readLW link

(bhauth.com)

Monthly Roundup #23: October 2024

Zvi16 Oct 2024 13:50 UTC

39 points

13 comments50 min readLW link

(thezvi.wordpress.com)

[Question] Change My Mind: Thirders in “Sleeping Beauty” are Just Doing Epistemology Wrong

DragonGod16 Oct 2024 10:20 UTC

8 points

67 comments6 min readLW link

[Question] After uploading your consciousness...

Jinge Wang16 Oct 2024 3:52 UTC

−2 points

0 comments1 min readLW link

The ELYSIUM Proposal - Extrapolated voLitions Yielding Separate Individualized Utopias for Mankind

Roko16 Oct 2024 1:24 UTC

9 points

18 comments1 min readLW link

(transhumanaxiology.substack.com)

Bellevue Meetup

Cedar16 Oct 2024 1:07 UTC

3 points

0 comments1 min readLW link

Singular Learning Theory for Dummies

Rahul Chand15 Oct 2024 21:13 UTC

5 points

0 comments8 min readLW link

Distillation Of DeepSeek-Prover V1.5

IvanLin15 Oct 2024 18:53 UTC

4 points

1 comment3 min readLW link

Improving Model-Written Evals for AI Safety Benchmarking

Sunishchal Dev and Marius Hobbhahn

15 Oct 2024 18:25 UTC

30 points

0 comments18 min readLW link

Taking nonlogical concepts seriously

Kris Brown15 Oct 2024 18:16 UTC

7 points

5 comments18 min readLW link

(topos.site)

Rashomon—A newsbetting site

ideasthete15 Oct 2024 18:15 UTC

23 points

8 comments1 min readLW link

On the Practical Applications of Interpretability

Nick Jiang15 Oct 2024 17:18 UTC

5 points

1 comment7 min readLW link

Anthropic’s updated Responsible Scaling Policy

Zac Hatfield-Dodds15 Oct 2024 16:46 UTC

38 points

3 comments3 min readLW link

(www.anthropic.com)

[Question] When is reward ever the optimization target?

Noosphere8915 Oct 2024 15:09 UTC

37 points

17 comments1 min readLW link

An Opinionated Evals Reading List

Marius Hobbhahn and Jérémy Scheurer

15 Oct 2024 14:38 UTC

65 points

0 comments13 min readLW link

(www.apolloresearch.ai)