A Short Memo on AI Interpretability Rainbows

scasper · 27 Jul 2023 23:05 UTC

18 points

0 comments · 2 min read

Pulling the Rope Sideways: Empirical Test Results

Daniel Kokotajlo · 27 Jul 2023 22:18 UTC

63 points

18 comments · 1 min read

A $10k retroactive grant for VaccinateCA

Austin Chen · 27 Jul 2023 18:14 UTC

82 points

0 comments · 6 min read

(manifund.org)

Preference Aggregation as Bayesian Inference

beren · 27 Jul 2023 17:59 UTC

14 points

1 comment · 1 min read

AI #22: Into the Weeds

Zvi · 27 Jul 2023 17:40 UTC

49 points

8 comments · 84 min read

(thezvi.wordpress.com)

SSA rejects anthropic shadow, too

jessicata · 27 Jul 2023 17:25 UTC

83 points

39 comments · 11 min read

(unstableontology.com)

[Question] What are examples of someone doing a lot of work to find the best of something?

chanamessinger · 27 Jul 2023 15:58 UTC

29 points

16 comments · 1 min read

AI-Plans.com 10-day Critique-a-Thon

Iknownothing · 27 Jul 2023 11:44 UTC

8 points

2 comments · 2 min read

(manifund.org)

Privacy in a Digital World

Faustify · 27 Jul 2023 10:46 UTC

2 points

0 comments · 5 min read

Cultivating a state of mind where new ideas are born

Henrik Karlsson · 27 Jul 2023 9:16 UTC

262 points

21 comments · 14 min read · 2 reviews

(www.henrikkarlsson.xyz)

Partial Transcript of Recent Senate Hearing Discussing AI X-Risk

Daniel_Eth · 27 Jul 2023 9:16 UTC

55 points

0 comments · 22 min read

(medium.com)

AXRP Episode 24 - Superalignment with Jan Leike

DanielFilan · 27 Jul 2023 4:00 UTC

55 points

3 comments · 69 min read

AXRP Episode 23 - Mechanistic Anomaly Detection with Mark Xu

DanielFilan · 27 Jul 2023 1:50 UTC

22 points

0 comments · 72 min read

GPT-4 can catch subtle cross-language translation mistakes

Michael Tontchev · 27 Jul 2023 1:39 UTC

7 points

1 comment · 1 min read

Social Balance through Embracing Social Credit

dhruvv · 26 Jul 2023 20:07 UTC

−39 points

9 comments · 3 min read

Why no Roman Industrial Revolution?

jasoncrawford · 26 Jul 2023 19:34 UTC

62 points

30 comments · 3 min read

(rootsofprogress.org)

Why you can’t treat decidability and complexity as a constant (Post #1)

Noosphere89 · 26 Jul 2023 17:54 UTC

6 points

13 comments · 5 min read

A response to the Richards et al.’s “The Illusion of AI’s Existential Risk”

Harrison Fell · 26 Jul 2023 17:34 UTC

1 point

0 comments · 10 min read

Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy

Buck and ryan_greenblatt · 26 Jul 2023 17:02 UTC

101 points

19 comments · 1 min read · 1 review

Neuronpedia

Johnny Lin · 26 Jul 2023 16:29 UTC

135 points

51 comments · 2 min read

(neuronpedia.org)

Frontier Model Forum

Zach Stein-Perlman · 26 Jul 2023 14:30 UTC

27 points

0 comments · 4 min read

(blog.google)

Podcasts: Future of Life Institute, Breakthrough Science Summit panel

jasoncrawford · 26 Jul 2023 14:28 UTC

8 points

0 comments · 1 min read

(rootsofprogress.org)

Llama We Doing This Again?

Zvi · 26 Jul 2023 13:00 UTC

48 points

3 comments · 16 min read

(thezvi.wordpress.com)

Frontier Model Security

Vaniver · 26 Jul 2023 4:48 UTC

32 points

1 comment · 3 min read

(www.anthropic.com)

The First Room-Temperature Ambient-Pressure Superconductor

Annapurna · 26 Jul 2023 2:27 UTC

35 points

28 comments · 1 min read

(arxiv.org)

Underwater Torture Chambers: The Horror Of Fish Farming

Bentham's Bulldog · 26 Jul 2023 0:27 UTC

78 points

51 comments · 10 min read · 1 review

Contra Alexander on the Bitter Lesson and IQ

Andrew Keenan Richardson · 26 Jul 2023 0:07 UTC

9 points

1 comment · 4 min read

(mechanisticmind.com)

Overcoming the MWC

Mark Freed · 25 Jul 2023 17:31 UTC

3 points

0 comments · 3 min read

Russian parliamentarian: let’s ban personal computers and the Internet

RomanS · 25 Jul 2023 17:30 UTC

11 points

6 comments · 2 min read

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

Corin Katzke and Dan H · 25 Jul 2023 16:58 UTC

6 points

0 comments · 6 min read

(newsletter.safe.ai)

“The Universe of Minds”—call for reviewers (Seeds of Science)

rogersbacon · 25 Jul 2023 16:53 UTC

7 points

0 comments · 1 min read

Thoughts on Loss Landscapes and why Deep Learning works

beren · 25 Jul 2023 16:41 UTC

54 points

4 comments · 18 min read

Should you work at a leading AI lab? (including in non-safety roles)

Benjamin Hilton · 25 Jul 2023 16:29 UTC

7 points

0 comments · 12 min read

Whisper’s Word-Level Timestamps are Out

Varshul Gupta · 25 Jul 2023 14:32 UTC

−18 points

2 comments · 2 min read

(dubverseblack.substack.com)

AIS 101: Task decomposition for scalable oversight

Charbel-Raphaël · 25 Jul 2023 13:34 UTC

35 points

0 comments · 19 min read

(docs.google.com)

Anthropic Observations

Zvi · 25 Jul 2023 12:50 UTC

104 points

1 comment · 10 min read

(thezvi.wordpress.com)

Autonomous Alignment Oversight Framework (AAOF)

Justausername · 25 Jul 2023 10:25 UTC

−9 points

0 comments · 4 min read

How LLMs are and are not myopic

janus · 25 Jul 2023 2:19 UTC

139 points

16 comments · 8 min read

Secure Hand Holding

jefftk · 25 Jul 2023 1:40 UTC

28 points

43 comments · 1 min read

(www.jefftk.com)

Open problems in activation engineering

TurnTrout, woog, lisathiergart, Monte M and Ulisse Mini · 24 Jul 2023 19:46 UTC

51 points

2 comments · 1 min read

(coda.io)

Subdivisions for Useful Distillations?

Sharat Jacob Jacob · 24 Jul 2023 18:55 UTC

9 points

2 comments · 2 min read

Optimizing For Approval And Disapproval

Thoth Hermes · 24 Jul 2023 18:46 UTC

−1 points

0 comments · 12 min read

(thothhermes.substack.com)

An Opinionated Guide to Computability and Complexity (Post #0)

Noosphere89 · 24 Jul 2023 17:53 UTC

10 points

10 comments · 3 min read

Slowing down AI progress is an underexplored alignment strategy

Norman Borlaug · 24 Jul 2023 16:56 UTC

43 points

27 comments · 5 min read

Anticipation in LLMs

derek shiller · 24 Jul 2023 15:53 UTC

6 points

0 comments · 13 min read

The cone of freedom (or, freedom might only be instrumentally valuable)

dkl9 · 24 Jul 2023 15:38 UTC

−10 points

6 comments · 2 min read

(dkl9.net)

A reformulation of Finite Factored Sets

Matthias G. Mayer · 24 Jul 2023 13:02 UTC

79 points

1 comment · 8 min read

Brain Efficiency Cannell Prize Contest Award Ceremony

Alexander Gietelink Oldenziel · 24 Jul 2023 11:30 UTC

150 points

12 comments · 7 min read

[Crosspost] An AI Pause Is Humanity’s Best Bet For Preventing Extinction (TIME)

otto.barten · 24 Jul 2023 10:07 UTC

12 points

0 comments · 7 min read

(time.com)

Cryonics and Regret

MvB · 24 Jul 2023 9:16 UTC

193 points

37 comments · 2 min read · 1 review