[Linkpost] Statement from Scarlett Johansson on OpenAI’s use of the “Sky” voice, that was shockingly similar to her own voice.

Linch

20 May 2024 23:50 UTC

31 points

8 comments, 1 min read, LW link

(variety.com)

Some perspectives on the discipline of Physics

Tahp

20 May 2024 18:19 UTC

18 points

3 comments, 13 min read, LW link

(quark.rodeo)

[Question] Are there any groupchats for people working on Representation reading/control, activation steering type experiments?

Joe Kwon

20 May 2024 18:03 UTC

3 points

1 comment, 1 min read, LW link

Interpretability: Integrated Gradients is a decent attribution method

Lucius Bushnaq, jake_mendel, StefanHex and Kaarel

20 May 2024 17:55 UTC

24 points

7 comments, 6 min read, LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn

20 May 2024 17:53 UTC

108 points

4 comments, 3 min read, LW link

NAO Updates, Spring 2024

jefftk

20 May 2024 16:51 UTC

13 points

0 comments, 6 min read, LW link

(naobservatory.org)

OpenAI: Exodus

Zvi

20 May 2024 13:10 UTC

153 points

26 comments, 44 min read, LW link

(thezvi.wordpress.com)

Infra-Bayesian haggling

hannagabor

20 May 2024 12:23 UTC

31 points

1 comment, 20 min read, LW link, 1 review

Jaan Tallinn’s 2023 Philanthropy Overview

jaan

20 May 2024 12:11 UTC

203 points

5 comments, 1 min read, LW link

(jaan.info)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]

abstractapplic

20 May 2024 9:38 UTC

31 points

2 comments, 1 min read, LW link

Why I find Davidad’s plan interesting

Paul W

20 May 2024 8:13 UTC

18 points

0 comments, 6 min read, LW link

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds

20 May 2024 4:14 UTC

25 points

21 comments, 10 min read, LW link

(www.anthropic.com)

The consistent guessing problem is easier than the halting problem

jessicata

20 May 2024 4:02 UTC

38 points

5 comments, 4 min read, LW link

(unstableontology.com)

A poem titled ‘Tick Tock’.

Krantz

20 May 2024 3:52 UTC

−1 points

0 comments, 1 min read, LW link

Testing for parallel reasoning in LLMs

meemi and Olli Järviniemi

19 May 2024 15:28 UTC

9 points

7 comments, 9 min read, LW link

Hot take: The AI safety movement is way too sectarian and this is greatly increasing p(doom)

O O

19 May 2024 2:18 UTC

14 points

15 comments, 2 min read, LW link

Some “meta-cruxes” for AI x-risk debates

Aryeh Englander

19 May 2024 0:21 UTC

20 points

2 comments, 3 min read, LW link

On Privilege

Shmi

18 May 2024 22:36 UTC

16 points

10 comments, 2 min read, LW link

Fund me please—I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University

Johannes C. Mayer

18 May 2024 19:53 UTC

22 points

37 comments, 6 min read, LW link

To Limit Impact, Limit KL-Divergence

J Bostock

18 May 2024 18:52 UTC

10 points

1 comment, 5 min read, LW link

[Question] Are There Other Ideas as Generally Applicable as Natural Selection

Amin Sennour

18 May 2024 16:37 UTC

2 points

1 comment, 1 min read, LW link

Scientific Notation Options

jefftk

18 May 2024 15:10 UTC

27 points

13 comments, 1 min read, LW link

(www.jefftk.com)

“If we go extinct due to misaligned AI, at least nature will continue, right? … right?”

plex

18 May 2024 14:09 UTC

68 points

23 comments, 2 min read, LW link

(aisafety.info)

What Are Non-Zero-Sum Games?—A Primer

James Stephen Brown

18 May 2024 9:19 UTC

4 points

7 comments, 3 min read, LW link

DeepMind’s “Frontier Safety Framework” is weak and unambitious

Zach Stein-Perlman

18 May 2024 3:00 UTC

160 points

14 comments, 4 min read, LW link

International Scientific Report on the Safety of Advanced AI: Key Information

Aryeh Englander

18 May 2024 1:45 UTC

39 points

0 comments, 13 min read, LW link

Goodhart in RL with KL: Appendix

Thomas Kwa

18 May 2024 0:40 UTC

12 points

0 comments, 6 min read, LW link

AI 2030 – AI Policy Roadmap

LTM

17 May 2024 23:29 UTC

8 points

0 comments, 1 min read, LW link

MIT FutureTech are hiring for an Operations and Project Management role.

peterslattery

17 May 2024 23:21 UTC

2 points

0 comments, 3 min read, LW link

Language Models Model Us

eggsyntax

17 May 2024 21:00 UTC

159 points

56 comments, 7 min read, LW link, 1 review

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Joar Skalse

17 May 2024 19:13 UTC

67 points

10 comments, 2 min read, LW link

Agency

A*

17 May 2024 19:11 UTC

8 points

0 comments, 1 min read, LW link

DeepMind: Frontier Safety Framework

Zach Stein-Perlman

17 May 2024 17:30 UTC

64 points

0 comments, 3 min read, LW link

(deepmind.google)

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning

Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill and Lee Sharkey

17 May 2024 16:25 UTC

57 points

20 comments, 4 min read, LW link

(arxiv.org)

AISafety.com – Resources for AI Safety

Søren Elverlin, plex, Bryce Robertson and honeybee

17 May 2024 15:57 UTC

84 points

3 comments, 1 min read, LW link

Is There Really a Child Penalty in the Long Run?

Maxwell Tabarrok

17 May 2024 11:56 UTC

26 points

6 comments, 5 min read, LW link

(www.maximum-progress.com)

My Hammer Time Final Exam

adios

17 May 2024 9:28 UTC

10 points

3 comments, 3 min read, LW link

[Question] Is there a place to find the most cited LW articles of all time?

keltan

17 May 2024 1:20 UTC

4 points

3 comments, 1 min read, LW link

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures

abstractapplic

17 May 2024 0:25 UTC

34 points

12 comments, 2 min read, LW link

To an LLM, everything looks like a logic puzzle

Jesse Richardson

16 May 2024 22:21 UTC

14 points

2 comments, 2 min read, LW link

AI Safety Institute’s Inspect hello world example for AI evals

TheManxLoiner

16 May 2024 20:47 UTC

3 points

0 comments, 1 min read, LW link

(lovkush.medium.com)

Feeling (instrumentally) Rational

Morphism

16 May 2024 18:56 UTC

14 points

5 comments, 1 min read, LW link

Advice for Activists from the History of Environmentalism

Jeffrey Heninger

16 May 2024 18:40 UTC

101 points

10 comments, 6 min read, LW link

(blog.aiimpacts.org)

Ninety-five theses on AI

hamandcheese

16 May 2024 17:51 UTC

21 points

0 comments, 7 min read, LW link

GPT-4o My and Google I/O Day

Zvi

16 May 2024 17:50 UTC

41 points

2 comments, 37 min read, LW link

(thezvi.wordpress.com)

AI #64: Feel the Mundane Utility

Zvi

16 May 2024 15:20 UTC

28 points

11 comments, 47 min read, LW link

(thezvi.wordpress.com)

AISN #35: Lobbying on AI Regulation Plus, New Models from OpenAI and Google, and Legal Regimes for Training on Copyrighted Data

Dan H and Corin Katzke

16 May 2024 14:29 UTC

2 points

3 comments, 6 min read, LW link

(newsletter.safe.ai)

FMT: a great opportunity for (soon-to-be) parents

EternallyBlissful

16 May 2024 13:24 UTC

13 points

1 comment, 18 min read, LW link

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Gunnar_Zarncke

16 May 2024 13:09 UTC

51 points

20 comments, 1 min read, LW link

(arxiv.org)

The Dunning-Kruger of disproving Dunning-Kruger

kromem

16 May 2024 10:11 UTC

58 points

0 comments, 5 min read, LW link