The Problem With the Word ‘Alignment’

peligrietzer and particlemania

May 21, 2024, 3:48 AM

63 points

8 comments, 6 min read, LW link

What’s Going on With OpenAI’s Messaging?

ozziegooen

May 21, 2024, 2:22 AM

191 points

13 comments, 3 min read, LW link

Harmony Intelligence is Hiring!

James Dao and Soroush Pour

May 21, 2024, 2:11 AM

10 points

0 comments, 1 min read, LW link

(www.harmonyintelligence.com)

[Linkpost] Statement from Scarlett Johansson on OpenAI’s use of the “Sky” voice, that was shockingly similar to her own voice.

Linch

May 20, 2024, 11:50 PM

31 points

8 comments, 1 min read, LW link

(variety.com)

Some perspectives on the discipline of Physics

Tahp

May 20, 2024, 6:19 PM

17 points

3 comments, 13 min read, LW link

(quark.rodeo)

[Question] Are there any groupchats for people working on Representation reading/control, activation steering type experiments?

Joe Kwon

May 20, 2024, 6:03 PM

3 points

1 comment, 1 min read, LW link

Interpretability: Integrated Gradients is a decent attribution method

Lucius Bushnaq, jake_mendel, StefanHex and Kaarel

May 20, 2024, 5:55 PM

23 points

7 comments, 6 min read, LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks

Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn

May 20, 2024, 5:53 PM

108 points

4 comments, 3 min read, LW link

NAO Updates, Spring 2024

jefftk

May 20, 2024, 4:51 PM

13 points

0 comments, 6 min read, LW link

(naobservatory.org)

OpenAI: Exodus

Zvi

May 20, 2024, 1:10 PM

153 points

26 comments, 44 min read, LW link

(thezvi.wordpress.com)

Infra-Bayesian haggling

hannagabor

May 20, 2024, 12:23 PM

28 points

0 comments, 20 min read, LW link

Jaan Tallinn’s 2023 Philanthropy Overview

jaan

May 20, 2024, 12:11 PM

203 points

5 comments, 1 min read, LW link

(jaan.info)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]

abstractapplic

May 20, 2024, 9:38 AM

31 points

2 comments, 1 min read, LW link

Why I find Davidad’s plan interesting

Paul W

May 20, 2024, 8:13 AM

18 points

0 comments, 6 min read, LW link

Anthropic: Reflections on our Responsible Scaling Policy

Zac Hatfield-Dodds

May 20, 2024, 4:14 AM

30 points

21 comments, 10 min read, LW link

(www.anthropic.com)

The consistent guessing problem is easier than the halting problem

jessicata

May 20, 2024, 4:02 AM

38 points

5 comments, 4 min read, LW link

(unstableontology.com)

A poem titled ‘Tick Tock’.

Krantz

May 20, 2024, 3:52 AM

−1 points

0 comments, 1 min read, LW link

Against Computers (infinite play)

rogersbacon

May 20, 2024, 12:43 AM

−11 points

1 comment, 14 min read, LW link

(www.secretorum.life)

Testing for parallel reasoning in LLMs

meemi and Olli Järviniemi

May 19, 2024, 3:28 PM

9 points

7 comments, 9 min read, LW link

Hot take: The AI safety movement is way too sectarian and this is greatly increasing p(doom)

O O

May 19, 2024, 2:18 AM

14 points

15 comments, 2 min read, LW link

Some “meta-cruxes” for AI x-risk debates

Aryeh Englander

May 19, 2024, 12:21 AM

20 points

2 comments, 3 min read, LW link

On Privilege

Shmi

May 18, 2024, 10:36 PM

15 points

10 comments, 2 min read, LW link

Fund me please—I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University

Johannes C. Mayer

May 18, 2024, 7:53 PM

22 points

37 comments, 6 min read, LW link

To Limit Impact, Limit KL-Divergence

J Bostock

May 18, 2024, 6:52 PM

10 points

1 comment, 5 min read, LW link

[Question] Are There Other Ideas as Generally Applicable as Natural Selection

Amin Sennour

May 18, 2024, 4:37 PM

1 point

1 comment, 1 min read, LW link

Scientific Notation Options

jefftk

May 18, 2024, 3:10 PM

27 points

13 comments, 1 min read, LW link

(www.jefftk.com)

“If we go extinct due to misaligned AI, at least nature will continue, right? … right?”

plex

May 18, 2024, 2:09 PM

54 points

23 comments, 2 min read, LW link

(aisafety.info)

What Are Non-Zero-Sum Games?—A Primer

James Stephen Brown

May 18, 2024, 9:19 AM

4 points

7 comments, 3 min read, LW link

DeepMind’s “Frontier Safety Framework” is weak and unambitious

Zach Stein-Perlman

May 18, 2024, 3:00 AM

159 points

14 comments, 4 min read, LW link

International Scientific Report on the Safety of Advanced AI: Key Information

Aryeh Englander

May 18, 2024, 1:45 AM

39 points

0 comments, 13 min read, LW link

Goodhart in RL with KL: Appendix

Thomas Kwa

May 18, 2024, 12:40 AM

12 points

0 comments, 6 min read, LW link

AI 2030 – AI Policy Roadmap

LTM

May 17, 2024, 11:29 PM

8 points

0 comments, 1 min read, LW link

MIT FutureTech are hiring for an Operations and Project Management role.

peterslattery

May 17, 2024, 11:21 PM

2 points

0 comments, 3 min read, LW link

Language Models Model Us

eggsyntax

May 17, 2024, 9:00 PM

159 points

55 comments, 7 min read, LW link

Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems

Joar Skalse

May 17, 2024, 7:13 PM

67 points

10 comments, 2 min read, LW link

Agency

A*

May 17, 2024, 7:11 PM

8 points

0 comments, 1 min read, LW link

DeepMind: Frontier Safety Framework

Zach Stein-Perlman

May 17, 2024, 5:30 PM

64 points

0 comments, 3 min read, LW link

(deepmind.google)

Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning

Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill and Lee Sharkey

May 17, 2024, 4:25 PM

57 points

20 comments, 4 min read, LW link

(arxiv.org)

AISafety.com – Resources for AI Safety

Søren Elverlin, plex, Bryce Robertson and Melissa Samworth

May 17, 2024, 3:57 PM

83 points

3 comments, 1 min read, LW link

Is There Really a Child Penalty in the Long Run?

Maxwell Tabarrok

May 17, 2024, 11:56 AM

23 points

6 comments, 5 min read, LW link

(www.maximum-progress.com)

My Hammer Time Final Exam

adios

May 17, 2024, 9:28 AM

10 points

3 comments, 3 min read, LW link

[Question] Is there a place to find the most cited LW articles of all time?

keltan

May 17, 2024, 1:20 AM

4 points

3 comments, 1 min read, LW link

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures

abstractapplic

May 17, 2024, 12:25 AM

34 points

12 comments, 2 min read, LW link

To an LLM, everything looks like a logic puzzle

Jesse Richardson

May 16, 2024, 10:21 PM

14 points

2 comments, 2 min read, LW link

AI Safety Institute’s Inspect hello world example for AI evals

TheManxLoiner

May 16, 2024, 8:47 PM

3 points

0 comments, 1 min read, LW link

(lovkush.medium.com)

Feeling (instrumentally) Rational

Morphism

May 16, 2024, 6:56 PM

14 points

5 comments, 1 min read, LW link

Advice for Activists from the History of Environmentalism

Jeffrey Heninger

May 16, 2024, 6:40 PM

100 points

8 comments, 6 min read, LW link

(blog.aiimpacts.org)

Ninety-five theses on AI

hamandcheese

May 16, 2024, 5:51 PM

21 points

0 comments, 7 min read, LW link

GPT-4o My and Google I/O Day

Zvi

May 16, 2024, 5:50 PM

41 points

2 comments, 37 min read, LW link

(thezvi.wordpress.com)

AI #64: Feel the Mundane Utility

Zvi

May 16, 2024, 3:20 PM

28 points

11 comments, 47 min read, LW link

(thezvi.wordpress.com)