LessWrong — Archive
Anthropic announces interpretability advances. How much does this advance alignment? · Seth Herd · May 21, 2024, 10:30 PM · 49 points · 4 comments · 3 min read · LW link · (www.anthropic.com)
[Question] What would stop you from paying for an LLM? · yanni kyriacos · May 21, 2024, 10:25 PM · 17 points · 15 comments · 1 min read · LW link
EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024 · scasper · May 21, 2024, 8:15 PM · 157 points · 16 comments · 3 min read · LW link
Mitigating extreme AI risks amid rapid progress [Linkpost] · Orpheus16 · May 21, 2024, 7:59 PM · 21 points · 7 comments · 4 min read · LW link
The problem with rationality · David Loomis · May 21, 2024, 6:49 PM · −17 points · 1 comment · 6 min read · LW link
rough draft on what happens in the brain when you have an insight · Emrik · May 21, 2024, 6:02 PM · 11 points · 2 comments · 1 min read · LW link
On Dwarkesh’s Podcast with OpenAI’s John Schulman · Zvi · May 21, 2024, 5:30 PM · 73 points · 4 comments · 20 min read · LW link · (thezvi.wordpress.com)
[Question] Is deleting capabilities still a relevant research question? · tailcalled · May 21, 2024, 1:24 PM · 15 points · 1 comment · 1 min read · LW link
New voluntary commitments (AI Seoul Summit) · Zach Stein-Perlman · May 21, 2024, 11:00 AM · 81 points · 17 comments · 7 min read · LW link · (www.gov.uk)
ACX/LW/EA/* Meetup Bremen · RasmusHB · May 21, 2024, 5:42 AM · 2 points · 0 comments · 1 min read · LW link
My Dating Heuristic · Declan Molony · May 21, 2024, 5:28 AM · 26 points · 4 comments · 2 min read · LW link
Scorable Functions: A Format for Algorithmic Forecasting · ozziegooen · May 21, 2024, 4:14 AM · 29 points · 0 comments · LW link
The Problem With the Word ‘Alignment’ · peligrietzer and particlemania · May 21, 2024, 3:48 AM · 63 points · 8 comments · 6 min read · LW link
What’s Going on With OpenAI’s Messaging? · ozziegooen · May 21, 2024, 2:22 AM · 191 points · 13 comments · LW link
Harmony Intelligence is Hiring! · James Dao and Soroush Pour · May 21, 2024, 2:11 AM · 10 points · 0 comments · 1 min read · LW link · (www.harmonyintelligence.com)
[Linkpost] Statement from Scarlett Johansson on OpenAI’s use of the “Sky” voice, that was shockingly similar to her own voice. · Linch · May 20, 2024, 11:50 PM · 31 points · 8 comments · 1 min read · LW link · (variety.com)
Some perspectives on the discipline of Physics · Tahp · May 20, 2024, 6:19 PM · 17 points · 3 comments · 13 min read · LW link · (quark.rodeo)
[Question] Are there any groupchats for people working on Representation reading/control, activation steering type experiments? · Joe Kwon · May 20, 2024, 6:03 PM · 3 points · 1 comment · 1 min read · LW link
Interpretability: Integrated Gradients is a decent attribution method · Lucius Bushnaq, jake_mendel, StefanHex and Kaarel · May 20, 2024, 5:55 PM · 23 points · 7 comments · 6 min read · LW link
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks · Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn · May 20, 2024, 5:53 PM · 107 points · 4 comments · 3 min read · LW link
NAO Updates, Spring 2024 · jefftk · May 20, 2024, 4:51 PM · 13 points · 0 comments · 6 min read · LW link · (naobservatory.org)
OpenAI: Exodus · Zvi · May 20, 2024, 1:10 PM · 153 points · 26 comments · 44 min read · LW link · (thezvi.wordpress.com)
Infra-Bayesian haggling · hannagabor · May 20, 2024, 12:23 PM · 28 points · 0 comments · 20 min read · LW link
Jaan Tallinn’s 2023 Philanthropy Overview · jaan · May 20, 2024, 12:11 PM · 203 points · 5 comments · 1 min read · LW link · (jaan.info)
D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset] · abstractapplic · May 20, 2024, 9:38 AM · 31 points · 2 comments · 1 min read · LW link
Why I find Davidad’s plan interesting · Paul W · May 20, 2024, 8:13 AM · 18 points · 0 comments · 6 min read · LW link
Anthropic: Reflections on our Responsible Scaling Policy · Zac Hatfield-Dodds · May 20, 2024, 4:14 AM · 30 points · 21 comments · 10 min read · LW link · (www.anthropic.com)
The consistent guessing problem is easier than the halting problem · jessicata · May 20, 2024, 4:02 AM · 38 points · 5 comments · 4 min read · LW link · (unstableontology.com)
A poem titled ‘Tick Tock’. · Krantz · May 20, 2024, 3:52 AM · −1 points · 0 comments · 1 min read · LW link
Against Computers (infinite play) · rogersbacon · May 20, 2024, 12:43 AM · −11 points · 1 comment · 14 min read · LW link · (www.secretorum.life)
Testing for parallel reasoning in LLMs · meemi and Olli Järviniemi · May 19, 2024, 3:28 PM · 9 points · 7 comments · 9 min read · LW link
Hot take: The AI safety movement is way too sectarian and this is greatly increasing p(doom) · O O · May 19, 2024, 2:18 AM · 14 points · 15 comments · 2 min read · LW link
Some “meta-cruxes” for AI x-risk debates · Aryeh Englander · May 19, 2024, 12:21 AM · 20 points · 2 comments · 3 min read · LW link
On Privilege · Shmi · May 18, 2024, 10:36 PM · 15 points · 10 comments · 2 min read · LW link
Fund me please—I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University · Johannes C. Mayer · May 18, 2024, 7:53 PM · 22 points · 37 comments · 6 min read · LW link
To Limit Impact, Limit KL-Divergence · J Bostock · May 18, 2024, 6:52 PM · 10 points · 1 comment · 5 min read · LW link
[Question] Are There Other Ideas as Generally Applicable as Natural Selection · Amin Sennour · May 18, 2024, 4:37 PM · 1 point · 1 comment · 1 min read · LW link
Scientific Notation Options · jefftk · May 18, 2024, 3:10 PM · 27 points · 13 comments · 1 min read · LW link · (www.jefftk.com)
“If we go extinct due to misaligned AI, at least nature will continue, right? … right?” · plex · May 18, 2024, 2:09 PM · 54 points · 23 comments · 2 min read · LW link · (aisafety.info)
What Are Non-Zero-Sum Games?—A Primer · James Stephen Brown · May 18, 2024, 9:19 AM · 4 points · 7 comments · 3 min read · LW link
DeepMind’s “Frontier Safety Framework” is weak and unambitious · Zach Stein-Perlman · May 18, 2024, 3:00 AM · 159 points · 14 comments · 4 min read · LW link
International Scientific Report on the Safety of Advanced AI: Key Information · Aryeh Englander · 18 May 2024 1:45 UTC · 39 points · 0 comments · 13 min read · LW link
Goodhart in RL with KL: Appendix · Thomas Kwa · 18 May 2024 0:40 UTC · 12 points · 0 comments · 6 min read · LW link
AI 2030 – AI Policy Roadmap · LTM · 17 May 2024 23:29 UTC · 8 points · 0 comments · 1 min read · LW link
MIT FutureTech are hiring for an Operations and Project Management role. · peterslattery · 17 May 2024 23:21 UTC · 2 points · 0 comments · 3 min read · LW link
Language Models Model Us · eggsyntax · 17 May 2024 21:00 UTC · 158 points · 55 comments · 7 min read · LW link
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems · Joar Skalse · 17 May 2024 19:13 UTC · 67 points · 10 comments · 2 min read · LW link
Agency · A* · 17 May 2024 19:11 UTC · 8 points · 0 comments · 1 min read · LW link
DeepMind: Frontier Safety Framework · Zach Stein-Perlman · 17 May 2024 17:30 UTC · 64 points · 0 comments · 3 min read · LW link · (deepmind.google)
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning · Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill and Lee Sharkey · 17 May 2024 16:25 UTC · 57 points · 20 comments · 4 min read · LW link · (arxiv.org)