Attention-Feature Tables in Gemma 2 Residual Streams

J Bostock 6 Aug 2024 22:56 UTC

2 points

0 comments 14 min read LW link

[Question] What are the strategic implications if aliens and Earth civilizations produce similar utilities?

Maxime Riché 6 Aug 2024 21:16 UTC

4 points

1 comment 1 min read LW link

WTH is Cerebrolysin, actually?

gsfitzgerald and delton137

6 Aug 2024 20:40 UTC

184 points

23 comments 17 min read LW link

FHE Can’t Save Us: The Case Against Cryptographic AI Boxing

Bart Jaworski 6 Aug 2024 17:46 UTC

6 points

0 comments 6 min read LW link

Inference-Only Debate Experiments Using Math Problems

Arjun Panickssery, Abhimanyu Pallavi Sudhir and JacksonKaunismaa

6 Aug 2024 17:44 UTC

31 points

0 comments 2 min read LW link

[Question] Is an AI religion justified?

p4rziv4l 6 Aug 2024 15:42 UTC

−35 points

11 comments 1 min read LW link

Startup Roundup #2

Zvi 6 Aug 2024 13:30 UTC

45 points

0 comments 32 min read LW link

(thezvi.wordpress.com)

Mechanistic Anomaly Detection Research Update

Nora Belrose and David Johnston

6 Aug 2024 10:33 UTC

11 points

0 comments 1 min read LW link

(blog.eleuther.ai)

Reasoning is not search—a chess example

p.b. 6 Aug 2024 9:29 UTC

4 points

3 comments 2 min read LW link

Broadly human level, cognitively complete AGI

p.b. 6 Aug 2024 9:26 UTC

9 points

0 comments 1 min read LW link

Does Evolutionary Theory Imply Genetic Tribalism?

Zero Contradictions 6 Aug 2024 5:43 UTC

0 points

1 comment 1 min read LW link

(thewaywardaxolotl.blogspot.com)

How I Learned To Stop Trusting Prediction Markets and Love the Arbitrage

orthonormal 6 Aug 2024 2:32 UTC

200 points

33 comments 3 min read LW link 3 reviews

John Schulman leaves OpenAI for Anthropic [and then left Anthropic again for Thinking Machines]

Sodium 6 Aug 2024 1:23 UTC

57 points

0 comments 1 min read LW link

Self-explaining SAE features

Dmitrii Kharlapenko, neverix, Neel Nanda and Arthur Conmy

5 Aug 2024 22:20 UTC

62 points

13 comments 10 min read LW link

Value fragility and AI takeover

Joe Carlsmith 5 Aug 2024 21:28 UTC

76 points

5 comments 30 min read LW link

Excursions into Sparse Autoencoders: What is monosemanticity?

Jakub Smékal 5 Aug 2024 19:22 UTC

2 points

0 comments 10 min read LW link

Madrid—ACX Meetups Everywhere Fall 2024

Pablo Villalobos 5 Aug 2024 18:36 UTC

4 points

0 comments 1 min read LW link

LLMs stifle creativity, eliminate opportunities for serendipitous discovery and disrupt intergenerational transfer of wisdom

Ghdz 5 Aug 2024 18:27 UTC

6 points

2 comments 7 min read LW link

Circular Reasoning

abramdemski 5 Aug 2024 18:10 UTC

113 points

44 comments 8 min read LW link 2 reviews

Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours

Seth Herd 5 Aug 2024 15:38 UTC

70 points

22 comments 5 min read LW link

AI Safety at the Frontier: Paper Highlights, July ’24

gasteigerjo 5 Aug 2024 13:00 UTC

8 points

0 comments 7 min read LW link

(aisafetyfrontier.substack.com)

Game Theory and Society

Zero Contradictions 5 Aug 2024 4:27 UTC

4 points

0 comments 1 min read LW link

(thewaywardaxolotl.blogspot.com)

Near-mode thinking on AI

Olli Järviniemi 4 Aug 2024 20:47 UTC

126 points

10 comments 5 min read LW link 1 review

Watermarks: Signing, Branding, and Boobytrapping

Shankar Sivarajan 4 Aug 2024 20:41 UTC

4 points

0 comments 1 min read LW link

Modelling Social Exchange: A Systematised Method to Judge Friendship Quality

Wynn Walker 4 Aug 2024 18:49 UTC

6 points

0 comments 5 min read LW link

We’re not as 3-Dimensional as We Think

silentbob 4 Aug 2024 14:39 UTC

48 points

17 comments 5 min read LW link

You don’t know how bad most things are nor precisely how they’re bad.

Solenoid_Entity 4 Aug 2024 14:12 UTC

356 points

52 comments 5 min read LW link 1 review

[Question] What should we do about COVID in 2024?

ChristianKl 4 Aug 2024 10:57 UTC

20 points

2 comments 1 min read LW link

Tokenized SAEs: Infusing per-token biases.

tdooms and danwil

4 Aug 2024 9:17 UTC

20 points

20 comments 15 min read LW link

Thoughts On Democracy

Zero Contradictions 4 Aug 2024 6:02 UTC

2 points

0 comments 1 min read LW link

(zerocontradictions.net)

AI Alignment through Comparative Advantage

artemiocobb 4 Aug 2024 0:32 UTC

−2 points

4 comments 3 min read LW link

Labelling, Variables, and In-Context Learning in Llama2

Joshua Penman 3 Aug 2024 19:36 UTC

6 points

0 comments 1 min read LW link

(colab.research.google.com)

[Question] Dan Hendrycks and EA

jeffreycaruso 3 Aug 2024 13:33 UTC

−3 points

4 comments 1 min read LW link

[Question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?

Dalcy 3 Aug 2024 12:39 UTC

27 points

1 comment 1 min read LW link

Why did ChatGPT say that? Prompt engineering and more, with PIZZA.

Jessica Rumbelow 3 Aug 2024 12:07 UTC

43 points

2 comments 4 min read LW link

Cooperation and Alignment in Delegation Games: You Need Both!

Oliver Sourbut, Lewis Hammond and HarrietW

3 Aug 2024 10:16 UTC

9 points

0 comments 14 min read LW link

(www.oliversourbut.net)

SRE’s review of Democracy

Martin Sustrik 3 Aug 2024 7:20 UTC

48 points

2 comments 3 min read LW link

(250bpm.substack.com)

The Case Against Libertarianism

Zero Contradictions 3 Aug 2024 5:05 UTC

−4 points

1 comment 1 min read LW link

(zerocontradictions.net)

We Don’t Just Let People Die—So What Next?

James Stephen Brown 3 Aug 2024 1:04 UTC

11 points

8 comments 10 min read LW link

The EA case for Trump

Kvee 3 Aug 2024 1:00 UTC

14 points

1 comment 1 min read LW link

(www.secondbest.ca)

I didn’t think I’d take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!

mako yass 2 Aug 2024 22:35 UTC

25 points

2 comments 5 min read LW link

Evaluating Sparse Autoencoders with Board Game Models

Adam Karvonen, Sam Marks, Can, Benjamin Wright, Jannik Brinkmann, Logan Riggs and Rico Angell

2 Aug 2024 19:50 UTC

38 points

1 comment 9 min read LW link

The Bitter Lesson for AI Safety Research

adamk, Richard Ren, Dan H and GMM

2 Aug 2024 18:39 UTC

58 points

5 comments 3 min read LW link

Ethical Deception: Should AI Ever Lie?

Jason Reid 2 Aug 2024 17:53 UTC

5 points

2 comments 7 min read LW link

[Question] Request for AI risk quotes, especially around speed, large impacts and black boxes

Nathan Young 2 Aug 2024 17:49 UTC

6 points

0 comments 1 min read LW link

A Simple Toy Coherence Theorem

johnswentworth and David Lorell

2 Aug 2024 17:47 UTC

81 points

23 comments 7 min read LW link 1 review

All the Following are Distinct

Gianluca Calcagni 2 Aug 2024 16:35 UTC

16 points

3 comments 10 min read LW link

The ‘strong’ feature hypothesis could be wrong

lewis smith 2 Aug 2024 14:33 UTC

235 points

29 comments 17 min read LW link 1 review

An information-theoretic study of lying in LLMs

Annah and Guillaume Corlouer

2 Aug 2024 10:06 UTC

17 points

0 comments 4 min read LW link

How I Wrought a Lesser Scribing Artifact (You Can, Too!)

Lorxus 2 Aug 2024 3:35 UTC

16 points

0 comments 5 min read LW link