5 Aug 2024 22:20 UTC

62 points

13 comments 10 min read LW link

Value fragility and AI takeover

Joe Carlsmith 5 Aug 2024 21:28 UTC

76 points

5 comments 30 min read LW link

Madrid—ACX Meetups Everywhere Fall 2024

Pablo Villalobos 5 Aug 2024 18:36 UTC

4 points

0 comments 1 min read LW link

LLMs stifle creativity, eliminate opportunities for serendipitous discovery and disrupt intergenerational transfer of wisdom

Ghdz 5 Aug 2024 18:27 UTC

7 points

3 comments 7 min read LW link

Circular Reasoning

abramdemski 5 Aug 2024 18:10 UTC

113 points

44 comments 8 min read LW link 2 reviews

Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours

Seth Herd 5 Aug 2024 15:38 UTC

70 points

22 comments 5 min read LW link

AI Safety at the Frontier: Paper Highlights, July ’24

gasteigerjo 5 Aug 2024 13:00 UTC

8 points

0 comments 7 min read LW link

(aisafetyfrontier.substack.com)

Game Theory and Society

Zero Contradictions 5 Aug 2024 4:27 UTC

4 points

0 comments 1 min read LW link

(thewaywardaxolotl.blogspot.com)

Near-mode thinking on AI

Olli Järviniemi 4 Aug 2024 20:47 UTC

126 points

10 comments 5 min read LW link 1 review

Watermarks: Signing, Branding, and Boobytrapping

Shankar Sivarajan 4 Aug 2024 20:41 UTC

4 points

0 comments 1 min read LW link

Modelling Social Exchange: A Systematised Method to Judge Friendship Quality

Wynn Walker 4 Aug 2024 18:49 UTC

6 points

0 comments 5 min read LW link

We’re not as 3-Dimensional as We Think

silentbob 4 Aug 2024 14:39 UTC

46 points

20 comments 5 min read LW link

You don’t know how bad most things are nor precisely how they’re bad.

Solenoid_Entity 4 Aug 2024 14:12 UTC

357 points

53 comments 5 min read LW link 1 review

[Question] What should we do about COVID in 2024?

ChristianKl 4 Aug 2024 10:57 UTC

20 points

2 comments 1 min read LW link

Tokenized SAEs: Infusing per-token biases.

tdooms and danwil

4 Aug 2024 9:17 UTC

20 points

20 comments 15 min read LW link

Thoughts On Democracy

Zero Contradictions 4 Aug 2024 6:02 UTC

2 points

0 comments 1 min read LW link

(zerocontradictions.net)

AI Alignment through Comparative Advantage

artemiocobb 4 Aug 2024 0:32 UTC

−2 points

4 comments 3 min read LW link

Labelling, Variables, and In-Context Learning in Llama2

Joshua Penman 3 Aug 2024 19:36 UTC

6 points

0 comments 1 min read LW link

(colab.research.google.com)

[Question] Dan Hendrycks and EA

jeffreycaruso 3 Aug 2024 13:33 UTC

−3 points

4 comments 1 min read LW link

[Question] Why do Minimal Bayes Nets often correspond to Causal Models of Reality?

Dalcy 3 Aug 2024 12:39 UTC

27 points

1 comment 1 min read LW link

Why did ChatGPT say that? Prompt engineering and more, with PIZZA.

Jessica Rumbelow 3 Aug 2024 12:07 UTC

43 points

2 comments 4 min read LW link

Cooperation and Alignment in Delegation Games: You Need Both!

Oliver Sourbut, Lewis Hammond and HarrietW

3 Aug 2024 10:16 UTC

9 points

0 comments 14 min read LW link

(www.oliversourbut.net)

SRE’s review of Democracy

Martin Sustrik 3 Aug 2024 7:20 UTC

48 points

2 comments 3 min read LW link

(250bpm.substack.com)

The Case Against Libertarianism

Zero Contradictions 3 Aug 2024 5:05 UTC

−4 points

1 comment 1 min read LW link

(zerocontradictions.net)

We Don’t Just Let People Die—So What Next?

James Stephen Brown 3 Aug 2024 1:04 UTC

11 points

8 comments 10 min read LW link

The EA case for Trump

Kvee 3 Aug 2024 1:00 UTC

14 points

1 comment 1 min read LW link

(www.secondbest.ca)

I didn’t think I’d take the time to build this calibration training game, but with websim it took roughly 30 seconds, so here it is!

mako yass 2 Aug 2024 22:35 UTC

25 points

2 comments 5 min read LW link

Evaluating Sparse Autoencoders with Board Game Models

Adam Karvonen, Sam Marks, Can, Benjamin Wright, Jannik Brinkmann, Logan Riggs and Rico Angell

2 Aug 2024 19:50 UTC

38 points

1 comment 9 min read LW link

The Bitter Lesson for AI Safety Research

adamk, Richard Ren, Dan H and GMM

2 Aug 2024 18:39 UTC

58 points

5 comments 3 min read LW link

Ethical Deception: Should AI Ever Lie?

Jason Reid 2 Aug 2024 17:53 UTC

5 points

2 comments 7 min read LW link

[Question] Request for AI risk quotes, especially around speed, large impacts and black boxes

Nathan Young 2 Aug 2024 17:49 UTC

6 points

0 comments 1 min read LW link

A Simple Toy Coherence Theorem

johnswentworth and David Lorell

2 Aug 2024 17:47 UTC

83 points

23 comments 7 min read LW link 1 review

All the Following are Distinct

Gianluca Calcagni 2 Aug 2024 16:35 UTC

16 points

3 comments 10 min read LW link

The ‘strong’ feature hypothesis could be wrong

lewis smith 2 Aug 2024 14:33 UTC

236 points

29 comments 17 min read LW link 1 review

An information-theoretic study of lying in LLMs

Annah and Guillaume Corlouer

2 Aug 2024 10:06 UTC

17 points

0 comments 4 min read LW link

How I Wrought a Lesser Scribing Artifact (You Can, Too!)

Lorxus 2 Aug 2024 3:35 UTC

16 points

0 comments 5 min read LW link

The Rise and Stagnation of Modernity

Zero Contradictions 2 Aug 2024 3:31 UTC

1 point

0 comments 1 min read LW link

(thewaywardaxolotl.blogspot.com)

Lessons from the FDA for AI

Remmelt 2 Aug 2024 0:52 UTC

1 point

4 comments 1 min read LW link

(ainowinstitute.org)

AI Rights for Human Safety

Simon Goldstein 1 Aug 2024 23:01 UTC

55 points

11 comments 1 min read LW link

(papers.ssrn.com)

Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders

Gytis Daujotas 1 Aug 2024 21:08 UTC

46 points

7 comments 7 min read LW link

Optimizing Repeated Correlations

SatvikBeri 1 Aug 2024 17:33 UTC

26 points

1 comment 1 min read LW link

The need for multi-agent experiments

Martín Soto 1 Aug 2024 17:14 UTC

43 points

3 comments 9 min read LW link

Dragon Agnosticism

jefftk 1 Aug 2024 17:00 UTC

95 points

76 comments 2 min read LW link 1 review

(www.jefftk.com)

Morristown ACX Meetup

Matt Brooks 1 Aug 2024 16:29 UTC

2 points

1 comment 1 min read LW link

Some comments on intelligence

Viliam 1 Aug 2024 15:17 UTC

30 points

5 comments 3 min read LW link

[Question] [Thought Experiment] Given a button to terminate all humanity, would you press it?

lorepieri 1 Aug 2024 15:10 UTC

−2 points

9 comments 1 min read LW link

Are unpaid UN internships a good idea?

Cipolla 1 Aug 2024 15:06 UTC

1 point

7 comments 4 min read LW link

AI #75: Math is Easier

Zvi 1 Aug 2024 13:40 UTC

46 points

25 comments 72 min read LW link

(thezvi.wordpress.com)

Temporary Cognitive Hyperparameter Alteration

Jonathan Moregård 1 Aug 2024 10:27 UTC

10 points

0 comments 3 min read LW link

(honestliving.substack.com)

Technology and Progress

Zero Contradictions 1 Aug 2024 4:49 UTC

1 point

0 comments 1 min read LW link

(thewaywardaxolotl.blogspot.com)