23 Aug 2024 22:03 UTC

17 points

0 comments 25 min read LW link

[Question] What is an appropriate sample size when surveying billions of data points?

Blake

23 Aug 2024 21:54 UTC

1 point

2 comments 1 min read LW link

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs

Kola Ayonrinde, Michael Pearce and Lee Sharkey

23 Aug 2024 18:52 UTC

43 points

8 comments 16 min read LW link

How I started believing religion might actually matter for rationality and moral philosophy

zhukeepa

23 Aug 2024 17:40 UTC

127 points

47 comments 7 min read LW link 3 reviews

[Question] What do you expect AI capabilities may look like in 2028?

nonzerosum

23 Aug 2024 16:59 UTC

9 points

5 comments 1 min read LW link

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2025)

Linda Linsefors, Remmelt Ellen and Robert Kralisch

23 Aug 2024 14:18 UTC

17 points

2 comments 4 min read LW link

If we solve alignment, do we die anyway?

Seth Herd

23 Aug 2024 13:13 UTC

81 points

130 comments 4 min read LW link

What’s going on with Per-Component Weight Updates?

4gate

22 Aug 2024 21:22 UTC

1 point

0 comments 6 min read LW link

Interoperable High Level Structures: Early Thoughts on Adjectives

johnswentworth and David Lorell

22 Aug 2024 21:12 UTC

55 points

2 comments 7 min read LW link

Interest poll: A time-waster blocker for desktop Linux programs

nahoj

22 Aug 2024 20:44 UTC

4 points

5 comments 1 min read LW link

Turning 22 in the Pre-Apocalypse

testingthewaters

22 Aug 2024 20:28 UTC

37 points

14 comments 24 min read LW link

(utilityhotbar.github.io)

A Robust Natural Latent Over A Mixed Distribution Is Natural Over The Distributions Which Were Mixed

johnswentworth and David Lorell

22 Aug 2024 19:19 UTC

42 points

4 comments 4 min read LW link

A primer on the current state of longevity research

Abhishaike Mahajan

22 Aug 2024 17:14 UTC

112 points

8 comments 14 min read LW link

(www.owlposting.com)

Some reasons to start a project to stop harmful AI

Remmelt

22 Aug 2024 16:23 UTC

5 points

0 comments 2 min read LW link

The economics of space tethers

harsimony

22 Aug 2024 16:15 UTC

69 points

22 comments 7 min read LW link

(splittinginfinity.substack.com)

Dima’s Shortform

Dmitrii Krasheninnikov

22 Aug 2024 14:49 UTC

3 points

0 comments 1 min read LW link

AI #78: Some Welcome Calm

Zvi

22 Aug 2024 14:20 UTC

61 points

15 comments 33 min read LW link

(thezvi.wordpress.com)

[Question] How do we know dreams aren’t real?

Logan Zoellner

22 Aug 2024 12:41 UTC

5 points

31 comments 1 min read LW link

Measuring Structure Development in Algorithmic Transformers

Micurie and Einar Urdshals

22 Aug 2024 8:38 UTC

56 points

4 comments 11 min read LW link

Deception and Jailbreak Sequence: 1. Iterative Refinement Stages of Deception in LLMs

Winnie Yang and Jojo Yang

22 Aug 2024 7:32 UTC

23 points

1 comment 21 min read LW link

Just because an LLM said it doesn’t mean it’s true: an illustrative example

dirk

21 Aug 2024 21:05 UTC

26 points

12 comments 3 min read LW link

[Question] How do you finish your tasks faster?

Cipolla

21 Aug 2024 20:01 UTC

4 points

2 comments 1 min read LW link

AI Safety Newsletter #40: California AI Legislation Plus, NVIDIA Delays Chip Production, and Do AI Safety Benchmarks Actually Measure Safety?

Corin Katzke, Julius, Alexa Pan and Dan H

21 Aug 2024 18:09 UTC

11 points

0 comments 6 min read LW link

(newsletter.safe.ai)

[Question] Should LW suggest standard metaprompts?

Dagon

21 Aug 2024 16:41 UTC

4 points

6 comments 1 min read LW link

Eternal Existence and Eternal Boredom: The Case for AI and Immortal Humans

Tuan Tu Nguyen

21 Aug 2024 9:58 UTC

−12 points

2 comments 5 min read LW link

Please do not use AI to write for you

Richard_Kennaway

21 Aug 2024 9:53 UTC

70 points

35 comments 4 min read LW link

Apply to Aether—Independent LLM Agent Safety Research Group

RohanS

21 Aug 2024 9:47 UTC

13 points

0 comments 7 min read LW link

(forum.effectivealtruism.org)

the Giga Press was a mistake

bhauth

21 Aug 2024 4:51 UTC

103 points

26 comments 5 min read LW link

(bhauth.com)

Exploring the Boundaries of Cognitohazards and the Nature of Reality

Victor Novikov

21 Aug 2024 3:42 UTC

−2 points

2 comments 1 min read LW link

[Question] What is the point of 2v2 debates?

Axel Ahlqvist

20 Aug 2024 21:59 UTC

2 points

1 comment 1 min read LW link

[Question] Where should I look for information on gut health?

FinalFormal2

20 Aug 2024 19:44 UTC

10 points

10 comments 1 min read LW link

Would you benefit from, or object to, a page with LW users’ reacts?

Raemon

20 Aug 2024 16:35 UTC

23 points

6 comments 1 min read LW link

Freedom of Speech

Zero Contradictions

20 Aug 2024 16:34 UTC

−13 points

2 comments 2 min read LW link

(thewaywardaxolotl.blogspot.com)

AGI Safety and Alignment at Google DeepMind: A Summary of Recent Work

Rohin Shah, Seb Farquhar and Anca Dragan

20 Aug 2024 16:22 UTC

222 points

33 comments 9 min read LW link

Trying to be rational for the wrong reasons

Viliam

20 Aug 2024 16:18 UTC

26 points

9 comments 3 min read LW link

[Question] How great is the utility of “saving” endangered languages?

SpectrumDT

20 Aug 2024 13:14 UTC

18 points

29 comments 1 min read LW link

Guide to SB 1047

Zvi

20 Aug 2024 13:10 UTC

71 points

18 comments 53 min read LW link

(thezvi.wordpress.com)

Finding Deception in Language Models

Esben Kran and Archana Vaidheeswaran

20 Aug 2024 9:42 UTC

20 points

4 comments 4 min read LW link

Next automated reasoning grand challenge: CompCert

sanxiyn

20 Aug 2024 5:27 UTC

−5 points

0 comments 1 min read LW link

Thiel on AI & Racing with China

Ben Pace

20 Aug 2024 3:19 UTC

55 points

10 comments 12 min read LW link

Reflecting on the transhumanist rebuttal to AI existential risk and critique of our debate methodologies and misuse of statistics

catgirlsruletheworld

20 Aug 2024 1:59 UTC

−5 points

0 comments 4 min read LW link

Artificial Intelligence and Eternal Torture and Suffering

Tuan Tu Nguyen

20 Aug 2024 1:53 UTC

0 points

0 comments 4 min read LW link

AI #77: A Few Upgrades

Zvi

20 Aug 2024 0:20 UTC

23 points

3 comments 52 min read LW link

(thezvi.wordpress.com)

Monthly Roundup #21: August 2024

Zvi

20 Aug 2024 0:20 UTC

22 points

6 comments 40 min read LW link

(thezvi.wordpress.com)

[Linkpost] Automated Design of Agentic Systems

Bogdan Ionut Cirstea

19 Aug 2024 23:06 UTC

8 points

1 comment 1 min read LW link

(arxiv.org)

Limitations on Formal Verification for AI Safety

Andrew Dickson

19 Aug 2024 23:03 UTC

135 points

60 comments 23 min read LW link

The Conscious River: Conscious Turing machines negate materialism

blallo

19 Aug 2024 21:54 UTC

0 points

4 comments 7 min read LW link

LLM Applications I Want To See

sarahconstantin

19 Aug 2024 21:10 UTC

102 points

6 comments 8 min read LW link

(sarahconstantin.substack.com)

Defining alignment research

Richard_Ngo

19 Aug 2024 20:42 UTC

126 points

26 comments 7 min read LW link 2 reviews

Vilnius – ACX Meetups Everywhere Fall 2024

NoUsernameSelected and Mnephisto

19 Aug 2024 17:38 UTC

3 points

1 comment 1 min read LW link