17 Nov 2024 23:51 UTC

6 points

1 comment 7 min read LW link

Which AI Safety Benchmark Do We Need Most in 2025?

Loïc Cabannes and William Ludington

17 Nov 2024 23:50 UTC

2 points

2 comments 8 min read LW link

“The Solomonoff Prior is Malign” is a special case of a simpler argument

David Matolcsi

17 Nov 2024 21:32 UTC

135 points

46 comments 12 min read LW link

Chess As The Model Game

criticalpoints

17 Nov 2024 19:45 UTC

19 points

0 comments 8 min read LW link

(eregis.github.io)

The grass is always greener in the environment that shaped your values

Karl Faulks

17 Nov 2024 18:00 UTC

8 points

0 comments 3 min read LW link

Announcing turntrout.com, my new digital home

TurnTrout

17 Nov 2024 17:42 UTC

108 points

33 comments 1 min read LW link

(turntrout.com)

Secular Solstice Songbook Update

jefftk

17 Nov 2024 17:30 UTC

14 points

2 comments 1 min read LW link

(www.jefftk.com)

Germany-wide ACX Meetup

Fernand0

17 Nov 2024 10:08 UTC

4 points

0 comments 1 min read LW link

Project Adequate: Seeking Cofounders/Funders

Lorec

17 Nov 2024 3:12 UTC

5 points

7 comments 8 min read LW link

Trying Bluesky

jefftk

17 Nov 2024 2:50 UTC

26 points

16 comments 1 min read LW link

(www.jefftk.com)

AXRP Episode 38.1 - Alan Chan on Agent Infrastructure

DanielFilan

16 Nov 2024 23:30 UTC

12 points

0 comments 14 min read LW link

Cross-context abduction: LLMs make inferences about procedural training data leveraging declarative facts in earlier training data

Sohaib Imran

16 Nov 2024 23:22 UTC

36 points

11 comments 14 min read LW link

Why We Wouldn’t Build Aligned AI Even If We Could

Snowyiu

16 Nov 2024 20:19 UTC

10 points

7 comments 10 min read LW link

Which evals resources would be good?

Marius Hobbhahn

16 Nov 2024 14:24 UTC

51 points

4 comments 5 min read LW link

Private Capabilities, Public Alignment: De-escalating Without Disadvantage

wassname

16 Nov 2024 7:26 UTC

6 points

0 comments 5 min read LW link

OpenAI Email Archives (from Musk v. Altman and OpenAI blog)

habryka

16 Nov 2024 6:38 UTC

548 points

82 comments 51 min read LW link

Using Dangerous AI, But Safely?

habryka

16 Nov 2024 4:29 UTC

17 points

2 comments 43 min read LW link

Ayn Rand’s model of “living money”; and an upside of burnout

AnnaSalamon

16 Nov 2024 2:59 UTC

246 points

64 comments 5 min read LW link 2 reviews

Fundamental Uncertainty: Epilogue

Gordon Seidoh Worley

16 Nov 2024 0:57 UTC

10 points

0 comments 1 min read LW link

Making a conservative case for alignment

Cameron Berg, Kvee, phgubbins and Trent Hodgeson

15 Nov 2024 18:55 UTC

208 points

67 comments 7 min read LW link

The Case For Giving To The Shrimp Welfare Project

Bentham's Bulldog

15 Nov 2024 16:03 UTC

3 points

14 comments 7 min read LW link

Win/continue/lose scenarios and execute/replace/audit protocols

Buck

15 Nov 2024 15:47 UTC

64 points

3 comments 7 min read LW link 1 review

Antonym Heads Predict Semantic Opposites in Language Models

Jake Ward

15 Nov 2024 15:32 UTC

3 points

0 comments 5 min read LW link

Proposing the Conditional AI Safety Treaty (linkpost TIME)

otto.barten

15 Nov 2024 13:59 UTC

11 points

9 comments 3 min read LW link

(time.com)

A Theory of Equilibrium in the Offense-Defense Balance

Maxwell Tabarrok

15 Nov 2024 13:51 UTC

25 points

6 comments 2 min read LW link

(www.maximum-progress.com)

Boston Secular Solstice 2024: Call for Singers and Musicans

jefftk

15 Nov 2024 13:50 UTC

22 points

0 comments 1 min read LW link

(www.jefftk.com)

An Uncanny Moat

Adam Newgas

15 Nov 2024 11:39 UTC

14 points

0 comments 4 min read LW link

(www.boristhebrave.com)

If I care about measure, choices have additional burden (+AI generated LW-comments)

avturchin

15 Nov 2024 10:27 UTC

5 points

11 comments 2 min read LW link

What are Emotions?

Myles H

15 Nov 2024 4:20 UTC

5 points

13 comments 8 min read LW link

The Third Fundamental Question

Screwtape

15 Nov 2024 4:01 UTC

88 points

17 comments 6 min read LW link 1 review

Dance Differentiation

jefftk

15 Nov 2024 2:30 UTC

14 points

0 comments 1 min read LW link

(www.jefftk.com)

Breaking beliefs about saving the world

Oxidize

15 Nov 2024 0:46 UTC

−1 points

3 comments 9 min read LW link

College technical AI safety hackathon retrospective—Georgia Tech

yix

15 Nov 2024 0:22 UTC

44 points

2 comments 5 min read LW link

(open.substack.com)

Gwern Branwen interview on Dwarkesh Patel’s podcast: “How an Anonymous Researcher Predicted AI’s Trajectory”

Said Achmiz

14 Nov 2024 23:53 UTC

91 points

0 comments 1 min read LW link

(www.dwarkeshpatel.com)

Internal music player: phenomenology of earworms

dkl9

14 Nov 2024 23:29 UTC

6 points

4 comments 2 min read LW link

(dkl9.net)

The Foraging (Ex-)Bandit [Ruleset & Reflections]

abstractapplic

14 Nov 2024 20:16 UTC

27 points

3 comments 2 min read LW link

Seven lessons I didn’t learn from election day

Eric Neyman

14 Nov 2024 18:39 UTC

99 points

33 comments 13 min read LW link

(ericneyman.wordpress.com)

Effects of Non-Uniform Sparsity on Superposition in Toy Models

Shreyans Jain

14 Nov 2024 16:59 UTC

4 points

3 comments 6 min read LW link

AI #90: The Wall

Zvi

14 Nov 2024 14:10 UTC

32 points

8 comments 42 min read LW link

(thezvi.wordpress.com)

Evolutionary prompt optimization for SAE feature visualization

neverix, Daniel Tan, Dmitrii Kharlapenko, Neel Nanda and Arthur Conmy

14 Nov 2024 13:06 UTC

28 points

0 comments 9 min read LW link

AXRP Episode 38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems

DanielFilan

14 Nov 2024 7:00 UTC

14 points

0 comments 12 min read LW link

FrontierMath: A Benchmark for Evaluating Advanced Mathematical Reasoning in AI

Tamay

14 Nov 2024 6:13 UTC

33 points

0 comments 3 min read LW link

(epoch.ai)

Concrete Methods for Heuristic Estimation on Neural Networks

Oliver Daniels

14 Nov 2024 5:07 UTC

35 points

0 comments 27 min read LW link

Heresies in the Shadow of the Sequences

Cole Wyeth

14 Nov 2024 5:01 UTC

19 points

12 comments 2 min read LW link

Thoughts after the Wolfram and Yudkowsky discussion

Tahp

14 Nov 2024 1:43 UTC

25 points

13 comments 6 min read LW link

Neutrality

sarahconstantin

13 Nov 2024 23:10 UTC

162 points

29 comments 11 min read LW link 2 reviews

(sarahconstantin.substack.com)

Anvil Shortage

Screwtape

13 Nov 2024 22:57 UTC

133 points

19 comments 4 min read LW link 3 reviews

[Question] Using hex to get murder advice from GPT-4o

Laurence Freeman

13 Nov 2024 18:30 UTC

10 points

5 comments 6 min read LW link

Confronting the legion of doom.

Spiritus Dei

13 Nov 2024 17:03 UTC

−20 points

3 comments 5 min read LW link

Is Deep Learning Actually Hitting a Wall? Evaluating Ilya Sutskever’s Recent Claims

garrison

13 Nov 2024 17:00 UTC

84 points

14 comments 8 min read LW link

(garrisonlovely.substack.com)