12 Sep 2024 18:10 UTC

182 points

16 comments 7 min read LW link

OpenAI o1

Zach Stein-Perlman 12 Sep 2024 17:30 UTC

146 points

41 comments 1 min read LW link

How to Give in to Threats (without incentivizing them)

Mikhail Samin 12 Sep 2024 15:55 UTC

75 points

33 comments 5 min read LW link

Open Problems in AIXI Agent Foundations

Cole Wyeth 12 Sep 2024 15:38 UTC

42 points

2 comments 10 min read LW link

On the destruction of America’s best high school

Chris_Leong 12 Sep 2024 15:30 UTC

−6 points

7 comments 1 min read LW link

(scottaaronson.blog)

Optimising under arbitrarily many constraint equations

dkl9 12 Sep 2024 14:59 UTC

6 points

0 comments 3 min read LW link

(dkl9.net)

AI #81: Alpha Proteo

Zvi 12 Sep 2024 13:00 UTC

59 points

3 comments 35 min read LW link

(thezvi.wordpress.com)

[Question] When can I be numerate?

FinalFormal2 12 Sep 2024 4:05 UTC

22 points

4 comments 1 min read LW link

A Nonconstructive Existence Proof of Aligned Superintelligence

Roko 12 Sep 2024 3:20 UTC

0 points

80 comments 1 min read LW link

(transhumanaxiology.substack.com)

Collapsing the Belief/Knowledge Distinction

Jeremias 11 Sep 2024 21:24 UTC

−7 points

8 comments 1 min read LW link

Programming Refusal with Conditional Activation Steering

Bruce W. Lee 11 Sep 2024 20:57 UTC

41 points

0 comments 11 min read LW link

(brucewlee.com)

Checking public figures on whether they “answered the question” quick analysis from Harris/Trump debate, and a proposal

david reinstein 11 Sep 2024 20:25 UTC

8 points

4 comments 1 min read LW link

(open.substack.com)

AI Safety Newsletter #41: The Next Generation of Compute Scale Plus, Ranking Models by Susceptibility to Jailbreaking, and Machine Ethics

Corin Katzke, Corin Katzke, Julius, andrewz and Dan H

11 Sep 2024 19:14 UTC

5 points

1 comment 5 min read LW link

(newsletter.safe.ai)

Refactoring cryonics as structural brain preservation

Andy_McKenzie 11 Sep 2024 18:36 UTC

108 points

14 comments 3 min read LW link

[Question] Is this a Pivotal Weak Act? Creating bacteria that decompose metal

doomyeser 11 Sep 2024 18:07 UTC

9 points

9 comments 3 min read LW link

How to discover the nature of sentience, and ethics

Gustavo Ramires 11 Sep 2024 17:22 UTC

4 points

5 comments 5 min read LW link

Seeking Mechanism Designer for Research into Internalizing Catastrophic Externalities

c.trout 11 Sep 2024 15:09 UTC

24 points

2 comments 3 min read LW link

Could Things Be Very Different?—How Historical Inertia Might Blind Us To Optimal Solutions

James Stephen Brown 11 Sep 2024 9:53 UTC

5 points

0 comments 8 min read LW link

(nonzerosum.games)

Reformative Hypocrisy, and Paying Close Enough Attention to Selectively Reward It.

Andrew_Critch 11 Sep 2024 4:41 UTC

53 points

13 comments 3 min read LW link

A necessary Membrane formalism feature

ThomasCederborg 10 Sep 2024 21:33 UTC

20 points

6 comments 11 min read LW link

Formalizing the Informal (event invite)

abramdemski 10 Sep 2024 19:22 UTC

42 points

0 comments 1 min read LW link

AI #80: Never Have I Ever

Zvi 10 Sep 2024 17:50 UTC

46 points

20 comments 39 min read LW link

(thezvi.wordpress.com)

The Best Lay Argument is not a Simple English Yud Essay

J Bostock 10 Sep 2024 17:34 UTC

275 points

19 comments 5 min read LW link 1 review

Economics Roundup #3

Zvi 10 Sep 2024 13:50 UTC

44 points

9 comments 20 min read LW link

(thezvi.wordpress.com)

Amplify is hiring! Work with us to support field-building initiatives through digital marketing

gergogaspar 10 Sep 2024 8:56 UTC

0 points

1 comment 4 min read LW link

What bootstraps intelligence?

invertedpassion 10 Sep 2024 7:11 UTC

2 points

2 comments 1 min read LW link

Physical Therapy Sucks (but have you tried hiding it in some peanut butter?)

Declan Molony 10 Sep 2024 5:54 UTC

17 points

12 comments 1 min read LW link

Simon DeDeo on Explore vs Exploit in Science

Elizabeth 10 Sep 2024 3:40 UTC

20 points

0 comments 1 min read LW link

(acesounderglass.com)

Virtue is a Vector

robotelvis 10 Sep 2024 3:02 UTC

9 points

1 comment 9 min read LW link

(messyprogress.substack.com)

MIT FutureTech are hiring for a Technical Associate role

peterslattery 9 Sep 2024 20:16 UTC

3 points

0 comments 3 min read LW link

AI forecasting bots incoming

Dan H and Mantas Mazeika

9 Sep 2024 19:14 UTC

24 points

45 comments 4 min read LW link

(www.safe.ai)

My takes on SB-1047

leogao 9 Sep 2024 18:38 UTC

152 points

9 comments 4 min read LW link

[Question] Building an Inexpensive, Aesthetic, Private Forum

Aaron Graifman 9 Sep 2024 17:10 UTC

13 points

15 comments 1 min read LW link

[Linkpost] Interpretable Analysis of Features Found in Open-source Sparse Autoencoder (partial replication)

Fernando Avalos 9 Sep 2024 3:33 UTC

6 points

1 comment 1 min read LW link

(forum.effectivealtruism.org)

[Question] Has Anyone Here Consciously Changed Their Passions?

Spade 9 Sep 2024 1:36 UTC

11 points

12 comments 1 min read LW link

Pollsters Should Publish Question Translations

jefftk 8 Sep 2024 22:10 UTC

61 points

3 comments 2 min read LW link

(www.jefftk.com)

On Fables and Nuanced Charts

Niko_McCarty 8 Sep 2024 17:09 UTC

35 points

2 comments 8 min read LW link

(www.asimov.press)

Contra Yudkowsky on 2-4-6 Game Difficulty Explanations

Josh Hickman 8 Sep 2024 16:13 UTC

6 points

2 comments 2 min read LW link

(xn--2r8hmb.ws)

Attachment THEORY AND THE EFFECTS OF SECURE ATTACHMENT ON CHILD DEVELOPMENT

Mihriban Temel 8 Sep 2024 16:09 UTC

−8 points

0 comments 9 min read LW link

Fictional parasites very different from our own

Abhishaike Mahajan 8 Sep 2024 14:59 UTC

28 points

0 comments 4 min read LW link

(www.owlposting.com)

My Number 1 Epistemology Book Recommendation: Inventing Temperature

adamShimi 8 Sep 2024 14:30 UTC

125 points

19 comments 3 min read LW link 1 review

(epistemologicalfascinations.substack.com)

[Question] I want a good multi-LLM API-powered chatbot

rotatingpaguro 8 Sep 2024 9:40 UTC

10 points

5 comments 1 min read LW link

That Alien Message—The Animation

Writer 7 Sep 2024 14:53 UTC

144 points

10 comments 8 min read LW link

(youtu.be)

Jonothan Gorard: The territory is isomorphic to an equivalence class of its maps

Daniel C 7 Sep 2024 10:04 UTC

20 points

18 comments 2 min read LW link

(x.com)

Pay Risk Evaluators in Cash, Not Equity

Adam Scholl 7 Sep 2024 2:37 UTC

226 points

20 comments 1 min read LW link 1 review

Excerpts from “A Reader’s Manifesto”

Arjun Panickssery 6 Sep 2024 22:37 UTC

72 points

1 comment 13 min read LW link

(arjunpanickssery.substack.com)

Fun With CellxGene

sarahconstantin 6 Sep 2024 22:00 UTC

30 points

2 comments 7 min read LW link

(sarahconstantin.substack.com)

[Question] Is this voting system strategy proof?

Donald Hobson 6 Sep 2024 20:44 UTC

17 points

9 comments 1 min read LW link

Adam Optimizer Causes Privileged Basis in Transformer LM Residual Stream

Diego Caples and rrenaud

6 Sep 2024 17:55 UTC

74 points

8 comments 4 min read LW link

Backdoors as an analogy for deceptive alignment

Jacob_Hilton and Mark Xu

6 Sep 2024 15:30 UTC

104 points

2 comments 8 min read LW link

(www.alignment.org)