- Introduction to Choice set Misspecification in Reward Inference · Rahul Chand · Oct 29, 2024, 10:57 PM · 1 point · 0 comments · 8 min read · LW link
- Gothenburg LW/ACX meetup · Stefan · Oct 29, 2024, 8:40 PM · 2 points · 0 comments · 1 min read · LW link
- The Alignment Trap: AI Safety as Path to Power · crispweed · Oct 29, 2024, 3:21 PM · 57 points · 17 comments · 5 min read · LW link (upcoder.com)
- Housing Roundup #10 · Zvi · Oct 29, 2024, 1:50 PM · 32 points · 2 comments · 32 min read · LW link (thezvi.wordpress.com)
- [Intuitive self-models] 7. Hearing Voices, and Other Hallucinations · Steven Byrnes · Oct 29, 2024, 1:36 PM · 51 points · 2 comments · 16 min read · LW link
- Review: “The Case Against Reality” · David Gross · Oct 29, 2024, 1:13 PM · 20 points · 9 comments · 5 min read · LW link
- A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More · Sharat Jacob Jacob · Oct 29, 2024, 12:41 PM · 12 points · 0 comments · 9 min read · LW link
- Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence · EuanMcLean · Oct 29, 2024, 12:16 PM · 45 points · 9 comments · 26 min read · LW link
- AI #87: Staying in Character · Zvi · Oct 29, 2024, 7:10 AM · 57 points · 3 comments · 33 min read · LW link (thezvi.wordpress.com)
- A path to human autonomy · Nathan Helm-Burger · Oct 29, 2024, 3:02 AM · 53 points · 16 comments · 20 min read · LW link
- D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset · aphyer · Oct 29, 2024, 1:21 AM · 47 points · 13 comments · 6 min read · LW link
- Gwern: Why So Few Matt Levines? · kave · Oct 29, 2024, 1:07 AM · 78 points · 10 comments · 1 min read · LW link (gwern.net)
- October 2024 Progress in Guaranteed Safe AI · Quinn · Oct 28, 2024, 11:34 PM · 7 points · 0 comments · 1 min read · LW link (gsai.substack.com)
- 5 homegrown EA projects, seeking small donors · Austin Chen · Oct 28, 2024, 11:24 PM · 85 points · 4 comments · LW link
- How might we solve the alignment problem? (Part 1: Intro, summary, ontology) · Joe Carlsmith · Oct 28, 2024, 9:57 PM · 54 points · 5 comments · 32 min read · LW link
- Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations · ozziegooen · Oct 28, 2024, 9:44 PM · 7 points · 0 comments · LW link
- AI & wisdom 3: AI effects on amortised optimisation · L Rudolf L · Oct 28, 2024, 9:08 PM · 18 points · 0 comments · 14 min read · LW link (rudolf.website)
- AI & wisdom 2: growth and amortised optimisation · L Rudolf L · Oct 28, 2024, 9:07 PM · 18 points · 0 comments · 8 min read · LW link (rudolf.website)
- AI & wisdom 1: wisdom, amortised optimisation, and AI · L Rudolf L · Oct 28, 2024, 9:02 PM · 29 points · 0 comments · 15 min read · LW link (rudolf.website)
- Finishing The SB-1047 Documentary In 6 Weeks · Michaël Trazzi · Oct 28, 2024, 8:17 PM · 94 points · 7 comments · 4 min read · LW link (manifund.org)
- Towards the Operationalization of Philosophy & Wisdom · Thane Ruthenis · Oct 28, 2024, 7:45 PM · 20 points · 2 comments · 33 min read · LW link (aiimpacts.org)
- Quantitative Trading Bootcamp [Nov 6-10] · Ricki Heicklen · Oct 28, 2024, 6:39 PM · 7 points · 0 comments · 1 min read · LW link
- Winners of the Essay competition on the Automation of Wisdom and Philosophy · owencb and AI Impacts · Oct 28, 2024, 5:10 PM · 40 points · 3 comments · 30 min read · LW link (blog.aiimpacts.org)
- Miles Brundage: Finding Ways to Credibly Signal the Benignness of AI Development and Deployment is an Urgent Priority · Zach Stein-Perlman · Oct 28, 2024, 5:00 PM · 22 points · 4 comments · 3 min read · LW link (milesbrundage.substack.com)
- [Question] somebody explain the word “epistemic” to me · KvmanThinking · Oct 28, 2024, 4:40 PM · 7 points · 8 comments · 1 min read · LW link
- ~80 Interesting Questions about Foundation Model Agent Safety · RohanS and Govind Pimpale · Oct 28, 2024, 4:37 PM · 46 points · 4 comments · 15 min read · LW link
- AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels · Corin Katzke, Alexa Pan and Dan H · Oct 28, 2024, 4:03 PM · 6 points · 0 comments · 6 min read · LW link (newsletter.safe.ai)
- Death notes − 7 thoughts on death · Nathan Young · Oct 28, 2024, 3:01 PM · 26 points · 1 comment · 5 min read · LW link (nathanpmyoung.substack.com)
- SAEs you can See: Applying Sparse Autoencoders to Clustering · Robert_AIZI · Oct 28, 2024, 2:48 PM · 27 points · 0 comments · 10 min read · LW link
- Bridging the VLM and mech interp communities for multimodal interpretability · Sonia Joseph · Oct 28, 2024, 2:41 PM · 19 points · 5 comments · 15 min read · LW link
- How Likely Are Various Precursors of Existential Risk? · NunoSempere · Oct 28, 2024, 1:27 PM · 55 points · 4 comments · 15 min read · LW link (blog.sentinel-team.org)
- Care Doesn’t Scale · stavros · Oct 28, 2024, 11:57 AM · 27 points · 1 comment · 1 min read · LW link (stevenscrawls.com)
- Your memory eventually drives confidence in each hypothesis to 1 or 0 · Crazy philosopher · Oct 28, 2024, 9:00 AM · 3 points · 6 comments · 1 min read · LW link
- Nerdtrition: simple diets via spreadsheet abuse · dkl9 · Oct 27, 2024, 9:45 PM · 8 points · 0 comments · 3 min read · LW link (dkl9.net)
- AGI Fermi Paradox · jrincayc · Oct 27, 2024, 8:14 PM · 0 points · 2 comments · 2 min read · LW link
- Substituting Talkbox for Breath Controller · jefftk · Oct 27, 2024, 7:10 PM · 11 points · 0 comments · 1 min read · LW link (www.jefftk.com)
- Open Source Replication of Anthropic’s Crosscoder paper for model-diffing · Connor Kissane, robertzk, Arthur Conmy and Neel Nanda · Oct 27, 2024, 6:46 PM · 48 points · 4 comments · 5 min read · LW link
- Hiring a writer to co-author with me (Spencer Greenberg for ClearerThinking.org) · spencerg · Oct 27, 2024, 5:34 PM · 16 points · 0 comments · LW link
- Interview with Bill O’Rourke—Russian Corruption, Putin, Applied Ethics, and More · JohnGreer · Oct 27, 2024, 5:11 PM · 3 points · 0 comments · 6 min read · LW link
- On Shifgrethor · JustisMills · Oct 27, 2024, 3:30 PM · 67 points · 18 comments · 2 min read · LW link (justismills.substack.com)
- The hostile telepaths problem · Valentine · Oct 27, 2024, 3:26 PM · 383 points · 89 comments · 15 min read · LW link
- [Question] What are some good ways to form opinions on controversial subjects in the current and upcoming era? · Terence Coelho · Oct 27, 2024, 2:33 PM · 9 points · 21 comments · 1 min read · LW link
- Video lectures on the learning-theoretic agenda · Vanessa Kosoy · Oct 27, 2024, 12:01 PM · 75 points · 0 comments · 1 min read · LW link (www.youtube.com)
- Dario Amodei’s “Machines of Loving Grace” sound incredibly dangerous, for Humans · Super AGI · Oct 27, 2024, 5:05 AM · 8 points · 1 comment · 1 min read · LW link
- Electrostatic Airships? · DaemonicSigil · Oct 27, 2024, 4:32 AM · 64 points · 13 comments · 3 min read · LW link (pbement.com)
- A suite of Vision Sparse Autoencoders · Louka Ewington-Pitsos and RRGoyal · Oct 27, 2024, 4:05 AM · 25 points · 0 comments · 1 min read · LW link
- Ways to think about alignment · Abhimanyu Pallavi Sudhir · Oct 27, 2024, 1:40 AM · 6 points · 0 comments · 4 min read · LW link
- [Question] Is there a CFAR handbook audio option? · FinalFormal2 · Oct 26, 2024, 5:08 PM · 16 points · 0 comments · 1 min read · LW link
- Retrieval Augmented Genesis II — Holy Texts Semantics Analysis · João Ribeiro Medeiros · Oct 26, 2024, 5:00 PM · −1 points · 0 comments · 11 min read · LW link
- A superficially plausible promising alternate Earth without lockstep · Lorec · Oct 26, 2024, 4:04 PM · −2 points · 3 comments · 4 min read · LW link