Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution

Kola Ayonrinde 30 Oct 2024 22:50 UTC

27 points

0 comments 12 min read LW link

Generic advice caveats

Saul Munn 30 Oct 2024 21:03 UTC

27 points

1 comment 3 min read LW link

(www.brasstacks.blog)

I turned decision theory problems into memes about trolleys

Tapatakt 30 Oct 2024 20:13 UTC

104 points

23 comments 1 min read LW link

AI as a powerful meme, via CGP Grey

TheManxLoiner 30 Oct 2024 18:31 UTC

49 points

8 comments 4 min read LW link

[Question] How might language influence how an AI “thinks”?

bodry 30 Oct 2024 17:41 UTC

4 points

0 comments 1 min read LW link

Motivation control

Joe Carlsmith 30 Oct 2024 17:15 UTC

45 points

9 comments 52 min read LW link

Updating the NAO Simulator

jefftk 30 Oct 2024 13:50 UTC

11 points

0 comments 2 min read LW link

(www.jefftk.com)

Occupational Licensing Roundup #1

Zvi 30 Oct 2024 11:00 UTC

66 points

11 comments 11 min read LW link

(thezvi.wordpress.com)

Three Notions of “Power”

johnswentworth 30 Oct 2024 6:10 UTC

97 points

44 comments 4 min read LW link

Introduction to Choice set Misspecification in Reward Inference

Rahul Chand 29 Oct 2024 22:57 UTC

2 points

0 comments 8 min read LW link

Gothenburg LW/ACX meetup

Stefan 29 Oct 2024 20:40 UTC

2 points

0 comments 1 min read LW link

The Alignment Trap: AI Safety as Path to Power

crispweed 29 Oct 2024 15:21 UTC

59 points

17 comments 5 min read LW link

(upcoder.com)

Housing Roundup #10

Zvi 29 Oct 2024 13:50 UTC

32 points

2 comments 32 min read LW link

(thezvi.wordpress.com)

[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations

Steven Byrnes 29 Oct 2024 13:36 UTC

61 points

6 comments 16 min read LW link

Review: “The Case Against Reality”

David Gross 29 Oct 2024 13:13 UTC

24 points

10 comments 5 min read LW link

A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More

Sharat Jacob Jacob 29 Oct 2024 12:41 UTC

12 points

0 comments 9 min read LW link 2 reviews

Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence

EuanMcLean 29 Oct 2024 12:16 UTC

47 points

9 comments 26 min read LW link

AI #87: Staying in Character

Zvi 29 Oct 2024 7:10 UTC

57 points

3 comments 33 min read LW link

(thezvi.wordpress.com)

A path to human autonomy

Nathan Helm-Burger 29 Oct 2024 3:02 UTC

53 points

16 comments 20 min read LW link

D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset

aphyer 29 Oct 2024 1:21 UTC

48 points

13 comments 6 min read LW link

Gwern: Why So Few Matt Levines?

kave 29 Oct 2024 1:07 UTC

78 points

10 comments 1 min read LW link

(gwern.net)

October 2024 Progress in Guaranteed Safe AI

Quinn 28 Oct 2024 23:34 UTC

7 points

0 comments 1 min read LW link

(gsai.substack.com)

5 homegrown EA projects, seeking small donors

Austin Chen 28 Oct 2024 23:24 UTC

85 points

4 comments 2 min read LW link

How might we solve the alignment problem? (Part 1: Intro, summary, ontology)

Joe Carlsmith 28 Oct 2024 21:57 UTC

54 points

5 comments 32 min read LW link

Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations

ozziegooen 28 Oct 2024 21:44 UTC

7 points

0 comments 15 min read LW link

AI & wisdom 3: AI effects on amortised optimisation

L Rudolf L 28 Oct 2024 21:08 UTC

18 points

0 comments 14 min read LW link

(rudolf.website)

AI & wisdom 2: growth and amortised optimisation

L Rudolf L 28 Oct 2024 21:07 UTC

18 points

0 comments 8 min read LW link

(rudolf.website)

AI & wisdom 1: wisdom, amortised optimisation, and AI

L Rudolf L 28 Oct 2024 21:02 UTC

31 points

0 comments 15 min read LW link

(rudolf.website)

Finishing The SB-1047 Documentary In 6 Weeks

Michaël Trazzi 28 Oct 2024 20:17 UTC

94 points

7 comments 4 min read LW link

(manifund.org)

Towards the Operationalization of Philosophy & Wisdom

Thane Ruthenis 28 Oct 2024 19:45 UTC

24 points

2 comments 33 min read LW link

(aiimpacts.org)

Quantitative Trading Bootcamp [Nov 6-10]

Ricki Heicklen 28 Oct 2024 18:39 UTC

8 points

0 comments 1 min read LW link

Winners of the Essay competition on the Automation of Wisdom and Philosophy

owencb and AI Impacts

28 Oct 2024 17:10 UTC

40 points

3 comments 30 min read LW link

(blog.aiimpacts.org)

Miles Brundage: Finding Ways to Credibly Signal the Benignness of AI Development and Deployment is an Urgent Priority

Zach Stein-Perlman 28 Oct 2024 17:00 UTC

22 points

4 comments 3 min read LW link

(milesbrundage.substack.com)

[Question] somebody explain the word “epistemic” to me

KvmanThinking 28 Oct 2024 16:40 UTC

7 points

8 comments 1 min read LW link

~80 Interesting Questions about Foundation Model Agent Safety

RohanS and fidgetsinner

28 Oct 2024 16:37 UTC

48 points

4 comments 15 min read LW link

AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels

Corin Katzke, Alexa Pan and Dan H

28 Oct 2024 16:03 UTC

6 points

0 comments 6 min read LW link

(newsletter.safe.ai)

Death notes − 7 thoughts on death

Nathan Young 28 Oct 2024 15:01 UTC

26 points

1 comment 5 min read LW link

(nathanpmyoung.substack.com)

SAEs you can See: Applying Sparse Autoencoders to Clustering

Robert_AIZI 28 Oct 2024 14:48 UTC

27 points

0 comments 10 min read LW link

Bridging the VLM and mech interp communities for multimodal interpretability

Sonia Joseph 28 Oct 2024 14:41 UTC

19 points

5 comments 15 min read LW link

How Likely Are Various Precursors of Existential Risk?

NunoSempere 28 Oct 2024 13:27 UTC

55 points

4 comments 15 min read LW link

(blog.sentinel-team.org)

Care Doesn’t Scale

stavros 28 Oct 2024 11:57 UTC

27 points

1 comment 1 min read LW link

(stevenscrawls.com)

Your memory eventually drives confidence in each hypothesis to 1 or 0

Crazy philosopher 28 Oct 2024 9:00 UTC

3 points

6 comments 1 min read LW link

Nerdtrition: simple diets via spreadsheet abuse

dkl9 27 Oct 2024 21:45 UTC

9 points

0 comments 3 min read LW link

(dkl9.net)

AGI Fermi Paradox

jrincayc 27 Oct 2024 20:14 UTC

−1 points

2 comments 2 min read LW link

Substituting Talkbox for Breath Controller

jefftk 27 Oct 2024 19:10 UTC

11 points

0 comments 1 min read LW link

(www.jefftk.com)

Open Source Replication of Anthropic’s Crosscoder paper for model-diffing

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

27 Oct 2024 18:46 UTC

48 points

4 comments 5 min read LW link

Hiring a writer to co-author with me (Spencer Greenberg for ClearerThinking.org)

spencerg 27 Oct 2024 17:34 UTC

16 points

0 comments 1 min read LW link

Interview with Bill O’Rourke—Russian Corruption, Putin, Applied Ethics, and More

JohnGreer 27 Oct 2024 17:11 UTC

2 points

0 comments 6 min read LW link

On Shifgrethor

JustisMills 27 Oct 2024 15:30 UTC

67 points

18 comments 2 min read LW link

(justismills.substack.com)

The hostile telepaths problem

Valentine 27 Oct 2024 15:26 UTC

436 points

107 comments 15 min read LW link 6 reviews