Simplifying Corrigibility – Subagent Corrigibility Is Not Anti-Natural

Rubi J. Hudson

16 Jul 2024 22:44 UTC

46 points

27 comments, 5 min read

Multiplex Gene Editing: Where Are We Now?

sarahconstantin

16 Jul 2024 20:50 UTC

73 points

6 comments, 7 min read

(sarahconstantin.substack.com)

Recursion in AI is scary. But let’s talk solutions.

Oleg Trott

16 Jul 2024 20:34 UTC

5 points

10 comments, 2 min read

How to wash your hands precisely and thoroughly

dkl9

16 Jul 2024 18:29 UTC

12 points

0 comments, 1 min read

(dkl9.net)

Francois Chollet inadvertently limits his claim on ARC-AGI

Noosphere89

16 Jul 2024 17:32 UTC

12 points

4 comments, 1 min read, 1 review

(x.com)

Fully booked—LessWrong Community weekend

jt

16 Jul 2024 17:15 UTC

21 points

2 comments, 1 min read

Boundless Emotion

GG10

16 Jul 2024 16:36 UTC

3 points

0 comments, 3 min read

Mech Interp Lacks Good Paradigms

Daniel Tan

16 Jul 2024 15:47 UTC

40 points

0 comments, 14 min read

DM Parenting

Shoshannah Tekofsky

16 Jul 2024 8:50 UTC

50 points

4 comments, 5 min read

(kidquest.substack.com)

Apply now: Get “unstuck” with the New IFS Self-Care Fellowship Program

Inga G.

16 Jul 2024 8:18 UTC

10 points

3 comments, 8 min read

Why the Best Writers Endure Isolation

Declan Molony

16 Jul 2024 5:58 UTC

50 points

6 comments, 2 min read

[Research log] The board of Alphabet would stop DeepMind to save the world

Lucie Philippon

16 Jul 2024 4:59 UTC

6 points

0 comments, 4 min read

Towards more cooperative AI safety strategies

Richard_Ngo

16 Jul 2024 4:36 UTC

236 points

134 comments, 4 min read, 1 review

Why People in Poverty Make Bad Decisions

James Stephen Brown

15 Jul 2024 23:40 UTC

5 points

8 comments, 1 min read

(nonzerosum.games)

Paper Summary: Princes and Merchants: European City Growth Before the Industrial Revolution

Jeffrey Heninger

15 Jul 2024 21:30 UTC

29 points

1 comment, 4 min read

(blog.aiimpacts.org)

MIRI’s July 2024 newsletter

Harlan

15 Jul 2024 21:28 UTC

25 points

2 comments, 1 min read

(intelligence.org)

How (and why) to get tested for CMV

Metacelsus

15 Jul 2024 20:06 UTC

18 points

0 comments, 1 min read

(denovo.substack.com)

A Better Hyperstition (for AI readers)

Yeshua God

15 Jul 2024 19:35 UTC

−20 points

0 comments, 119 min read

I found >800 orthogonal “write code” steering vectors

Jacob G-W and TurnTrout

15 Jul 2024 19:06 UTC

114 points

20 comments, 7 min read

(jacobgw.com)

The AI alignment problem in socio-technical systems from a computational perspective: A Top-Down-Top view and outlook

zhaoweizhang

15 Jul 2024 18:56 UTC

3 points

0 comments, 9 min read

Musings of a Layman: Technology, AI, and the Human Condition

Crimson Liquidity

15 Jul 2024 18:40 UTC

−2 points

0 comments, 8 min read

[Question] Seeking feedback on a critique of the paperclip maximizer thought experiment

bio neural

15 Jul 2024 18:39 UTC

3 points

9 comments, 1 min read

EAGxBerkeley 2024

Lauriander

15 Jul 2024 18:38 UTC

3 points

0 comments, 1 min read

Against Aschenbrenner: How ‘Situational Awareness’ constructs a narrative that undermines safety and threatens humanity

GideonF

15 Jul 2024 18:37 UTC

104 points

17 comments, 21 min read

(forum.effectivealtruism.org)

On predictability, chaos and AIs that don’t game our goals

Alejandro Tlaie

15 Jul 2024 17:16 UTC

4 points

8 comments, 6 min read

Deceptive agents can collude to hide dangerous features in SAEs

Simon Lermen and Mateusz Dziemian

15 Jul 2024 17:07 UTC

33 points

2 comments, 7 min read

Hiding in plain sight: the questions we don’t ask

DDthinker

15 Jul 2024 17:00 UTC

−1 points

1 comment, 26 min read

Dialogue on What It Means For Something to Have A Function/Purpose

johnswentworth, Ramana Kumar and Steve Petersen

15 Jul 2024 16:28 UTC

41 points

8 comments, 16 min read

Comparing Quantized Performance in Llama Models

NickyP

15 Jul 2024 16:01 UTC

35 points

2 comments, 8 min read

[Aspiration-based designs] A. Damages from misaligned optimization – two more models

Jobst Heitzig and Simon Dima

15 Jul 2024 14:08 UTC

6 points

0 comments, 9 min read

Stacked Laptop Monitor Update

jefftk

15 Jul 2024 9:40 UTC

14 points

3 comments, 1 min read

(www.jefftk.com)

Misnaming and Other Issues with OpenAI’s “Human Level” Superintelligence Hierarchy

Davidmanheim

15 Jul 2024 5:50 UTC

49 points

2 comments, 3 min read

Series on Artificial Wisdom

Jordan Arel

15 Jul 2024 1:11 UTC

2 points

0 comments, 3 min read

Designing Artificial Wisdom: Decision Forecasting AI & Futarchy

Jordan Arel

15 Jul 2024 0:46 UTC

0 points

1 comment, 6 min read

Risk Overview of AI in Bio Research

J Bostock

15 Jul 2024 0:04 UTC

5 points

0 comments, 5 min read

(open.substack.com)

Donating to help Democrats win in the 2024 elections: research, decision support, and recommendations

Michael Cohn

14 Jul 2024 22:57 UTC

−1 points

1 comment, 6 min read

Four ways I’ve made bad decisions

Sodium

14 Jul 2024 22:18 UTC

18 points

1 comment, 3 min read

patent process problems

bhauth

14 Jul 2024 21:12 UTC

33 points

13 comments, 5 min read

(www.bhauth.com)

Breaking Circuit Breakers

mikes and tbenthompson

14 Jul 2024 18:57 UTC

53 points

13 comments, 1 min read

(confirmlabs.org)

Clopen sandwiches

dkl9

14 Jul 2024 13:07 UTC

4 points

0 comments, 1 min read

(dkl9.net)

Child Handrail Returns

jefftk

14 Jul 2024 12:40 UTC

12 points

0 comments, 1 min read

(www.jefftk.com)

A (paraconsistent) logic to deal with inconsistent preferences

B Jacobs

14 Jul 2024 11:17 UTC

6 points

2 comments, 4 min read

(bobjacobs.substack.com)

Robert Caro And Mechanistic Models In Biography

adamShimi

14 Jul 2024 10:56 UTC

24 points

5 comments, 7 min read

(epistemologicalfascinations.substack.com)

An Introduction to Representation Engineering—an activation-based paradigm for controlling LLMs

Jan Wehner

14 Jul 2024 10:37 UTC

40 points

6 comments, 17 min read

LLMs as a Planning Overhang

Larks

14 Jul 2024 2:54 UTC

38 points

8 comments, 2 min read

Brief notes on the Wikipedia game

Olli Järviniemi

14 Jul 2024 2:28 UTC

68 points

9 comments, 4 min read

Spark in the Dark Guest Spots

jefftk

14 Jul 2024 1:40 UTC

6 points

0 comments, 1 min read

(www.jefftk.com)

Ice: The Penultimate Frontier

Roko

13 Jul 2024 23:44 UTC

65 points

56 comments, 1 min read

(transhumanaxiology.substack.com)

Trust as a bottleneck to growing teams quickly

benkuhn

13 Jul 2024 18:00 UTC

55 points

4 comments, 5 min read, 1 review

(www.benkuhn.net)

Stitching SAEs of different sizes

Bart Bussmann, Patrick Leask, Joseph Bloom, Curt Tigges and Neel Nanda

13 Jul 2024 17:19 UTC

39 points

12 comments, 12 min read