1. The CAST Strategy

Max Harms, 7 Jun 2024 22:29 UTC

58 points

27 comments, 38 min read

0. CAST: Corrigibility as Singular Target

Max Harms, 7 Jun 2024 22:29 UTC

163 points

24 comments, 9 min read, 2 reviews

What is space? What is time?

Tahp, 7 Jun 2024 22:15 UTC

8 points

3 comments, 7 min read

[Question] Question about Lewis’ counterfactual theory of causation

jbkjr, 7 Jun 2024 20:15 UTC

12 points

7 comments, 1 min read

Relationships among words, metalingual definition, and interpretability

Bill Benzon, 7 Jun 2024 19:18 UTC

2 points

0 comments, 5 min read

Let’s Talk About Emergence

jacobhaimes, 7 Jun 2024 19:18 UTC

4 points

0 comments, 7 min read

(www.odysseaninstitute.org)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues

aphyer, 7 Jun 2024 19:02 UTC

43 points

18 comments, 3 min read, 2 reviews

Natural Latents Are Not Robust To Tiny Mixtures

johnswentworth and David Lorell

7 Jun 2024 18:53 UTC

65 points

8 comments, 5 min read

Situational Awareness Summarized—Part 2

Joe Rogero, 7 Jun 2024 17:20 UTC

12 points

2 comments, 4 min read

Frida van Lisa, a short story about adversarial AI attacks on humans

arisAlexis, 7 Jun 2024 13:22 UTC

2 points

0 comments, 18 min read

Quotes from Leopold Aschenbrenner’s Situational Awareness Paper

Zvi, 7 Jun 2024 11:40 UTC

97 points

10 comments, 37 min read

(thezvi.wordpress.com)

LessWrong/ACX meetup Transilvanya tour—Cluj Napoca

Marius Adrian Nicoară, 7 Jun 2024 5:45 UTC

1 point

1 comment, 1 min read

Is Claude a mystic?

jessicata, 7 Jun 2024 4:27 UTC

60 points

23 comments, 13 min read

(unstablerontology.substack.com)

Offering Completion

jefftk, 7 Jun 2024 1:40 UTC

32 points

6 comments, 1 min read

(www.jefftk.com)

A Case for Superhuman Governance, using AI

ozziegooen, 7 Jun 2024 0:10 UTC

31 points

0 comments, 10 min read

Memorizing weak examples can elicit strong behavior out of password-locked models

Fabien Roger and ryan_greenblatt

6 Jun 2024 23:54 UTC

61 points

5 comments, 7 min read

Response to Aschenbrenner’s “Situational Awareness”

Rob Bensinger, 6 Jun 2024 22:57 UTC

197 points

27 comments, 3 min read

Scaling and evaluating sparse autoencoders

leogao, 6 Jun 2024 22:50 UTC

112 points

6 comments, 1 min read

Humming is not a free $100 bill

Elizabeth, 6 Jun 2024 20:10 UTC

192 points

7 comments, 3 min read, 1 review

(acesounderglass.com)

There Are No Primordial Definitions of Man/Woman

ymeskhout, 6 Jun 2024 19:30 UTC

11 points

0 comments, 4 min read

(ymeskhout.substack.com)

Situational Awareness Summarized—Part 1

Joe Rogero, 6 Jun 2024 18:59 UTC

21 points

0 comments, 5 min read

[Link Post] “Foundational Challenges in Assuring Alignment and Safety of Large Language Models”

David Scott Krueger, 6 Jun 2024 18:55 UTC

70 points

2 comments, 6 min read

(llm-safety-challenges.github.io)

AI #67: Brief Strange Trip

Zvi, 6 Jun 2024 18:50 UTC

49 points

6 comments, 40 min read

(thezvi.wordpress.com)

The Human Biological Advantage Over AI

Wstewart, 6 Jun 2024 18:18 UTC

−13 points

2 comments, 1 min read

An evaluation of Helen Toner’s interview on the TED AI Show

peter_hartree, 6 Jun 2024 17:39 UTC

24 points

2 comments, 30 min read

The Impossibility of a Rational Intelligence Optimizer

Nicolas Villarreal, 6 Jun 2024 16:14 UTC

−9 points

5 comments, 14 min read

Immunization against harmful fine-tuning attacks

domenicrosati, Jan Wehner and David Atanasov

6 Jun 2024 15:17 UTC

4 points

0 comments, 12 min read

SB 1047 Is Weakened

Zvi, 6 Jun 2024 13:40 UTC

67 points

4 comments, 9 min read

(thezvi.wordpress.com)

Weeping Agents

Ouro, 6 Jun 2024 12:18 UTC

27 points

2 comments, 3 min read

Podcast: Center for AI Policy, on AI risk and listening to AI researchers

KatjaGrace, 6 Jun 2024 3:30 UTC

9 points

0 comments, 1 min read

(worldspiritsockpuppet.com)

Calculating Natural Latents via Resampling

johnswentworth and David Lorell

6 Jun 2024 0:37 UTC

55 points

4 comments, 10 min read

SAEs Discover Meaningful Features in the IOI Task

Alex Makelov, Georg Lange and Neel Nanda

5 Jun 2024 23:48 UTC

15 points

2 comments, 10 min read

Let’s Design A School, Part 2.4 School as Education—The Curriculum (Phase 3, Specific)

Sable, 5 Jun 2024 21:40 UTC

19 points

2 comments, 12 min read

(affablyevil.substack.com)

METR is hiring ML Research Engineers and Scientists

Xodarap, 5 Jun 2024 21:27 UTC

5 points

0 comments, 1 min read

(metr.org)

Book review: The Quincunx

cousin_it, 5 Jun 2024 21:13 UTC

52 points

12 comments, 2 min read

[Question] How should I think about my career?

Chico, 5 Jun 2024 18:11 UTC

3 points

2 comments, 1 min read

AISN #36: Voluntary Commitments are Insufficient Plus, a Senate AI Policy Roadmap, and Chapter 1: An Overview of Catastrophic Risks

Corin Katzke, Julius and Dan H

5 Jun 2024 17:45 UTC

9 points

0 comments, 5 min read

(newsletter.safe.ai)

GPT2, Five Years On

Joel Burget, 5 Jun 2024 17:44 UTC

34 points

0 comments, 3 min read

(importai.substack.com)

[Question] Who wants to be invited to the LW Metamodern dialogue?

hunterglenn, 5 Jun 2024 16:39 UTC

−3 points

1 comment, 1 min read

Nonreactivity: a simple model of meditation

cesiumquail, 5 Jun 2024 16:26 UTC

21 points

4 comments, 6 min read

graphpatch: a Python Library for Activation Patching

Evan Lloyd, 5 Jun 2024 15:08 UTC

16 points

2 comments, 1 min read

Startup Stock Options: the Shortest Complete Guide for Employees

Boris T, 5 Jun 2024 15:03 UTC

18 points

3 comments, 1 min read

(borisagain.substack.com)

Aggregative Principles of Social Justice

Cleo Nardo, 5 Jun 2024 13:44 UTC

29 points

10 comments, 37 min read

What and how much makes a difference?

Marius Adrian Nicoară, 5 Jun 2024 10:30 UTC

7 points

0 comments, 2 min read

Announcing ILIAD — Theoretical AI Alignment Conference

Nora_Ammann and Alexander Gietelink Oldenziel

5 Jun 2024 9:37 UTC

163 points

18 comments, 2 min read

Second-Order Rationality, System Rationality, and a feature suggestion for LessWrong

Mati_Roy, 5 Jun 2024 7:20 UTC

13 points

2 comments, 8 min read

Former OpenAI Superalignment Researcher: Superintelligence by 2030

Julian Bradshaw, 5 Jun 2024 3:35 UTC

70 points

30 comments, 1 min read

(situational-awareness.ai)

On “first critical tries” in AI alignment

Joe Carlsmith, 5 Jun 2024 0:19 UTC

55 points

8 comments, 14 min read

Takeoff speeds presentation at Anthropic

Tom Davidson, 4 Jun 2024 22:46 UTC

93 points

0 comments, 25 min read

A Reflection on Richard Hamming’s “You and Your Research”: Striving for Greatness

aysajan, 4 Jun 2024 20:07 UTC

9 points

5 comments, 21 min read

(www.aysajaneziz.com)