6 Jun 2024 23:54 UTC

61 points

5 comments, 7 min read

Response to Aschenbrenner’s “Situational Awareness”

Rob Bensinger, 6 Jun 2024 22:57 UTC

197 points

27 comments, 3 min read

Scaling and evaluating sparse autoencoders

leogao, 6 Jun 2024 22:50 UTC

112 points

6 comments, 1 min read

Humming is not a free $100 bill

Elizabeth, 6 Jun 2024 20:10 UTC

192 points

7 comments, 3 min read, 1 review

(acesounderglass.com)

There Are No Primordial Definitions of Man/Woman

ymeskhout, 6 Jun 2024 19:30 UTC

11 points

0 comments, 4 min read

(ymeskhout.substack.com)

Situational Awareness Summarized—Part 1

Joe Rogero, 6 Jun 2024 18:59 UTC

21 points

0 comments, 5 min read

[Link Post] “Foundational Challenges in Assuring Alignment and Safety of Large Language Models”

David Scott Krueger, 6 Jun 2024 18:55 UTC

70 points

2 comments, 6 min read

(llm-safety-challenges.github.io)

AI #67: Brief Strange Trip

Zvi, 6 Jun 2024 18:50 UTC

49 points

6 comments, 40 min read

(thezvi.wordpress.com)

The Human Biological Advantage Over AI

Wstewart, 6 Jun 2024 18:18 UTC

−13 points

2 comments, 1 min read

An evaluation of Helen Toner’s interview on the TED AI Show

peter_hartree, 6 Jun 2024 17:39 UTC

24 points

2 comments, 30 min read

The Impossibility of a Rational Intelligence Optimizer

Nicolas Villarreal, 6 Jun 2024 16:14 UTC

−9 points

5 comments, 14 min read

Immunization against harmful fine-tuning attacks

domenicrosati, Jan Wehner and David Atanasov

6 Jun 2024 15:17 UTC

4 points

0 comments, 12 min read

SB 1047 Is Weakened

Zvi, 6 Jun 2024 13:40 UTC

67 points

4 comments, 9 min read

(thezvi.wordpress.com)

Weeping Agents

Ouro, 6 Jun 2024 12:18 UTC

27 points

2 comments, 3 min read

Podcast: Center for AI Policy, on AI risk and listening to AI researchers

KatjaGrace, 6 Jun 2024 3:30 UTC

9 points

0 comments, 1 min read

(worldspiritsockpuppet.com)

Calculating Natural Latents via Resampling

johnswentworth and David Lorell

6 Jun 2024 0:37 UTC

55 points

4 comments, 10 min read

SAEs Discover Meaningful Features in the IOI Task

Alex Makelov, Georg Lange and Neel Nanda

5 Jun 2024 23:48 UTC

15 points

2 comments, 10 min read

Let’s Design A School, Part 2.4 School as Education—The Curriculum (Phase 3, Specific)

Sable, 5 Jun 2024 21:40 UTC

19 points

2 comments, 12 min read

(affablyevil.substack.com)

METR is hiring ML Research Engineers and Scientists

Xodarap, 5 Jun 2024 21:27 UTC

5 points

0 comments, 1 min read

(metr.org)

Book review: The Quincunx

cousin_it, 5 Jun 2024 21:13 UTC

52 points

12 comments, 2 min read

[Question] How should I think about my career?

Chico, 5 Jun 2024 18:11 UTC

3 points

2 comments, 1 min read

AISN #36: Voluntary Commitments are Insufficient Plus, a Senate AI Policy Roadmap, and Chapter 1: An Overview of Catastrophic Risks

Corin Katzke, Julius and Dan H

5 Jun 2024 17:45 UTC

9 points

0 comments, 5 min read

(newsletter.safe.ai)

GPT2, Five Years On

Joel Burget, 5 Jun 2024 17:44 UTC

34 points

0 comments, 3 min read

(importai.substack.com)

[Question] Who wants to be invited to the LW Metamodern dialogue?

hunterglenn, 5 Jun 2024 16:39 UTC

−3 points

1 comment, 1 min read

Nonreactivity: a simple model of meditation

cesiumquail, 5 Jun 2024 16:26 UTC

21 points

4 comments, 6 min read

graphpatch: a Python Library for Activation Patching

Evan Lloyd, 5 Jun 2024 15:08 UTC

16 points

2 comments, 1 min read

Startup Stock Options: the Shortest Complete Guide for Employees

Boris T, 5 Jun 2024 15:03 UTC

18 points

3 comments, 1 min read

(borisagain.substack.com)

Aggregative Principles of Social Justice

Cleo Nardo, 5 Jun 2024 13:44 UTC

29 points

10 comments, 37 min read

What and how much makes a difference?

Marius Adrian Nicoară, 5 Jun 2024 10:30 UTC

7 points

0 comments, 2 min read

Announcing ILIAD — Theoretical AI Alignment Conference

Nora_Ammann and Alexander Gietelink Oldenziel

5 Jun 2024 9:37 UTC

163 points

18 comments, 2 min read

Second-Order Rationality, System Rationality, and a feature suggestion for LessWrong

Mati_Roy, 5 Jun 2024 7:20 UTC

13 points

2 comments, 8 min read

Former OpenAI Superalignment Researcher: Superintelligence by 2030

Julian Bradshaw, 5 Jun 2024 3:35 UTC

70 points

30 comments, 1 min read

(situational-awareness.ai)

On “first critical tries” in AI alignment

Joe Carlsmith, 5 Jun 2024 0:19 UTC

55 points

8 comments, 14 min read

Takeoff speeds presentation at Anthropic

Tom Davidson, 4 Jun 2024 22:46 UTC

93 points

0 comments, 25 min read

A Reflection on Richard Hamming’s “You and Your Research”: Striving for Greatness

aysajan, 4 Jun 2024 20:07 UTC

9 points

5 comments, 21 min read

(www.aysajaneziz.com)

A Semiotic Critique of the Orthogonality Thesis

Nicolas Villarreal, 4 Jun 2024 18:52 UTC

3 points

10 comments, 15 min read

Here’s Why Indefinite Life Extension Will Never Work, Even Though it Does.

HomingHamster, 4 Jun 2024 18:48 UTC

−13 points

5 comments, 18 min read

Ideas for Next-Generation Writing Platforms, using LLMs

ozziegooen, 4 Jun 2024 18:40 UTC

26 points

4 comments, 2 min read

Evidence of Learned Look-Ahead in a Chess-Playing Neural Network

Erik Jenner, 4 Jun 2024 15:50 UTC

121 points

14 comments, 13 min read

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.

Josh Levy, 4 Jun 2024 15:45 UTC

43 points

0 comments, 18 min read

[Paper] Stress-testing capability elicitation with password-locked models

Fabien Roger and ryan_greenblatt

4 Jun 2024 14:52 UTC

89 points

10 comments, 12 min read

(arxiv.org)

Circuit Board Ordering

jefftk, 4 Jun 2024 14:00 UTC

10 points

0 comments, 1 min read

(www.jefftk.com)

[Question] Has anyone here written about religious fictionalism?

SpectrumDT, 4 Jun 2024 12:10 UTC

0 points

4 comments, 1 min read

Is Wittgenstein’s Language Game used when helping Ai understand language?

VisionaryHera, 4 Jun 2024 7:41 UTC

4 points

7 comments, 1 min read

Smartphone Etiquette: Suggestions for Social Interactions

Declan Molony, 4 Jun 2024 6:01 UTC

29 points

4 comments, 3 min read

Just admit that you’ve zoned out

joec, 4 Jun 2024 2:51 UTC

94 points

22 comments, 2 min read

(Not) Derailing the LessOnline Puzzle Hunt

Error, 4 Jun 2024 1:28 UTC

74 points

2 comments, 4 min read

Masculinity—A Case For Courage

James Stephen Brown, 4 Jun 2024 0:04 UTC

24 points

0 comments, 7 min read

(nonzerosum.games)

Philosophers wrestling with evil, as a social media feed

David Gross, 3 Jun 2024 22:25 UTC

73 points

3 comments, 16 min read

ACI#8: Value as a Function of Possible Worlds

Akira Pyinya, 3 Jun 2024 21:49 UTC

6 points

2 comments, 7 min read