1. A Sense of Fairness: Deconfusing Ethics

RogerDearnaley

17 Nov 2023 20:55 UTC

19 points

10 comments 15 min read LW link

Sam Altman fired from OpenAI

LawrenceC

17 Nov 2023 20:42 UTC

192 points

75 comments 1 min read LW link

(openai.com)

On the lethality of biased human reward ratings

Eli Tyre and johnswentworth

17 Nov 2023 18:59 UTC

48 points

10 comments 37 min read LW link

Coup probes: Catching catastrophes with probes trained off-policy

Fabien Roger

17 Nov 2023 17:58 UTC

95 points

9 comments 11 min read LW link 1 review

On Lies and Liars

Gabriel Alfour

17 Nov 2023 17:13 UTC

31 points

4 comments 14 min read LW link

(cognition.cafe)

Classifying representations of sparse autoencoders (SAEs)

Annah

17 Nov 2023 13:54 UTC

15 points

6 comments 2 min read LW link

R&D is a Huge Externality, So Why Do Markets Do So Much of it?

Maxwell Tabarrok

17 Nov 2023 13:14 UTC

15 points

14 comments 3 min read LW link

(maximumprogress.substack.com)

On excluding dangerous information from training

ShayBenMoshe

17 Nov 2023 11:14 UTC

23 points

5 comments 3 min read LW link

The dangers of reproducing while old

garymm

17 Nov 2023 5:55 UTC

23 points

6 comments 1 min read LW link

(www.garymm.org)

I put odds on ends with Nathan Young

KatjaGrace

17 Nov 2023 5:40 UTC

8 points

0 comments 1 min read LW link

(worldspiritsockpuppet.com)

Debate helps supervise human experts [Paper]

habryka

17 Nov 2023 5:25 UTC

29 points

6 comments 1 min read LW link

(github.com)

A to Z of things

KatjaGrace

17 Nov 2023 5:20 UTC

71 points

8 comments 1 min read LW link 1 review

(worldspiritsockpuppet.com)

On Tapping Out

Screwtape

17 Nov 2023 3:23 UTC

52 points

14 comments 8 min read LW link 1 review

Eliciting Latent Knowledge in Comprehensive AI Services Models

acabodi

17 Nov 2023 2:36 UTC

6 points

0 comments 5 min read LW link

Some Rules for an Algebra of Bayes Nets

johnswentworth and David Lorell

16 Nov 2023 23:53 UTC

101 points

48 comments 14 min read LW link 1 review

How much to update on recent AI governance moves?

habryka and So8res

16 Nov 2023 23:46 UTC

112 points

5 comments 29 min read LW link

New LessWrong feature: Dialogue Matching

Bird Concept

16 Nov 2023 21:27 UTC

107 points

22 comments 3 min read LW link

Towards Evaluating AI Systems for Moral Status Using Self-Reports

Ethan Perez and Robbo

16 Nov 2023 20:18 UTC

45 points

3 comments 1 min read LW link

(arxiv.org)

Social Dark Matter

Duncan Sabien (Inactive)

16 Nov 2023 20:00 UTC

388 points

131 comments 34 min read LW link 2 reviews

AI #38: Let’s Make a Deal

Zvi

16 Nov 2023 19:50 UTC

44 points

2 comments 55 min read LW link

(thezvi.wordpress.com)

Forecasting AI (Overview)

jsteinhardt

16 Nov 2023 19:00 UTC

35 points

0 comments 2 min read LW link

(bounded-regret.ghost.io)

We Should Talk About This More. Epistemic World Collapse as Imminent Safety Risk of Generative AI.

Joerg Weiss

16 Nov 2023 18:46 UTC

11 points

2 comments 29 min read LW link

Intelligence in systems (human, AI) can be conceptualized as the resolution and throughput at which a system can process and affect Shannon information.

AiresJL

16 Nov 2023 17:46 UTC

0 points

0 comments 2 min read LW link

Life on the Grid (Part 2)

rogersbacon

16 Nov 2023 17:22 UTC

7 points

0 comments 15 min read LW link

(www.secretorum.life)

The impossibility of rationally analyzing partisan news

RationalDino

16 Nov 2023 16:19 UTC

4 points

4 comments 1 min read LW link

We are Peacecraft.ai!

MadHatter

16 Nov 2023 14:15 UTC

15 points

20 comments 2 min read LW link

A dialectical view of the history of AI, Part 1: We’re only in the antithesis phase. [A synthesis is in the future.]

Bill Benzon

16 Nov 2023 12:34 UTC

6 points

0 comments 12 min read LW link

[Question] How much fraud is there in academia?

ChristianKl

16 Nov 2023 11:50 UTC

23 points

10 comments 1 min read LW link

Learning coefficient estimation: the details

Zach Furman

16 Nov 2023 3:19 UTC

37 points

0 comments 2 min read LW link

(colab.research.google.com)

[Question] AI Safety orgs- what’s your biggest bottleneck right now?

Kabir Kumar

16 Nov 2023 2:02 UTC

1 point

0 comments 1 min read LW link

My critique of Eliezer’s deeply irrational beliefs

Jorterder

16 Nov 2023 0:34 UTC

−35 points

1 comment 9 min read LW link

(docs.google.com)

Extrapolating from Five Words

Gordon Seidoh Worley

15 Nov 2023 23:21 UTC

40 points

11 comments 2 min read LW link

In Defense of Parselmouths

Screwtape

15 Nov 2023 23:02 UTC

56 points

12 comments 10 min read LW link 1 review

Life on the Grid (Part 1)

rogersbacon

15 Nov 2023 22:37 UTC

12 points

4 comments 9 min read LW link

(www.secretorum.life)

Glomarization FAQ

Zane

15 Nov 2023 20:20 UTC

39 points

5 comments 5 min read LW link

Testbed evals: evaluating AI safety even when it can’t be directly measured

joshc

15 Nov 2023 19:00 UTC

72 points

2 comments 4 min read LW link

EA/ACX/LW November Santa Cruz Meetup

madmail

15 Nov 2023 18:39 UTC

1 point

0 comments 1 min read LW link

New report: “Scheming AIs: Will AIs fake alignment during training in order to get power?”

Joe Carlsmith

15 Nov 2023 17:16 UTC

83 points

28 comments 30 min read LW link 1 review

Large Language Models can Strategically Deceive their Users when Put Under Pressure.

ReaderM

15 Nov 2023 16:36 UTC

90 points

9 comments 2 min read LW link 1 review

(arxiv.org)

AISN #26: National Institutions for AI Safety, Results From the UK Summit, and New Releases From OpenAI and xAI

Corin Katzke, allison huang and Dan H

15 Nov 2023 16:07 UTC

13 points

0 comments 6 min read LW link

(newsletter.safe.ai)

‘Theories of Values’ and ‘Theories of Agents’: confusions, musings and desiderata

Mateusz Bagiński and Nora_Ammann

15 Nov 2023 16:00 UTC

35 points

8 comments 24 min read LW link

Experiences and learnings from both sides of the AI safety job market

Marius Hobbhahn

15 Nov 2023 15:40 UTC

111 points

4 comments 18 min read LW link

A conceptual precursor to today’s language machines [Shannon]

Bill Benzon

15 Nov 2023 13:50 UTC

24 points

6 comments 2 min read LW link

[Question] Should Advanced Placement High School classes discuss Israel-Palestine? If so, how? If not, why? Who should make this decision?

Gesild Muka

15 Nov 2023 4:50 UTC

−1 points

5 comments 1 min read LW link

Reinforcement Via Giving People Cookies

Screwtape

15 Nov 2023 4:34 UTC

70 points

9 comments 6 min read LW link

Incidental polysemanticity

Victor Lecomte, Kushal Thaman, tmychow and Rylan Schaeffer

15 Nov 2023 4:00 UTC

43 points

7 comments 11 min read LW link

LLMs May Find It Hard to FOOM

RogerDearnaley

15 Nov 2023 2:52 UTC

13 points

30 comments 12 min read LW link

Linearity Fallacies

hippo

15 Nov 2023 2:23 UTC

15 points

0 comments 5 min read LW link

SIA Is Just Being a Bayesian About the Fact That One Exists

Bentham's Bulldog

14 Nov 2023 22:55 UTC

3 points

5 comments 4 min read LW link

AI Alignment [progress] this Week (11/12/2023)

Logan Zoellner

14 Nov 2023 22:21 UTC

6 points

0 comments 2 min read LW link

(midwitalignment.substack.com)