Teleosemantics!

abramdemski 23 Feb 2023 23:26 UTC

82 points

27 comments 6 min read LW link 1 review

AI that shouldn’t work, yet kind of does

Donald Hobson 23 Feb 2023 23:18 UTC

27 points

8 comments 3 min read LW link

The AGI Optimist’s Dilemma

kaputmi 23 Feb 2023 20:20 UTC

−6 points

1 comment 1 min read LW link

Searching for a model’s concepts by their shape – a theoretical framework

Kaarel, Georgios Kaklamanos, Walter Laurito, Kay Kozaronek, AlexMennen and June Ku

23 Feb 2023 20:14 UTC

51 points

0 comments 19 min read LW link

Why I’m Skeptical of De-Extinction

Niko_McCarty 23 Feb 2023 19:42 UTC

17 points

1 comment 11 min read LW link

(cell.substack.com)

[Question] What causes randomness?

lotsofquestions 23 Feb 2023 18:50 UTC

1 point

12 comments 1 min read LW link

Somerville Roads Getting More Dangerous?

jefftk 23 Feb 2023 18:20 UTC

15 points

1 comment 1 min read LW link

(www.jefftk.com)

EIS XII: Summary

scasper 23 Feb 2023 17:45 UTC

19 points

0 comments 6 min read LW link

How to survive in an AGI cataclysm

RomanS 23 Feb 2023 14:34 UTC

−4 points

3 comments 4 min read LW link

Covid 2/23/23: Your Best Possible Situation

Zvi 23 Feb 2023 13:10 UTC

92 points

9 comments 5 min read LW link

(thezvi.wordpress.com)

Full Transcript: Eliezer Yudkowsky on the Bankless podcast

remember and Andrea_Miotti

23 Feb 2023 12:34 UTC

138 points

90 comments 75 min read LW link

Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results

Esben Kran, Fazl, Sabrina Zaki, gabrielrecc and rz2383

23 Feb 2023 10:48 UTC

8 points

0 comments 6 min read LW link

[Question] How to estimate a pre-aligned value for a common discussion ground?

EL_File4138 23 Feb 2023 10:38 UTC

−4 points

12 comments 1 min read LW link

Interpersonal alignment intuitions

TekhneMakre 23 Feb 2023 9:37 UTC

29 points

18 comments 2 min read LW link

Big Mac Subsidy?

jefftk 23 Feb 2023 4:00 UTC

160 points

25 comments 2 min read LW link

(www.jefftk.com)

[Question] What moral systems (e.g utilitarianism) are common among LessWrong users?

hollowing 23 Feb 2023 3:33 UTC

1 point

9 comments 1 min read LW link

AGI is likely to be cautious

PonPonPon 23 Feb 2023 1:16 UTC

9 points

14 comments 3 min read LW link

Short Notes on Research Process

Shoshannah Tekofsky 22 Feb 2023 23:41 UTC

21 points

0 comments 2 min read LW link

Video/animation: Neel Nanda explains what mechanistic interpretability is

DanielFilan 22 Feb 2023 22:42 UTC

24 points

7 comments 1 min read LW link

(youtu.be)

A Telepathic Exam about AI and Consequentialism

alkexr 22 Feb 2023 21:00 UTC

4 points

4 comments 4 min read LW link

[Question] Injecting noise to GPT to get multiple answers

bipolo 22 Feb 2023 20:02 UTC

1 point

1 comment 1 min read LW link

EIS XI: Moving Forward

scasper 22 Feb 2023 19:05 UTC

19 points

2 comments 9 min read LW link

Building and Entertaining Couples

Jacob Falkovich 22 Feb 2023 19:02 UTC

86 points

11 comments 4 min read LW link

Can submarines swim?

jasoncrawford 22 Feb 2023 18:48 UTC

18 points

14 comments 13 min read LW link

(rootsofprogress.org)

Is there a ML agent that abandons it’s utility function out-of-distribution without losing capabilities?

Christopher King 22 Feb 2023 16:49 UTC

1 point

7 comments 1 min read LW link

The male AI alignment solution

TekhneMakre 22 Feb 2023 16:34 UTC

−25 points

24 comments 1 min read LW link

Progress links and tweets, 2023-02-22

jasoncrawford 22 Feb 2023 16:23 UTC

13 points

0 comments 1 min read LW link

(rootsofprogress.org)

Cyborg Periods: There will be multiple AI transitions

Jan_Kulveit and rosehadshar

22 Feb 2023 16:09 UTC

114 points

9 comments 6 min read LW link

The Open Agency Model

Eric Drexler 22 Feb 2023 10:35 UTC

114 points

19 comments 4 min read LW link

Intervening in the Residual Stream

MadHatter 22 Feb 2023 6:29 UTC

30 points

1 comment 9 min read LW link

What do language models know about fictional characters?

skybrian 22 Feb 2023 5:58 UTC

6 points

0 comments 4 min read LW link

Power-Seeking = Minimising free energy

Jonas Hallgren 22 Feb 2023 4:28 UTC

23 points

10 comments 7 min read LW link

The shallow reality of ‘deep learning theory’

Jesse Hoogland 22 Feb 2023 4:16 UTC

35 points

11 comments 3 min read LW link

(www.jessehoogland.com)

Candyland is Terrible

jefftk 22 Feb 2023 1:50 UTC

16 points

2 comments 1 min read LW link

(www.jefftk.com)

A proof of inner Löb’s theorem

James Payor 21 Feb 2023 21:11 UTC

13 points

0 comments 2 min read LW link

Fighting For Our Lives—What Ordinary People Can Do

TinkerBird 21 Feb 2023 20:36 UTC

14 points

18 comments 4 min read LW link

The Emotional Type of a Decision

moridinamael 21 Feb 2023 20:35 UTC

13 points

0 comments 4 min read LW link

What is it like doing AI safety work?

KatWoods 21 Feb 2023 20:12 UTC

57 points

2 comments 10 min read LW link

Pretraining Language Models with Human Preferences

Tomek Korbak, Sam Bowman and Ethan Perez

21 Feb 2023 17:57 UTC

135 points

20 comments 11 min read LW link 2 reviews

A Stranger Priority? Topics at the Outer Reaches of Effective Altruism (my dissertation)

Joe Carlsmith 21 Feb 2023 17:26 UTC

38 points

16 comments 1 min read LW link

EIS X: Continual Learning, Modularity, Compression, and Biological Brains

scasper 21 Feb 2023 16:59 UTC

14 points

4 comments 3 min read LW link

No Room for Political Philosophy

Arturo Macias 21 Feb 2023 16:11 UTC

−1 points

7 comments 3 min read LW link

Deceptive Alignment is <1% Likely by Default

DavidW 21 Feb 2023 15:09 UTC

89 points

31 comments 14 min read LW link 1 review

AI #1: Sydney and Bing

Zvi 21 Feb 2023 14:00 UTC

171 points

45 comments 61 min read LW link 1 review

(thezvi.wordpress.com)

You’re not a simulation, ’cause you’re hallucinating

Stuart_Armstrong 21 Feb 2023 12:12 UTC

25 points

6 comments 1 min read LW link

Basic facts about language models during training

beren 21 Feb 2023 11:46 UTC

99 points

15 comments 18 min read LW link

[Preprint] Pretraining Language Models with Human Preferences

Giulio 21 Feb 2023 11:44 UTC

12 points

0 comments 1 min read LW link

(arxiv.org)

Breaking the Optimizer’s Curse, and Consequences for Existential Risks and Value Learning

Roger Dearnaley 21 Feb 2023 9:05 UTC

10 points

1 comment 23 min read LW link

Medlife Crisis: “Why Do People Keep Falling For Things That Don’t Work?”

RomanHauksson 21 Feb 2023 6:22 UTC

12 points

5 comments 1 min read LW link

(www.youtube.com)

A foundation model approach to value inference

sen 21 Feb 2023 5:09 UTC

6 points

0 comments 3 min read LW link