All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 202120222023 2024 2025

All Jan Feb Mar Apr May Jun Jul Aug Sep OctNovDec

All 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 181920 21 22 23 24 25 26 27 28 29 30

Reflective Consequentialism

Adam Zerner18 Nov 2022 23:56 UTC

21 points

14 comments4 min readLW link

Value Created vs. Value Extracted

Sable18 Nov 2022 21:34 UTC

8 points

6 comments6 min readLW link

(affablyevil.substack.com)

The Disastrously Confident And Inaccurate AI

Sharat Jacob Jacob18 Nov 2022 19:06 UTC

13 points

0 comments13 min readLW link

How AI Fails Us: A non-technical view of the Alignment Problem

testingthewaters18 Nov 2022 19:02 UTC

7 points

1 comment2 min readLW link

(ethics.harvard.edu)

[Question] Is there any policy for a fair treatment of AIs whose friendliness is in doubt?

nahoj18 Nov 2022 19:01 UTC

16 points

10 comments1 min readLW link

Distillation of “How Likely Is Deceptive Alignment?”

NickGabs18 Nov 2022 16:31 UTC

24 points

4 comments10 min readLW link

Contra Chords

jefftk18 Nov 2022 16:20 UTC

12 points

1 comment7 min readLW link

(www.jefftk.com)

[Question] Updates on scaling laws for foundation models from ′ Transcending Scaling Laws with 0.1% Extra Compute’

Nick_Greig18 Nov 2022 12:46 UTC

15 points

2 comments1 min readLW link

Halifax, NS – Monthly Rationalist, EA, and ACX Meetup

Ideopunk18 Nov 2022 11:45 UTC

10 points

0 comments1 min readLW link

Introducing The Logical Foundation, an EA-Aligned Nonprofit with a Plan to End Poverty With Guaranteed Income

Michael Simm18 Nov 2022 8:13 UTC

9 points

23 comments24 min readLW link

My Deontology Says Narrow-Mindedness is Always Wrong

LVSN18 Nov 2022 6:11 UTC

6 points

2 comments1 min readLW link

AI Ethics != Ai Safety

Dentin18 Nov 2022 3:02 UTC

2 points

0 comments1 min readLW link

Don’t design agents which exploit adversarial inputs

TurnTrout and Garrett Baker

18 Nov 2022 1:48 UTC

72 points

64 comments12 min readLW link

Engineering Monosemanticity in Toy Models

Adam Jermyn, evhub and Nicholas Schiefer

18 Nov 2022 1:43 UTC

75 points

7 comments3 min readLW link

(arxiv.org)

AGIs may value intrinsic rewards more than extrinsic ones

catubc17 Nov 2022 21:49 UTC

8 points

6 comments4 min readLW link

LLMs may capture key components of human agency

catubc17 Nov 2022 20:14 UTC

27 points

0 comments4 min readLW link

Mastodon Replies as Comments

jefftk17 Nov 2022 20:10 UTC

20 points

0 comments1 min readLW link

(www.jefftk.com)

Announcing the Progress Forum

jasoncrawford17 Nov 2022 19:26 UTC

83 points

9 comments1 min readLW link

[Question] What kind of bias is this?

Daniel Samuel17 Nov 2022 18:44 UTC

3 points

2 comments1 min readLW link

AI Forecasting Research Ideas

Jsevillamol17 Nov 2022 17:37 UTC

21 points

2 comments1 min readLW link

(docs.google.com)

Results from the interpretability hackathon

Esben Kran and Neel Nanda

17 Nov 2022 14:51 UTC

81 points

0 comments6 min readLW link

(alignmentjam.com)

Covid 11/17/22: Slow Recovery

Zvi17 Nov 2022 14:50 UTC

33 points

3 comments4 min readLW link

(thezvi.wordpress.com)

Sadly, FTX

Zvi17 Nov 2022 14:30 UTC

133 points

18 comments47 min readLW link

(thezvi.wordpress.com)

Deontology and virtue ethics as “effective theories” of consequentialist ethics

Jan_Kulveit17 Nov 2022 14:11 UTC

68 points

9 comments10 min readLW link 1 review

The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard)

Jessica Rumbelow17 Nov 2022 11:06 UTC

27 points

2 comments2 min readLW link

[Question] [Personal Question] Can anyone help me navigate this potentially painful interpersonal dynamic rationally?

SlainLadyMondegreen17 Nov 2022 8:53 UTC

9 points

3 comments4 min readLW link

Massive Scaling Should be Frowned Upon

harsimony17 Nov 2022 8:43 UTC

5 points

6 comments5 min readLW link

[Question] Why are profitable companies laying off staff?

Yair Halberstadt17 Nov 2022 6:19 UTC

15 points

10 comments1 min readLW link

[Question] [retracted] Discussion: Was SBF a naive utilitarian, or a sociopath?

Nicholas Kross17 Nov 2022 2:52 UTC

0 points

4 comments1 min readLW link

Kelsey Piper’s recent interview of SBF

agucova16 Nov 2022 20:30 UTC

51 points

29 comments2 min readLW link

(www.vox.com)

The Echo Principle

Jonathan Moregård16 Nov 2022 20:09 UTC

4 points

0 comments3 min readLW link

(honestliving.substack.com)

[Question] Is there some reason LLMs haven’t seen broader use?

tailcalled16 Nov 2022 20:04 UTC

25 points

27 comments1 min readLW link

When should we be surprised that an invention took “so long”?

jasoncrawford16 Nov 2022 20:04 UTC

32 points

11 comments4 min readLW link

(rootsofprogress.org)

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown16 Nov 2022 15:33 UTC

13 points

2 comments12 min readLW link

(sambrown.eu)

If Professional Investors Missed This...

jefftk16 Nov 2022 15:00 UTC

37 points

18 comments3 min readLW link

(www.jefftk.com)

Disagreement with bio anchors that lead to shorter timelines

Marius Hobbhahn16 Nov 2022 14:40 UTC

75 points

17 comments7 min readLW link 1 review

Current themes in mechanistic interpretability research

Lee Sharkey, Sid Black and beren

16 Nov 2022 14:14 UTC

89 points

2 comments12 min readLW link

Unpacking “Shard Theory” as Hunch, Question, Theory, and Insight

Jacy Reese Anthis16 Nov 2022 13:54 UTC

31 points

9 comments2 min readLW link

Miracles and why not to believe them

mruwnik16 Nov 2022 12:07 UTC

4 points

0 comments2 min readLW link

[Question] How do people do remote research collaborations effectively?

Krieger16 Nov 2022 11:51 UTC

8 points

0 comments1 min readLW link

Method of statements: an alternative to taboo

Q Home16 Nov 2022 10:57 UTC

7 points

0 comments41 min readLW link

The two conceptions of Active Inference: an intelligence architecture and a theory of agency

Roman Leventov16 Nov 2022 9:30 UTC

18 points

0 comments4 min readLW link

Developer experience for the motivation

Adam Zerner16 Nov 2022 7:12 UTC

49 points

7 comments4 min readLW link

Progress links and tweets, 2022-11-15

jasoncrawford16 Nov 2022 3:21 UTC

9 points

0 comments2 min readLW link

(rootsofprogress.org)

EA & LW Forums Weekly Summary (7th Nov − 13th Nov 22′)

Zoe Williams16 Nov 2022 3:04 UTC

19 points

0 comments14 min readLW link

The FTX Saga—Simplified

Annapurna16 Nov 2022 2:42 UTC

44 points

10 comments7 min readLW link

(jorgevelez.substack.com)

Utilitarianism and the idea of a “rational agent” are fundamentally inconsistent with reality

banev16 Nov 2022 0:19 UTC

−4 points

1 comment1 min readLW link

[Question] Is the speed of training large models going to increase significantly in the near future due to Cerebras Andromeda?

Amal 15 Nov 2022 22:50 UTC

13 points

11 comments1 min readLW link

[Question] What is our current best infohazard policy for AGI (safety) research?

Roman Leventov15 Nov 2022 22:33 UTC

12 points

2 comments1 min readLW link

ACX/SSC Meetup 1 pm Sunday Nov 20

svfritz15 Nov 2022 20:39 UTC

2 points

0 comments1 min readLW link