Scott Aaronson on “Reform AI Alignment”

Shmi 20 Nov 2022 22:20 UTC

39 points

17 comments 1 min read LW link

(scottaaronson.blog)

On Morality, Ethics, and all that Jazz

Delen Heisman 20 Nov 2022 20:00 UTC

4 points

4 comments 2 min read LW link

(delen.substack.com)

Limits to the Controllability of AGI

Roman_Yampolskiy, Remmelt Ellen and Karl von Wendt

20 Nov 2022 19:18 UTC

11 points

2 comments 9 min read LW link

Career Scouting: Dentistry

koratkar 20 Nov 2022 15:55 UTC

70 points

5 comments 5 min read LW link

(careerscouting.substack.com)

Decision Theory but also Ghosts

eva_ 20 Nov 2022 13:24 UTC

26 points

26 comments 10 min read LW link

ARC paper: Formalizing the presumption of independence

Erik Jenner 20 Nov 2022 1:22 UTC

97 points

2 comments 2 min read LW link

(arxiv.org)

Update to Mysteries of mode collapse: text-davinci-002 not RLHF

janus 19 Nov 2022 23:51 UTC

71 points

8 comments 2 min read LW link

Make the Drought Evaporate!

AnthonyRepetto 19 Nov 2022 23:41 UTC

32 points

25 comments 3 min read LW link

Elastic Productivity Tools

Simon Berens 19 Nov 2022 21:59 UTC

76 points

8 comments 2 min read LW link

(simonberens.me)

A Short Dialogue on the Meaning of Reward Functions

Leon Lang, Quintin Pope and peligrietzer

19 Nov 2022 21:04 UTC

45 points

0 comments 3 min read LW link

By Default, GPTs Think In Plain Sight

Fabien Roger 19 Nov 2022 19:15 UTC

90 points

36 comments 9 min read LW link

Review: Bayesian Statistics the Fun Way by Will Kurt

matto 19 Nov 2022 18:52 UTC

4 points

2 comments 2 min read LW link

[Question] How does acausal trade work in a deterministic multiverse?

sisyphus 19 Nov 2022 1:50 UTC

2 points

13 comments 1 min read LW link

Choosing the right dish

Adam Zerner 19 Nov 2022 1:38 UTC

38 points

7 comments 8 min read LW link

Reflective Consequentialism

Adam Zerner 18 Nov 2022 23:56 UTC

21 points

14 comments 4 min read LW link

Value Created vs. Value Extracted

Sable 18 Nov 2022 21:34 UTC

8 points

6 comments 6 min read LW link

(affablyevil.substack.com)

The Disastrously Confident And Inaccurate AI

Sharat Jacob Jacob 18 Nov 2022 19:06 UTC

13 points

0 comments 13 min read LW link

How AI Fails Us: A non-technical view of the Alignment Problem

testingthewaters 18 Nov 2022 19:02 UTC

7 points

1 comment 2 min read LW link

(ethics.harvard.edu)

[Question] Is there any policy for a fair treatment of AIs whose friendliness is in doubt?

nahoj 18 Nov 2022 19:01 UTC

16 points

10 comments 1 min read LW link

Distillation of “How Likely Is Deceptive Alignment?”

NickGabs 18 Nov 2022 16:31 UTC

24 points

4 comments 10 min read LW link

Contra Chords

jefftk 18 Nov 2022 16:20 UTC

12 points

1 comment 7 min read LW link

(www.jefftk.com)

[Question] Updates on scaling laws for foundation models from ‘Transcending Scaling Laws with 0.1% Extra Compute’

Nick_Greig 18 Nov 2022 12:46 UTC

15 points

2 comments 1 min read LW link

Halifax, NS – Monthly Rationalist, EA, and ACX Meetup

Ideopunk 18 Nov 2022 11:45 UTC

10 points

0 comments 1 min read LW link

Introducing The Logical Foundation, an EA-Aligned Nonprofit with a Plan to End Poverty With Guaranteed Income

Michael Simm 18 Nov 2022 8:13 UTC

9 points

23 comments 24 min read LW link

My Deontology Says Narrow-Mindedness is Always Wrong

LVSN 18 Nov 2022 6:11 UTC

6 points

2 comments 1 min read LW link

AI Ethics != Ai Safety

Dentin 18 Nov 2022 3:02 UTC

2 points

0 comments 1 min read LW link

Don’t design agents which exploit adversarial inputs

TurnTrout and Garrett Baker

18 Nov 2022 1:48 UTC

72 points

64 comments 12 min read LW link

Engineering Monosemanticity in Toy Models

Adam Jermyn, evhub and Nicholas Schiefer

18 Nov 2022 1:43 UTC

75 points

7 comments 3 min read LW link

(arxiv.org)

AGIs may value intrinsic rewards more than extrinsic ones

catubc 17 Nov 2022 21:49 UTC

8 points

6 comments 4 min read LW link

LLMs may capture key components of human agency

catubc 17 Nov 2022 20:14 UTC

27 points

0 comments 4 min read LW link

Mastodon Replies as Comments

jefftk 17 Nov 2022 20:10 UTC

20 points

0 comments 1 min read LW link

(www.jefftk.com)

Announcing the Progress Forum

jasoncrawford 17 Nov 2022 19:26 UTC

83 points

9 comments 1 min read LW link

[Question] What kind of bias is this?

Daniel Samuel 17 Nov 2022 18:44 UTC

3 points

2 comments 1 min read LW link

AI Forecasting Research Ideas

Jsevillamol 17 Nov 2022 17:37 UTC

21 points

2 comments 1 min read LW link

(docs.google.com)

Results from the interpretability hackathon

Esben Kran and Neel Nanda

17 Nov 2022 14:51 UTC

81 points

0 comments 6 min read LW link

(alignmentjam.com)

Covid 11/17/22: Slow Recovery

Zvi 17 Nov 2022 14:50 UTC

33 points

3 comments 4 min read LW link

(thezvi.wordpress.com)

Sadly, FTX

Zvi 17 Nov 2022 14:30 UTC

133 points

18 comments 47 min read LW link

(thezvi.wordpress.com)

Deontology and virtue ethics as “effective theories” of consequentialist ethics

Jan_Kulveit 17 Nov 2022 14:11 UTC

72 points

9 comments 10 min read LW link 1 review

The Ground Truth Problem (Or, Why Evaluating Interpretability Methods Is Hard)

Jessica Rumbelow 17 Nov 2022 11:06 UTC

27 points

2 comments 2 min read LW link

[Question] [Personal Question] Can anyone help me navigate this potentially painful interpersonal dynamic rationally?

SlainLadyMondegreen 17 Nov 2022 8:53 UTC

9 points

3 comments 4 min read LW link

Massive Scaling Should be Frowned Upon

harsimony 17 Nov 2022 8:43 UTC

5 points

6 comments 5 min read LW link

[Question] Why are profitable companies laying off staff?

Yair Halberstadt 17 Nov 2022 6:19 UTC

15 points

10 comments 1 min read LW link

[Question] [retracted] Discussion: Was SBF a naive utilitarian, or a sociopath?

Nicholas Kross 17 Nov 2022 2:52 UTC

0 points

4 comments 1 min read LW link

Kelsey Piper’s recent interview of SBF

agucova 16 Nov 2022 20:30 UTC

51 points

29 comments 2 min read LW link

(www.vox.com)

The Echo Principle

Jonathan Moregård 16 Nov 2022 20:09 UTC

4 points

0 comments 3 min read LW link

(honestliving.substack.com)

[Question] Is there some reason LLMs haven’t seen broader use?

tailcalled 16 Nov 2022 20:04 UTC

25 points

27 comments 1 min read LW link

When should we be surprised that an invention took “so long”?

jasoncrawford 16 Nov 2022 20:04 UTC

32 points

11 comments 4 min read LW link

(rootsofprogress.org)

Questions about Value Lock-in, Paternalism, and Empowerment

Sam F. Brown 16 Nov 2022 15:33 UTC

13 points

2 comments 12 min read LW link

(sambrown.eu)

If Professional Investors Missed This...

jefftk 16 Nov 2022 15:00 UTC

37 points

18 comments 3 min read LW link

(www.jefftk.com)

Disagreement with bio anchors that lead to shorter timelines

Marius Hobbhahn 16 Nov 2022 14:40 UTC

75 points

17 comments 7 min read LW link 1 review