
[Question] Seriously, what goes wrong with “reward the agent when it makes you smile”?

TurnTrout

11 Aug 2022 22:22 UTC

88 points

43 comments, 2 min read

Encultured AI Pre-planning, Part 2: Providing a Service

Andrew_Critch and Nick Hay

11 Aug 2022 20:11 UTC

33 points

4 comments, 3 min read

My summary of the alignment problem

Peter Hroššo

11 Aug 2022 19:42 UTC

15 points

3 comments, 2 min read

(threadreaderapp.com)

Language models seem to be much better than humans at next-token prediction

Buck, Fabien Roger and LawrenceC

11 Aug 2022 17:45 UTC

183 points

60 comments, 13 min read, 1 review

Introducing Pastcasting: A tool for forecasting practice

Sage Future

11 Aug 2022 17:38 UTC

95 points

10 comments, 2 min read, 2 reviews

Pendulums, Policy-Level Decisionmaking, Saving State

CFAR!Duncan

11 Aug 2022 16:47 UTC

30 points

3 comments, 8 min read

Covid 8/11/22: The End Is Never The End

Zvi

11 Aug 2022 16:20 UTC

28 points

11 comments, 16 min read

(thezvi.wordpress.com)

Singapore—Small casual dinner in Chinatown #4

Joe Rocca

11 Aug 2022 12:30 UTC

3 points

3 comments, 1 min read

Thoughts on the good regulator theorem

JonasMoss

11 Aug 2022 12:08 UTC

12 points

0 comments, 4 min read

How and why to turn everything into audio

KatWoods and AmberDawn

11 Aug 2022 8:55 UTC

57 points

20 comments, 5 min read

Shard Theory: An Overview

David Udell

11 Aug 2022 5:44 UTC

168 points

34 comments, 10 min read

[Question] Do advancements in Decision Theory point towards moral absolutism?

Nathan1123

11 Aug 2022 0:59 UTC

0 points

4 comments, 4 min read

The alignment problem from a deep learning perspective

Richard_Ngo

10 Aug 2022 22:46 UTC

107 points

15 comments, 27 min read, 1 review

How much alignment data will we need in the long run?

Jacob_Hilton

10 Aug 2022 21:39 UTC

37 points

15 comments, 4 min read

On Ego, Reincarnation, Consciousness and The Universe

qmaury

10 Aug 2022 20:21 UTC

−3 points

6 comments, 5 min read

Formalizing Alignment

Marv K

10 Aug 2022 18:50 UTC

4 points

0 comments, 2 min read

How Do We Align an AGI Without Getting Socially Engineered? (Hint: Box It)

Peter S. Park, NickyP and Stephen Fowler

10 Aug 2022 18:14 UTC

28 points

30 comments, 11 min read

Emergent Abilities of Large Language Models [Linkpost]

aog

10 Aug 2022 18:02 UTC

25 points

2 comments, 1 min read

(arxiv.org)

How To Go From Interpretability To Alignment: Just Retarget The Search

johnswentworth

10 Aug 2022 16:08 UTC

214 points

34 comments, 3 min read, 1 review

Using GPT-3 to augment human intelligence

Henrik Karlsson

10 Aug 2022 15:54 UTC

52 points

8 comments, 18 min read

(escapingflatland.substack.com)

ACX meetup [August]

sallatik

10 Aug 2022 9:54 UTC

1 point

1 comment, 1 min read

Dissent Collusion

Screwtape

10 Aug 2022 2:43 UTC

32 points

8 comments, 3 min read

The Medium Is The Bandage

party girl

10 Aug 2022 1:45 UTC

11 points

0 comments, 10 min read

[Question] Why is increasing public awareness of AI safety not a priority?

FinalFormal2

10 Aug 2022 1:28 UTC

−5 points

14 comments, 1 min read

Manifold x CSPI $25k Forecasting Tournament

David Chee

9 Aug 2022 21:13 UTC

5 points

0 comments, 1 min read

(www.cspicenter.com)

Proposal: Consider not using distance-direction-dimension words in abstract discussions

moridinamael

9 Aug 2022 20:44 UTC

46 points

18 comments, 5 min read

[Question] How would two superintelligent AIs interact, if they are unaligned with each other?

Nathan1123

9 Aug 2022 18:58 UTC

4 points

6 comments, 1 min read

Disagreements about Alignment: Why, and how, we should try to solve them

ojorgensen

9 Aug 2022 18:49 UTC

11 points

2 comments, 16 min read

Progress links and tweets, 2022-08-09

jasoncrawford

9 Aug 2022 17:35 UTC

11 points

3 comments, 1 min read

(rootsofprogress.org)

[Question] Is it possible to find venture capital for AI research org with strong safety focus?

AnonResearch

9 Aug 2022 16:12 UTC

6 points

1 comment, 1 min read

[Question] Many Gods refutation and Instrumental Goals. (Proper one)

aditya malik

9 Aug 2022 11:59 UTC

0 points

15 comments, 1 min read

Content generation. Where do we draw the line?

Q Home

9 Aug 2022 10:51 UTC

6 points

7 comments, 2 min read

[Question] What are some alternatives to Shapley values which drop additivity?

eapi

9 Aug 2022 9:16 UTC

11 points

6 comments, 1 min read

(math.stackexchange.com)

Radio Bostrom: Audio narrations of papers by Nick Bostrom

peter_hartree

9 Aug 2022 8:56 UTC

12 points

0 comments, 2 min read

(forum.effectivealtruism.org)

Team Shard Status Report

David Udell

9 Aug 2022 5:33 UTC

38 points

8 comments, 3 min read

Announcing: Mechanism Design for AI Safety—Reading Group

Rubi J. Hudson

9 Aug 2022 4:21 UTC

18 points

3 comments, 4 min read

[Question] What are some Works that might be useful but are difficult, so forgotten?

TekhneMakre

9 Aug 2022 2:22 UTC

10 points

5 comments, 1 min read

Project proposal: Testing the IBP definition of agent

Jeremy Gillen, Thomas Larsen and JamesH

9 Aug 2022 1:09 UTC

21 points

4 comments, 2 min read

How (not) to choose a research project

Garrett Baker, CatGoddess and Johannes C. Mayer

9 Aug 2022 0:26 UTC

80 points

11 comments, 7 min read

[Question] Are ya winning, son?

Nathan1123

9 Aug 2022 0:06 UTC

14 points

13 comments, 2 min read

General alignment properties

TurnTrout

8 Aug 2022 23:40 UTC

51 points

2 comments, 1 min read

Experiment: Be my math tutor?

sudo

8 Aug 2022 22:50 UTC

12 points

5 comments, 1 min read

Encultured AI, Part 1 Appendix: Relevant Research Examples

Andrew_Critch and Nick Hay

8 Aug 2022 22:44 UTC

11 points

1 comment, 7 min read

Encultured AI Pre-planning, Part 1: Enabling New Benchmarks

Andrew_Critch and Nick Hay

8 Aug 2022 22:44 UTC

63 points

2 comments, 6 min read

Broad Basins and Data Compression

Jeremy Gillen, Stephen Fowler and Thomas Larsen

8 Aug 2022 20:33 UTC

33 points

6 comments, 7 min read

Interpretability/Tool-ness/Alignment/Corrigibility are not Composable

johnswentworth

8 Aug 2022 18:05 UTC

150 points

13 comments, 3 min read

LW Meetup @ DEFCON (Las Vegas) − 5-7pm Thu. Aug. 11 at Forum Food Court (Caesars)

jchan

8 Aug 2022 14:57 UTC

6 points

0 comments, 1 min read

A sufficiently paranoid paperclip maximizer

RomanS

8 Aug 2022 11:17 UTC

19 points

10 comments, 2 min read

[Question] Instrumental Goals and Many Gods Refutation

aditya malik

8 Aug 2022 10:46 UTC

−10 points

4 comments, 1 min read

Area under the curve, Eat Dirt, Broccoli Errors, Copernicus & Chaos

CFAR!Duncan

8 Aug 2022 8:17 UTC

43 points

0 comments, 7 min read