Sticky goals: a concrete experiment for understanding deceptive alignment

evhub 2 Sep 2022 21:57 UTC

39 points

13 comments 3 min read LW link

Agency engineering: is AI-alignment “to human intent” enough?

catubc 2 Sep 2022 18:14 UTC

9 points

10 comments 6 min read LW link

Hanover, Germany—ACX Meetups Everywhere 2022

eikowagenknecht 2 Sep 2022 17:31 UTC

2 points

0 comments 1 min read LW link

Laziness in AI

Richard Henage 2 Sep 2022 17:04 UTC

13 points

5 comments 1 min read LW link

Exporting Hangouts History

jefftk 2 Sep 2022 15:00 UTC

27 points

0 comments 2 min read LW link

(www.jefftk.com)

Simulators

janus 2 Sep 2022 12:45 UTC

713 points

170 comments 41 min read LW link 8 reviews

(generative.ink)

Levelling Up in AI Safety Research Engineering

GMM 2 Sep 2022 4:59 UTC

59 points

9 comments 15 min read LW link

Stop Discouraging Microwave Formula Preparation

jefftk 2 Sep 2022 2:10 UTC

69 points

12 comments 2 min read LW link

(www.jefftk.com)

A Richly Interactive AGI Alignment Chart

lisperati 2 Sep 2022 0:44 UTC

14 points

6 comments 1 min read LW link

Appendix: How to run a successful Hamming circle

CFAR!Duncan 2 Sep 2022 0:22 UTC

47 points

6 comments 7 min read LW link

Replacement for PONR concept

Daniel Kokotajlo 2 Sep 2022 0:09 UTC

59 points

6 comments 2 min read LW link

AI coordination needs clear wins

evhub 1 Sep 2022 23:41 UTC

148 points

16 comments 2 min read LW link 1 review

Short story speculating on possible ramifications of AI on the art world

Yitz 1 Sep 2022 21:15 UTC

30 points

8 comments 3 min read LW link

(archiveofourown.org)

Why was progress so slow in the past?

jasoncrawford 1 Sep 2022 20:26 UTC

54 points

31 comments 6 min read LW link

(rootsofprogress.org)

AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022

Sam Bowman 1 Sep 2022 19:15 UTC

76 points

2 comments 7 min read LW link

Gradient Hacker Design Principles From Biology

johnswentworth 1 Sep 2022 19:03 UTC

60 points

13 comments 3 min read LW link

Book review: Put Your Ass Where Your Heart Wants to Be

Ruhul 1 Sep 2022 18:21 UTC

1 point

2 comments 10 min read LW link

A Survey of Foundational Methods in Inverse Reinforcement Learning

adamk 1 Sep 2022 18:21 UTC

27 points

0 comments 12 min read LW link

I Tripped and Became GPT! (And How This Updated My Timelines)

Frankophone 1 Sep 2022 17:56 UTC

31 points

0 comments 4 min read LW link

[Question] Fixed point theory (locally (α,β,ψ) dominated contractive condition)

muzammil 1 Sep 2022 17:56 UTC

0 points

3 comments 1 min read LW link

Alignment is hard. Communicating that, might be harder

Eleni Angelou 1 Sep 2022 16:57 UTC

7 points

8 comments 3 min read LW link

Covid 9/1/22: Meet the New Booster

Zvi 1 Sep 2022 14:00 UTC

41 points

6 comments 14 min read LW link

(thezvi.wordpress.com)

A Starter-kit for Rationality Space

Jesse Hoogland 1 Sep 2022 13:04 UTC

43 points

0 comments 1 min read LW link

(github.com)

Pondering the paucity of volcanic profanity post Pompeii perusal

CraigMichael 1 Sep 2022 9:29 UTC

21 points

2 comments 15 min read LW link

Infra-Exercises, Part 1

Diffractor, Jack Parker and Connall Garrod

1 Sep 2022 5:06 UTC

64 points

11 comments 1 min read LW link

Strategy For Conditioning Generative Models

james.lucassen and evhub

1 Sep 2022 4:34 UTC

31 points

4 comments 18 min read LW link

Safety Committee Resources

jefftk 1 Sep 2022 2:30 UTC

22 points

2 comments 1 min read LW link

(www.jefftk.com)

Progress links and tweets, 2022-08-31

jasoncrawford 31 Aug 2022 21:54 UTC

13 points

4 comments 1 min read LW link

(rootsofprogress.org)

Enantiodromia

ChristianKl 31 Aug 2022 21:13 UTC

39 points

7 comments 3 min read LW link

[Question] Supposing Europe is headed for a serious energy crisis this winter, what can/should one do as an individual to prepare?

Erich_Grunewald 31 Aug 2022 19:28 UTC

18 points

13 comments 1 min read LW link

New 80,000 Hours problem profile on existential risks from AI

Benjamin Hilton 31 Aug 2022 17:36 UTC

28 points

6 comments 7 min read LW link

(80000hours.org)

Grand Theft Education

Zvi 31 Aug 2022 11:50 UTC

66 points

18 comments 20 min read LW link

(thezvi.wordpress.com)

How much impact can any one man have?

GregorDeVillain 31 Aug 2022 10:26 UTC

9 points

3 comments 4 min read LW link

[Question] How might we make better use of AI capabilities research for alignment purposes?

Jemal Young 31 Aug 2022 4:19 UTC

11 points

4 comments 1 min read LW link

[Question] AI Box Experiment: Are people still interested?

Double 31 Aug 2022 3:04 UTC

30 points

13 comments 1 min read LW link

OC ACX/LW in Newport Beach

Michael Michalchik 31 Aug 2022 2:56 UTC

1 point

1 comment 1 min read LW link

Survey of NLP Researchers: NLP is contributing to AGI progress; major catastrophe plausible

Sam Bowman 31 Aug 2022 1:39 UTC

91 points

6 comments 2 min read LW link

And the word was “God”

pchvykov 30 Aug 2022 21:13 UTC

−22 points

4 comments 3 min read LW link

Worlds Where Iterative Design Fails

johnswentworth 30 Aug 2022 20:48 UTC

237 points

32 comments 10 min read LW link 1 review

Inner Alignment via Superpowers

JamesH, Thomas Larsen and Jeremy Gillen

30 Aug 2022 20:01 UTC

37 points

13 comments 4 min read LW link

ML Model Attribution Challenge [Linkpost]

aog 30 Aug 2022 19:34 UTC

11 points

0 comments 1 min read LW link

(mlmac.io)

How likely is deceptive alignment?

evhub 30 Aug 2022 19:34 UTC

108 points

31 comments 60 min read LW link

Built-In Bundling For Faster Loading

jefftk 30 Aug 2022 19:20 UTC

15 points

0 comments 2 min read LW link

(www.jefftk.com)

[Question] A bayesian updating on expert opinions

amarai 30 Aug 2022 11:56 UTC

1 point

1 comment 1 min read LW link

Any Utilitarianism Makes Sense As Policy

George3d6 30 Aug 2022 9:55 UTC

6 points

6 comments 7 min read LW link

(www.epistem.ink)

A gentle primer on caring, including in strange senses, with applications

Kaarel 30 Aug 2022 8:05 UTC

10 points

4 comments 18 min read LW link

[Question] What is the best critique of AI existential risk arguments?

joshc 30 Aug 2022 2:18 UTC

6 points

11 comments 1 min read LW link

How to plan for a radically uncertain future?

Kerry 30 Aug 2022 2:14 UTC

57 points

35 comments 1 min read LW link

EA & LW Forums Weekly Summary (21 Aug − 27 Aug 22′)

Zoe Williams 30 Aug 2022 1:42 UTC

57 points

4 comments 12 min read LW link

Can We Align a Self-Improving AGI?

Peter S. Park 30 Aug 2022 0:14 UTC

8 points

5 comments 11 min read LW link