Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?

25 Oct 2022 20:48 UTC
14 points
2 comments · 4 min read · LW link

A Walkthrough of A Mathematical Framework for Transformer Circuits

Neel Nanda · 25 Oct 2022 20:24 UTC
51 points
7 comments · 1 min read · LW link
(www.youtube.com)

Nothing.

rogersbacon · 25 Oct 2022 16:33 UTC
−10 points
4 comments · 6 min read · LW link
(www.secretorum.life)

Maps and Blueprint; the Two Sides of the Alignment Equation

Nora_Ammann · 25 Oct 2022 16:29 UTC
21 points
1 comment · 5 min read · LW link

Consider Applying to the Future Fellowship at MIT

jefftk · 25 Oct 2022 15:40 UTC
29 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Beyond Kolmogorov and Shannon

25 Oct 2022 15:13 UTC
62 points
17 comments · 5 min read · LW link

What does it take to defend the world against out-of-control AGIs?

Steven Byrnes · 25 Oct 2022 14:47 UTC
194 points
47 comments · 30 min read · LW link · 1 review

Refine: what helped me write more?

Alexander Gietelink Oldenziel · 25 Oct 2022 14:44 UTC
12 points
0 comments · 2 min read · LW link

Logical Decision Theories: Our final failsafe?

Noosphere89 · 25 Oct 2022 12:51 UTC
−7 points
8 comments · 1 min read · LW link
(www.lesswrong.com)

What will the scaled up GATO look like? (Updated with questions)

Amal · 25 Oct 2022 12:44 UTC
34 points
22 comments · 1 min read · LW link

Mechanism Design for AI Safety—Reading Group Curriculum

Rubi J. Hudson · 25 Oct 2022 3:54 UTC
15 points
3 comments · 1 min read · LW link

Furry Rationalists & Effective Anthropomorphism both exist

agentydragon · 25 Oct 2022 3:37 UTC
42 points
3 comments · 1 min read · LW link

EA & LW Fo­rums Weekly Sum­mary (17 − 23 Oct 22′)

Zoe Williams25 Oct 2022 2:57 UTC
10 points
0 comments1 min readLW link

Dance Weekends: Tests not Masks

jefftk · 25 Oct 2022 2:10 UTC
12 points
0 comments · 2 min read · LW link
(www.jefftk.com)

[Question] What is good Cyber Security Advice?

Gunnar_Zarncke · 24 Oct 2022 23:27 UTC
30 points
12 comments · 2 min read · LW link

Connections between Mind-Body Problem & Civilizations

oblivion · 24 Oct 2022 21:55 UTC
−3 points
1 comment · 1 min read · LW link

[Question] Rationalism and money

David K · 24 Oct 2022 21:22 UTC
−5 points
2 comments · 1 min read · LW link

[Question] Game semantics

David K · 24 Oct 2022 21:22 UTC
2 points
2 comments · 1 min read · LW link

A Good Future (rough draft)

Michael Soareverix · 24 Oct 2022 20:45 UTC
10 points
5 comments · 3 min read · LW link

A Barebones Guide to Mechanistic Interpretability Prerequisites

Neel Nanda · 24 Oct 2022 20:45 UTC
63 points
12 comments · 3 min read · LW link
(neelnanda.io)

POWERplay: An open-source toolchain to study AI power-seeking

Edouard Harris · 24 Oct 2022 20:03 UTC
27 points
0 comments · 1 min read · LW link
(github.com)

Consider trying Vivek Hebbar’s alignment exercises

Akash · 24 Oct 2022 19:46 UTC
38 points
1 comment · 4 min read · LW link

[Question] Education not meant for mass-consumption

Tolo · 24 Oct 2022 19:45 UTC
7 points
5 comments · 2 min read · LW link

Realizations in Regards to Masculinity

nmc · 24 Oct 2022 19:42 UTC
−2 points
2 comments · 2 min read · LW link

The Futility of Religion

nmc · 24 Oct 2022 19:42 UTC
−1 points
5 comments · 3 min read · LW link

The optimal timing of spending on AGI safety work; why we should probably be spending more now

Tristan Cook · 24 Oct 2022 17:42 UTC
62 points
0 comments · 1 min read · LW link

QACI: question-answer counterfactual intervals

Tamsin Leake · 24 Oct 2022 13:08 UTC
22 points
0 comments · 4 min read · LW link
(carado.moe)

AGI in our lifetimes is wishful thinking

niknoble · 24 Oct 2022 11:53 UTC
0 points
25 comments · 8 min read · LW link

DeepMind on Stratego, an imperfect information game

sanxiyn · 24 Oct 2022 5:57 UTC
15 points
9 comments · 1 min read · LW link
(arxiv.org)

[Question] TOMT: Post from 1-2 years ago talking about a paper on social networks

Simon Berens · 24 Oct 2022 1:29 UTC
5 points
1 comment · 1 min read · LW link

AI researchers announce NeuroAI agenda

Cameron Berg · 24 Oct 2022 0:14 UTC
37 points
12 comments · 6 min read · LW link
(arxiv.org)

Empowerment is (almost) All We Need

jacob_cannell · 23 Oct 2022 21:48 UTC
64 points
44 comments · 17 min read · LW link

“Originality is nothing but judicious imitation”—Voltaire

Vestozia · 23 Oct 2022 19:00 UTC
0 points
0 comments · 13 min read · LW link

Mid-Peninsula ACX/LW Meetup [CANCELLED]

moshezadka · 23 Oct 2022 17:37 UTC
1 point
0 comments · 1 min read · LW link

I am a Memoryless System

NicholasKross · 23 Oct 2022 17:34 UTC
25 points
2 comments · 9 min read · LW link
(www.thinkingmuchbetter.com)

Accountability Buddies: Why you might want one.

Samuel Nellessen · 23 Oct 2022 16:25 UTC
10 points
3 comments · 1 min read · LW link

How to get past Haidt’s elephant and listen

Astynax · 23 Oct 2022 16:06 UTC
13 points
4 comments · 2 min read · LW link

Writing Russian and Ukrainian words in Latin script

Viliam · 23 Oct 2022 15:25 UTC
19 points
22 comments · 6 min read · LW link

[Question] Have you noticed any ways that rationalists differ? [Brainstorming session]

tailcalled · 23 Oct 2022 11:32 UTC
23 points
22 comments · 1 min read · LW link

Mnestics

Jarred Filmer · 23 Oct 2022 0:30 UTC
117 points
5 comments · 4 min read · LW link

Telic intuitions across the sciences

mrcbarbier · 22 Oct 2022 21:31 UTC
4 points
0 comments · 17 min read · LW link

A basic lexicon of telic concepts

mrcbarbier · 22 Oct 2022 21:28 UTC
2 points
0 comments · 3 min read · LW link

Do we have the right kind of math for roles, goals and meaning?

mrcbarbier · 22 Oct 2022 21:28 UTC
13 points
5 comments · 7 min read · LW link

[Question] The Last Year - is there an existing novel about the last year before AI doom?

Luca Petrolati · 22 Oct 2022 20:44 UTC
4 points
4 comments · 1 min read · LW link

The highest-probability outcome can be out of distribution

tailcalled · 22 Oct 2022 20:00 UTC
13 points
5 comments · 1 min read · LW link

Newsletter for Alignment Research: The ML Safety Updates

Esben Kran · 22 Oct 2022 16:17 UTC
25 points
0 comments · 1 min read · LW link

Crypto loves impact markets: Notes from Schelling Point Bogotá

Rachel Shu · 22 Oct 2022 15:58 UTC
17 points
2 comments · 1 min read · LW link

[Question] When trying to define general intelligence is ability to achieve goals the best metric?

jmh · 22 Oct 2022 3:09 UTC
5 points
0 comments · 1 min read · LW link

[Question] Simple question about corrigibility and values in AI.

jmh · 22 Oct 2022 2:59 UTC
6 points
1 comment · 1 min read · LW link

Moorean Statements

David Udell · 22 Oct 2022 0:50 UTC
11 points
11 comments · 1 min read · LW link