28 Oct 2022 23:55 UTC

102 points

9 comments 9 min read LW link 2 reviews

(arxiv.org)

Resources that (I think) new alignment researchers should know about

Orpheus16 28 Oct 2022 22:13 UTC

70 points

9 comments 4 min read LW link

How often does One Person succeed?

Mayank Modi 28 Oct 2022 19:32 UTC

1 point

3 comments 3 min read LW link

aisafety.community—A living document of AI safety communities

zeshen and plex

28 Oct 2022 17:50 UTC

58 points

23 comments 1 min read LW link

Rapid Test Throat Swabbing?

jefftk 28 Oct 2022 16:30 UTC

18 points

2 comments 1 min read LW link

(www.jefftk.com)

Join the interpretability research hackathon

Esben Kran 28 Oct 2022 16:26 UTC

15 points

0 comments 5 min read LW link

Syncretism

Annapurna 28 Oct 2022 16:08 UTC

16 points

4 comments 1 min read LW link

(jorgevelez.substack.com)

Pondering computation in the real world

Adam Shai 28 Oct 2022 15:57 UTC

24 points

13 comments 5 min read LW link

Ukraine and the Crimea Question

ChristianKl 28 Oct 2022 12:26 UTC

−2 points

152 comments 11 min read LW link

New book on s-risks

Tobias_Baumann 28 Oct 2022 9:36 UTC

73 points

1 comment 1 min read LW link

Cryptic symbols

Adam Scherlis 28 Oct 2022 6:44 UTC

6 points

17 comments 1 min read LW link

(adam.scherlis.com)

All life’s helpers’ beliefs

Tehdastehdas 28 Oct 2022 5:47 UTC

−12 points

1 comment 5 min read LW link

Prizes for ML Safety Benchmark Ideas

joshc 28 Oct 2022 2:51 UTC

36 points

5 comments 1 min read LW link

Worldview iPeople—Future Fund’s AI Worldview Prize

Toni MUENDEL 28 Oct 2022 1:53 UTC

−21 points

4 comments 9 min read LW link

Anatomy of change

Jose Miguel Cruz y Celis 28 Oct 2022 1:21 UTC

1 point

0 comments 1 min read LW link

Nash equilibria of symmetric zero-sum games

Ege Erdil 27 Oct 2022 23:50 UTC

14 points

0 comments 14 min read LW link

[Question] Good psychology books/books that contain good psychological models?

shuffled-cantaloupe 27 Oct 2022 23:04 UTC

1 point

1 comment 1 min read LW link

Podcast: The Left and Effective Altruism with Habiba Islam

garrison 27 Oct 2022 17:41 UTC

2 points

2 comments 1 min read LW link

Lessons from ‘Famine, Affluence, and Morality’ and its reflection on today.

Mayank Modi 27 Oct 2022 17:20 UTC

4 points

0 comments 4 min read LW link

[Question] Is the Orthogonality Thesis true for humans?

Noosphere89 27 Oct 2022 14:41 UTC

12 points

20 comments 1 min read LW link

Historicism in the math-adjacent sciences

mrcbarbier 27 Oct 2022 14:38 UTC

3 points

0 comments 5 min read LW link

How Risky Is Trick-or-Treating?

jefftk 27 Oct 2022 14:10 UTC

77 points

18 comments 2 min read LW link

(www.jefftk.com)

Covid 10/27/22: Another Origin Story

Zvi 27 Oct 2022 13:40 UTC

32 points

1 comment 13 min read LW link

(thezvi.wordpress.com)

[Question] Why are probabilities represented as real numbers instead of rational numbers?

Yaakov T 27 Oct 2022 11:23 UTC

5 points

9 comments 1 min read LW link

Five Areas I Wish EAs Gave More Focus

Prometheus 27 Oct 2022 6:13 UTC

13 points

18 comments 4 min read LW link

Apply to the Redwood Research Mechanistic Interpretability Experiment (REMIX), a research program in Berkeley

maxnadeau, Xander Davies, Buck and Nate Thomas

27 Oct 2022 1:32 UTC

135 points

14 comments 12 min read LW link

[Question] Quantum Suicide and Aumann’s Agreement Theorem

Isaac King 27 Oct 2022 1:32 UTC

16 points

20 comments 1 min read LW link

Reslab Request for Information: EA hardware projects

Joel Becker 26 Oct 2022 21:13 UTC

10 points

0 comments 1 min read LW link

A list of Petrov buttons

philh 26 Oct 2022 20:50 UTC

24 points

10 comments 6 min read LW link

(reasonableapproximation.net)

The Game of Antonyms

Faustify 26 Oct 2022 19:26 UTC

5 points

6 comments 8 min read LW link

Paper: In-context Reinforcement Learning with Algorithm Distillation [Deepmind]

LawrenceC 26 Oct 2022 18:45 UTC

30 points

5 comments 1 min read LW link

(arxiv.org)

[Question] How to become more articulate?

just_browsing 26 Oct 2022 14:43 UTC

19 points

14 comments 1 min read LW link

Open Bands: Leading Rhythm

jefftk 26 Oct 2022 14:30 UTC

10 points

0 comments 4 min read LW link

(www.jefftk.com)

Signals of war in August 2021

yieldthought 26 Oct 2022 8:11 UTC

70 points

16 comments 2 min read LW link

Trigger-based rapid checklists

VipulNaik 26 Oct 2022 4:05 UTC

45 points

0 comments 9 min read LW link

Why some people believe in AGI, but I don’t.

cveres 26 Oct 2022 3:09 UTC

−15 points

6 comments 4 min read LW link

Intent alignment should not be the goal for AGI x-risk reduction

John Nay 26 Oct 2022 1:24 UTC

1 point

10 comments 3 min read LW link

Reinforcement Learning Goal Misgeneralization: Can we guess what kind of goals are selected by default?

StefanHex and Julian_R

25 Oct 2022 20:48 UTC

15 points

2 comments 4 min read LW link

A Walkthrough of A Mathematical Framework for Transformer Circuits

Neel Nanda 25 Oct 2022 20:24 UTC

52 points

7 comments 1 min read LW link

(www.youtube.com)

Nothing.

rogersbacon 25 Oct 2022 16:33 UTC

−10 points

4 comments 6 min read LW link

(www.secretorum.life)

Maps and Blueprint; the Two Sides of the Alignment Equation

Nora_Ammann 25 Oct 2022 16:29 UTC

29 points

1 comment 5 min read LW link

Consider Applying to the Future Fellowship at MIT

jefftk 25 Oct 2022 15:40 UTC

29 points

0 comments 1 min read LW link

(www.jefftk.com)

Beyond Kolmogorov and Shannon

Alexander Gietelink Oldenziel and Adam Shai

25 Oct 2022 15:13 UTC

63 points

22 comments 5 min read LW link

What does it take to defend the world against out-of-control AGIs?

Steven Byrnes 25 Oct 2022 14:47 UTC

218 points

52 comments 30 min read LW link 1 review

Refine: what helped me write more?

Alexander Gietelink Oldenziel 25 Oct 2022 14:44 UTC

12 points

0 comments 2 min read LW link

Logical Decision Theories: Our final failsafe?

Noosphere89 25 Oct 2022 12:51 UTC

−7 points

8 comments 1 min read LW link

(www.lesswrong.com)

What will the scaled up GATO look like? (Updated with questions)

Amal 25 Oct 2022 12:44 UTC

34 points

22 comments 1 min read LW link

Mechanism Design for AI Safety—Reading Group Curriculum

Rubi J. Hudson 25 Oct 2022 3:54 UTC

15 points

3 comments 4 min read LW link

Furry Rationalists & Effective Anthropomorphism both exist

agentydragon 25 Oct 2022 3:37 UTC

42 points

3 comments 1 min read LW link

EA & LW Forums Weekly Summary (17 − 23 Oct 22′)

Zoe Williams 25 Oct 2022 2:57 UTC

10 points

0 comments 13 min read LW link