All 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 202120222023 2024 2025

All Jan Feb Mar Apr May Jun Jul Aug Sep Oct NovDec

All 1 2 3 4 5 6 7 8 9 10 11 12 13 141516 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31

[Question] Is Paul Christiano still as optimistic about Approval-Directed Agents as he was in 2018?

Chris_Leong14 Dec 2022 23:28 UTC

8 points

0 comments1 min readLW link

«Boundaries», Part 3b: Alignment problems in terms of boundaries

Andrew_Critch14 Dec 2022 22:34 UTC

72 points

7 comments13 min readLW link

Aligning alignment with performance

Marv K14 Dec 2022 22:19 UTC

2 points

0 comments2 min readLW link

Contrary to List of Lethality’s point 22, alignment’s door number 2

False Name14 Dec 2022 22:01 UTC

−2 points

5 comments22 min readLW link

Kolmogorov Complexity and Simulation Hypothesis

False Name14 Dec 2022 22:01 UTC

−3 points

0 comments7 min readLW link

[Question] Stanley Meyer’s water fuel cell

mikbp14 Dec 2022 21:19 UTC

2 points

6 comments1 min readLW link

[Question] Is the AI timeline too short to have children?

Yoreth14 Dec 2022 18:32 UTC

38 points

20 comments1 min readLW link

Predicting GPU performance

Marius Hobbhahn and Tamay

14 Dec 2022 16:27 UTC

60 points

26 comments1 min readLW link

(epochai.org)

[Incomplete] What is Computation Anyway?

DragonGod14 Dec 2022 16:17 UTC

16 points

1 comment13 min readLW link

(arxiv.org)

Chair Hanging Peg

jefftk14 Dec 2022 15:30 UTC

11 points

0 comments1 min readLW link

(www.jefftk.com)

My AGI safety research—2022 review, ’23 plans

Steven Byrnes14 Dec 2022 15:15 UTC

51 points

10 comments7 min readLW link

Extracting and Evaluating Causal Direction in LLMs’ Activations

Fabien Roger and simeon_c

14 Dec 2022 14:33 UTC

29 points

5 comments11 min readLW link

Key Mostly Outward-Facing Facts From the Story of VaccinateCA

Zvi14 Dec 2022 13:30 UTC

61 points

2 comments23 min readLW link

(thezvi.wordpress.com)

Discovering Latent Knowledge in Language Models Without Supervision

Xodarap14 Dec 2022 12:32 UTC

45 points

1 comment1 min readLW link

(arxiv.org)

[Question] COVID China Personal Advice (No mRNA vax, possible hospital overload, bug-chasing edition)

Lao Mein14 Dec 2022 10:31 UTC

20 points

11 comments1 min readLW link

Beyond a better world

Davidmanheim14 Dec 2022 10:18 UTC

14 points

7 comments4 min readLW link

(progressforum.org)

Proof as mere strong evidence

adamShimi14 Dec 2022 8:56 UTC

28 points

16 comments2 min readLW link

(epistemologicalvigilance.substack.com)

Trying to disambiguate different questions about whether RLHF is “good”

Buck14 Dec 2022 4:03 UTC

108 points

47 comments7 min readLW link 1 review

[Question] How can one literally buy time (from x-risk) with money?

Alex_Altair13 Dec 2022 19:24 UTC

24 points

3 comments1 min readLW link

[Question] Best introductory overviews of AGI safety?

JakubK13 Dec 2022 19:01 UTC

21 points

9 comments2 min readLW link

(forum.effectivealtruism.org)

Applications open for AGI Safety Fundamentals: Alignment Course

Richard_Ngo13 Dec 2022 18:31 UTC

49 points

0 comments2 min readLW link

What Does It Mean to Align AI With Human Values?

Algon13 Dec 2022 16:56 UTC

8 points

3 comments1 min readLW link

(www.quantamagazine.org)

It Takes Two Paracetamol?

Eli_13 Dec 2022 16:29 UTC

33 points

10 comments2 min readLW link

[Interim research report] Taking features out of superposition with sparse autoencoders

Lee Sharkey, Dan Braun and beren

13 Dec 2022 15:41 UTC

154 points

23 comments22 min readLW link 2 reviews

[Question] Is the ChatGPT-simulated Linux virtual machine real?

Kenoubi13 Dec 2022 15:41 UTC

18 points

7 comments1 min readLW link

Existential AI Safety is NOT separate from near-term applications

scasper13 Dec 2022 14:47 UTC

37 points

17 comments3 min readLW link

What is the correlation between upvoting and benefit to readers of LW?

banev13 Dec 2022 14:26 UTC

7 points

15 comments1 min readLW link

Limits of Superintelligence

Aleksei Petrenko13 Dec 2022 12:19 UTC

1 point

5 comments1 min readLW link

Bay 2022 Solstice

Raemon13 Dec 2022 8:58 UTC

17 points

0 comments1 min readLW link

Last day to nominate things for the Review. Also, 2019 books still exist.

Raemon13 Dec 2022 8:53 UTC

15 points

0 comments1 min readLW link

AI alignment is distinct from its near-term applications

paulfchristiano13 Dec 2022 7:10 UTC

255 points

21 comments2 min readLW link

(ai-alignment.com)

Take 10: Fine-tuning with RLHF is aesthetically unsatisfying.

Charlie Steiner13 Dec 2022 7:04 UTC

37 points

3 comments2 min readLW link

[Question] Are lawsuits against AGI companies extending AGI timelines?

SlowingAGI13 Dec 2022 6:00 UTC

1 point

1 comment1 min readLW link

EA & LW Forums Weekly Summary (5th Dec − 11th Dec 22′)

Zoe Williams13 Dec 2022 2:53 UTC

7 points

0 comments18 min readLW link

Alignment with argument-networks and assessment-predictions

Tor Økland Barstad13 Dec 2022 2:17 UTC

10 points

5 comments45 min readLW link

Revisiting algorithmic progress

Tamay and Ege Erdil

13 Dec 2022 1:39 UTC

95 points

15 comments2 min readLW link 1 review

(arxiv.org)

An exploration of GPT-2′s embedding weights

Adam Scherlis13 Dec 2022 0:46 UTC

44 points

4 comments10 min readLW link

12 career-related questions that may (or may not) be helpful for people interested in alignment research

Orpheus1612 Dec 2022 22:36 UTC

20 points

0 comments2 min readLW link

Concept extrapolation for hypothesis generation

Stuart_Armstrong, Patrick Leask and rgorman

12 Dec 2022 22:09 UTC

20 points

2 comments3 min readLW link

Let’s go meta: Grammatical knowledge and self-referential sentences [ChatGPT]

Bill Benzon12 Dec 2022 21:50 UTC

5 points

0 comments9 min readLW link

D&D.Sci December 2022 Evaluation and Ruleset

abstractapplic12 Dec 2022 21:21 UTC

17 points

8 comments2 min readLW link

Log-odds are better than Probabilities

Robert_AIZI12 Dec 2022 20:10 UTC

22 points

4 comments4 min readLW link

(aizi.substack.com)

Bengaluru LW/ACX Social Meetup—December 2022

faiz12 Dec 2022 19:30 UTC

4 points

0 comments1 min readLW link

Psychological Disorders and Problems

adamShimi and Gabriel Alfour

12 Dec 2022 18:15 UTC

39 points

6 comments1 min readLW link

Confusing the goal and the path

adamShimi12 Dec 2022 16:42 UTC

44 points

7 comments1 min readLW link

(epistemologicalvigilance.substack.com)

Meaningful things are those the universe possesses a semantics for

Abhimanyu Pallavi Sudhir12 Dec 2022 16:03 UTC

16 points

14 comments14 min readLW link

Tradeoffs in complexity, abstraction, and generality

remember and Gabriel Alfour

12 Dec 2022 15:55 UTC

32 points

0 comments2 min readLW link

Green Line Extension Opening Dates

jefftk12 Dec 2022 14:40 UTC

12 points

0 comments1 min readLW link

(www.jefftk.com)

Join the AI Testing Hackathon this Friday

Esben Kran12 Dec 2022 14:24 UTC

10 points

0 comments8 min readLW link

(alignmentjam.com)

Side-channels: input versus output

davidad12 Dec 2022 12:32 UTC

44 points

16 comments2 min readLW link