My current LK99 questions

Eliezer Yudkowsky

1 Aug 2023 22:48 UTC

211 points

38 comments, 5 min read

Spiral Staircase

Michael Samoilov

1 Aug 2023 21:51 UTC

21 points

2 comments, 2 min read

Open Mic—August 2023

Adam Zerner

1 Aug 2023 19:24 UTC

8 points

0 comments, 1 min read

ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks

Beth Barnes

1 Aug 2023 18:30 UTC

153 points

12 comments, 5 min read

(evals.alignment.org)

[Question] When (if ever) are superstimuli good/useful/advantageous?

Perhaps

1 Aug 2023 15:50 UTC

−7 points

2 comments, 1 min read

AISN #17: Automatically Circumventing LLM Guardrails, the Frontier Model Forum, and Senate Hearing on AI Oversight

Dan H

1 Aug 2023 15:40 UTC

8 points

0 comments, 8 min read

(newsletter.safe.ai)

AISN #16: White House Secures Voluntary Commitments from Leading AI Labs and Lessons from Oppenheimer

Dan H and Corin Katzke

1 Aug 2023 15:39 UTC

3 points

0 comments, 6 min read

(newsletter.safe.ai)

“Desperate Honesty” by Agnes Callard

David Gross

1 Aug 2023 13:34 UTC

11 points

0 comments, 2 min read

(dailynous.com)

Barbieheimer: Across the Dead Reckoning

Zvi

1 Aug 2023 13:00 UTC

49 points

17 comments, 41 min read

(thezvi.wordpress.com)

Untangling Infrabayesianism: A redistillation [PDF link; ~12k words + lots of math]

Lorxus

1 Aug 2023 12:42 UTC

29 points

22 comments, 2 min read

(docdro.id)

What Is Childhood Supposed To Be?

Sable

1 Aug 2023 9:51 UTC

21 points

13 comments, 3 min read

(affablyevil.substack.com)

AI romantic partners will harm society if they go unregulated

Roman Leventov

1 Aug 2023 9:32 UTC

27 points

76 comments, 13 min read

What is autonomy, and how does it lead to greater risk from AI?

Davidmanheim

1 Aug 2023 7:58 UTC

30 points

0 comments, 6 min read

Evaluating Superhuman Models with Consistency Checks

Daniel Paleka and Lukas Fluri

1 Aug 2023 7:51 UTC

21 points

2 comments, 9 min read

(arxiv.org)

[See link to Sept meetup below!] San Francisco ACX Meetup “First Saturday” August 5, 1 pm

guenael

1 Aug 2023 3:38 UTC

1 point

0 comments, 1 min read

[Question] Exercise: Solve “Thinking Physics”

Raemon

1 Aug 2023 0:44 UTC

102 points

30 comments, 5 min read, 1 review

The “public debate” about AI is confusing for the general public and for policymakers because it is a three-sided debate

Adam David Long

1 Aug 2023 0:08 UTC

146 points

30 comments, 4 min read

The “no sandbagging on checkable tasks” hypothesis

Joe Carlsmith

31 Jul 2023 23:06 UTC

61 points

14 comments, 9 min read

A Social History of Truth

Vaniver

31 Jul 2023 22:49 UTC

70 points

2 comments, 14 min read

Watermarking considered overrated?

DanielFilan

31 Jul 2023 21:36 UTC

19 points

4 comments, 1 min read

What The Lord of the Rings Teaches Us About AI Alignment

Jeffrey Heninger

31 Jul 2023 20:16 UTC

25 points

12 comments, 7 min read

The “spelling miracle”: GPT-3 spelling abilities and glitch tokens revisited

mwatkins

31 Jul 2023 19:47 UTC

85 points

29 comments, 20 min read

“Building a House” Review

jefftk

31 Jul 2023 19:20 UTC

64 points

6 comments, 1 min read

(www.jefftk.com)

The Meaning of Shoggoth AI Memes

Dan Smith

31 Jul 2023 18:52 UTC

−1 points

5 comments, 2 min read

[Question] Is there any existing term summarizing non-scalable oversight methods in outer alignment?

Allen Shen

31 Jul 2023 17:31 UTC

1 point

0 comments, 1 min read

Lack of Social Grace Is an Epistemic Virtue

Zack_M_Davis

31 Jul 2023 16:38 UTC

47 points

114 comments, 4 min read, 2 reviews

Thoughts on sharing information about language model capabilities

paulfchristiano

31 Jul 2023 16:04 UTC

211 points

44 comments, 11 min read, 1 review

Trading off compute in training and inference (Overview)

Pablo Villalobos

31 Jul 2023 16:03 UTC

42 points

2 comments, 7 min read

(epochai.org)

Open Problems and Fundamental Limitations of RLHF

scasper

31 Jul 2023 15:31 UTC

66 points

6 comments, 2 min read

(arxiv.org)

“Not Necessarily”

Benjamin Hendricks

31 Jul 2023 15:19 UTC

24 points

2 comments, 2 min read

How to find AI alignment researchers to collaborate with?

Florian Dietz

31 Jul 2023 9:05 UTC

2 points

2 comments, 1 min read

[Question] Is Kennedy a Nazi?

Pee Doom

31 Jul 2023 8:51 UTC

−3 points

11 comments, 2 min read

Is Light Drinking Protective?

jefftk

31 Jul 2023 3:00 UTC

45 points

8 comments, 2 min read

(www.jefftk.com)

EU’s AI ambitions at risk as US pushes to water down international treaty (linkpost)

mic

31 Jul 2023 0:34 UTC

10 points

0 comments, 4 min read

(www.euractiv.com)

The rise of AI in cybercrime

BobyResearcher

30 Jul 2023 20:19 UTC

−15 points

1 comment, 2 min read

(riseofAIincybercryme)

SSA vs. SIA: how future population may provide evidence for or against the foundations of political liberalism

j

30 Jul 2023 20:18 UTC

−6 points

10 comments, 55 min read

Rationalization Maximizes Expected Value

Kevin Dorst

30 Jul 2023 20:11 UTC

19 points

10 comments, 7 min read

(kevindorst.substack.com)

Apollo Neuro Results

Elizabeth

30 Jul 2023 18:40 UTC

85 points

17 comments, 3 min read

(acesounderglass.com)

Hilbert’s Triumph, Church and Turing’s failure, and what it means (Post #2)

Noosphere89

30 Jul 2023 14:33 UTC

−5 points

16 comments, 15 min read

[Question] Specific Arguments against open source LLMs?

Iknownothing

30 Jul 2023 14:27 UTC

4 points

2 comments, 1 min read

Socialism in large organizations

Adam Zerner

30 Jul 2023 7:25 UTC

8 points

16 comments, 2 min read

How to make real-money prediction markets on arbitrary topics (Outdated)

yutaka

30 Jul 2023 2:11 UTC

57 points

13 comments, 3 min read

[Question] Does decidability of a theory imply completeness of the theory?

Noosphere89

29 Jul 2023 23:53 UTC

6 points

12 comments, 1 min read

[Question] If I showed the EQ-SQ theory’s findings to be due to measurement bias, would anyone change their minds about it?

tailcalled

29 Jul 2023 19:38 UTC

23 points

13 comments, 1 min read

Self-driving car bets

paulfchristiano

29 Jul 2023 18:10 UTC

237 points

46 comments, 5 min read

(sideways-view.com)

The Parable of the Dagger—The Animation

Writer

29 Jul 2023 14:03 UTC

20 points

6 comments, 1 min read

(youtu.be)

Are Guitars Obsolete?

jefftk

29 Jul 2023 13:20 UTC

11 points

8 comments, 2 min read

(www.jefftk.com)

NAMSI: A promising approach to alignment

Georgeo57

29 Jul 2023 7:03 UTC

−6 points

6 comments, 1 min read

Understanding and Aligning a Human-like Inductive Bias with Cognitive Science: a Review of Related Literature

Claire Short

29 Jul 2023 6:10 UTC

27 points

0 comments, 12 min read

Why You Should Never Update Your Beliefs

Arjun Panickssery

29 Jul 2023 0:27 UTC

77 points

18 comments, 4 min read, 1 review

(arjunpanickssery.substack.com)