Revisiting the Manifold Hypothesis

Aidan Rocke

1 Oct 2023 23:55 UTC

13 points

19 comments, 4 min read

AI Alignment Breakthroughs this Week [new substack]

Logan Zoellner

1 Oct 2023 22:13 UTC

0 points

8 comments, 2 min read

[Question] Looking for study

Robert Feinstein

1 Oct 2023 19:52 UTC

4 points

0 comments, 1 min read

Join AISafety.info’s Distillation Hackathon (Oct 6-9th)

smallsilo

1 Oct 2023 18:43 UTC

21 points

0 comments, 2 min read

(forum.effectivealtruism.org)

Fifty Flips

abstractapplic

1 Oct 2023 15:30 UTC

33 points

15 comments, 1 min read, 1 review

(h-b-p.github.io)

AI Safety Impact Markets: Your Charity Evaluator for AI Safety

Dawn Drescher

1 Oct 2023 10:47 UTC

16 points

5 comments, 6 min read

(impactmarkets.substack.com)

“Absence of Evidence is Not Evidence of Absence” As a Limit

transhumanist_atom_understander

1 Oct 2023 8:15 UTC

16 points

1 comment, 2 min read

New Tool: the Residual Stream Viewer

AdamYedidia

1 Oct 2023 0:49 UTC

32 points

7 comments, 4 min read

(tinyurl.com)

My Effortless Weightloss Story: A Quick Runthrough

CuoreDiVetro

30 Sep 2023 23:02 UTC

124 points

78 comments, 9 min read

Arguments for moral indefinability

Richard_Ngo

30 Sep 2023 22:40 UTC

47 points

16 comments, 7 min read

(www.thinkingcomplete.com)

Conditionals All The Way Down

lunatic_at_large

30 Sep 2023 21:06 UTC

33 points

2 comments, 3 min read

Focusing your impact on short vs long TAI timelines

kuhanj

30 Sep 2023 19:34 UTC

4 points

0 comments, 10 min read

How model editing could help with the alignment problem

Michael Ripa

30 Sep 2023 17:47 UTC

12 points

1 comment, 15 min read

My submission to the ALTER Prize

Lorxus

30 Sep 2023 16:07 UTC

11 points

0 comments, 1 min read

(www.docdroid.net)

Anki deck for learning the main AI safety orgs, projects, and programs

Bryce Robertson

30 Sep 2023 16:06 UTC

2 points

0 comments, 1 min read

The Lighthaven Campus is open for bookings

habryka

30 Sep 2023 1:08 UTC

209 points

18 comments, 4 min read

(www.lighthaven.space)

Headphones hook

philh

29 Sep 2023 22:50 UTC

21 points

1 comment, 3 min read

(reasonableapproximation.net)

Paul Christiano’s views on “doom” (video explainer)

Michaël Trazzi

29 Sep 2023 21:56 UTC

15 points

0 comments, 1 min read

(youtu.be)

The Retroactive Funding Landscape: Innovations for Donors and Grantmakers

Dawn Drescher

29 Sep 2023 17:39 UTC

13 points

0 comments, 19 min read

(impactmarkets.substack.com)

Bids To Defer On Value Judgements

johnswentworth

29 Sep 2023 17:07 UTC

58 points

6 comments, 3 min read

Announcing FAR Labs, an AI safety coworking space

Ben Goldhaber

29 Sep 2023 16:52 UTC

95 points

0 comments, 1 min read

A tool for searching rationalist & EA webs

Daniel_Friedrich

29 Sep 2023 15:23 UTC

4 points

0 comments, 1 min read

(ratsearch.blogspot.com)

Basic Mathematics of Predictive Coding

Adam Shai

29 Sep 2023 14:38 UTC

49 points

6 comments, 9 min read

“Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation

titotal

29 Sep 2023 14:01 UTC

160 points

79 comments, 20 min read

(titotal.substack.com)

Steering subsystems: capabilities, agency, and alignment

Seth Herd

29 Sep 2023 13:45 UTC

31 points

0 comments, 8 min read

Apply to Usable Security Prize by September 30

Allison Duettmann

29 Sep 2023 13:39 UTC

4 points

0 comments, 1 min read

List of how people have become more hard-working

Chi Nguyen

29 Sep 2023 11:30 UTC

70 points

7 comments, 3 min read

Resolving moral uncertainty with randomization

B Jacobs and Jobst Heitzig

29 Sep 2023 11:23 UTC

7 points

1 comment, 11 min read

EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem

Elizabeth

28 Sep 2023 23:30 UTC

324 points

250 comments, 22 min read, 2 reviews

(acesounderglass.com)

Competitive, Cooperative, and Cohabitive

Screwtape

28 Sep 2023 23:25 UTC

50 points

13 comments, 5 min read, 1 review

The Coming Wave

PeterMcCluskey

28 Sep 2023 22:59 UTC

27 points

1 comment, 6 min read

(bayesianinvestor.com)

High-level interpretability: detecting an AI’s objectives

Paul Colognese and Jozdien

28 Sep 2023 19:30 UTC

72 points

4 comments, 21 min read

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

JanB, Owain_Evans and SoerenMind

28 Sep 2023 18:53 UTC

187 points

39 comments, 3 min read, 1 review

Responsible scaling policy TLDR

lemonhope

28 Sep 2023 18:51 UTC

9 points

0 comments, 1 min read

Alignment Workshop talks

Richard_Ngo

28 Sep 2023 18:26 UTC

37 points

1 comment, 1 min read

(www.alignment-workshop.com)

My Current Thoughts on the AI Strategic Landscape

Jeffrey Heninger

28 Sep 2023 17:59 UTC

11 points

28 comments, 14 min read

My Arrogant Plan for Alignment

MrArrogant

28 Sep 2023 17:51 UTC

2 points

6 comments, 6 min read

Discursive Competence in ChatGPT, Part 2: Memory for Texts

Bill Benzon

28 Sep 2023 16:34 UTC

1 point

0 comments, 3 min read

Different views of alignment have different consequences for imperfect methods

Stuart_Armstrong

28 Sep 2023 16:31 UTC

31 points

0 comments, 1 min read

AI #31: It Can Do What Now?

Zvi

28 Sep 2023 16:00 UTC

90 points

6 comments, 40 min read

(thezvi.wordpress.com)

The point of a game is not to win, and you shouldn’t even pretend that it is

mako yass

28 Sep 2023 15:54 UTC

52 points

27 comments, 4 min read

(makopool.com)

Cohabitive Games so Far

mako yass

28 Sep 2023 15:41 UTC

132 points

146 comments, 19 min read, 2 reviews

(makopool.com)

Wobbly Table Theorem in Practice

Morpheus

28 Sep 2023 14:33 UTC

25 points

0 comments, 2 min read

Weighing Animal Worth

jefftk

28 Sep 2023 13:50 UTC

25 points

11 comments, 2 min read

(www.jefftk.com)

ARC Evals: Responsible Scaling Policies

Zach Stein-Perlman

28 Sep 2023 4:30 UTC

40 points

10 comments, 2 min read, 1 review

(evals.alignment.org)

Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it)

Ruby

28 Sep 2023 2:48 UTC

66 points

73 comments, 6 min read

Jimmy Apples, source of the rumor that OpenAI has achieved AGI internally, is a credible insider.

Jorterder

28 Sep 2023 1:20 UTC

−6 points

2 comments, 1 min read

(twitter.com)

Investigating the rumors of OpenAI achieving AGI

Jorterder

28 Sep 2023 1:17 UTC

−4 points

1 comment, 1 min read

Alibaba Group releases Qwen, 14B parameter LLM

Nikola Jurkovic

28 Sep 2023 0:12 UTC

5 points

1 comment, 1 min read

(qianwen-res.oss-cn-beijing.aliyuncs.com)

Metaculus Launches 2023/2024 FluSight Challenge Supporting CDC, $5K in Prizes

ChristianWilliams

27 Sep 2023 21:35 UTC

5 points

0 comments, 1 min read

(www.metaculus.com)