hydrogen tube transport

bhauth 18 Apr 2024 22:47 UTC

34 points

12 comments 5 min read LW link

(www.bhauth.com)

LessOnline Festival Updates Thread

Ben Pace 18 Apr 2024 21:55 UTC

59 points

26 comments 1 min read LW link

A Review of In-Context Learning Hypotheses for Automated AI Alignment Research

Alfie Lamerton 18 Apr 2024 18:29 UTC

25 points

4 comments 16 min read LW link

I’m open for projects (sort of)

cousin_it 18 Apr 2024 18:05 UTC

47 points

13 comments 1 min read LW link

Blessed information, garbage information, cursed information

tailcalled 18 Apr 2024 16:56 UTC

23 points

8 comments 3 min read LW link

[Fiction] A Confession

Arjun Panickssery 18 Apr 2024 16:28 UTC

38 points

2 comments 5 min read LW link

(arjunpanickssery.substack.com)

Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight

Sam Marks 18 Apr 2024 16:17 UTC

117 points

10 comments 12 min read LW link

Cooperation is optimal, with weaker agents too - tldr

Ryo 18 Apr 2024 15:03 UTC

13 points

22 comments 4 min read LW link

(medium.com)

How to coordinate despite our biases? - tldr

Ryo 18 Apr 2024 15:03 UTC

3 points

2 comments 3 min read LW link

(medium.com)

Knowledge Base 7: Long-tail knowledge and collective intelligence

iwis 18 Apr 2024 14:21 UTC

−6 points

0 comments 1 min read LW link

AI #60: Oh the Humanity

Zvi 18 Apr 2024 14:10 UTC

44 points

7 comments 62 min read LW link

(thezvi.wordpress.com)

UDT1.01: Logical Inductors and Implicit Beliefs (5/10)

Diffractor 18 Apr 2024 8:39 UTC

36 points

2 comments 19 min read LW link

An examination of GPT-2’s boring yet effective glitch

MiguelDev 18 Apr 2024 5:26 UTC

5 points

3 comments 3 min read LW link

[Question] What if Ethics is Provably Self-Contradictory?

Yitz 18 Apr 2024 5:12 UTC

3 points

7 comments 2 min read LW link

The Mom Test: Summary and Thoughts

Biff Wiff 18 Apr 2024 3:34 UTC

51 points

3 comments 10 min read LW link

Express interest in an “FHI of the West”

habryka 18 Apr 2024 3:32 UTC

268 points

41 comments 3 min read LW link

Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer

johnswentworth and David Lorell

18 Apr 2024 0:27 UTC

190 points

21 comments 7 min read LW link

AXRP Episode 28 - Suing Labs for AI Risk with Gabriel Weil

DanielFilan 17 Apr 2024 21:42 UTC

12 points

0 comments 65 min read LW link

LLM Evaluators Recognize and Favor Their Own Generations

Arjun Panickssery, Sam Bowman and Shi

17 Apr 2024 21:09 UTC

52 points

1 comment 3 min read LW link

(tiny.cc)

SFS: Foundations of Forecasting

MAD2 17 Apr 2024 17:46 UTC

3 points

0 comments 1 min read LW link

An ethical framework to supersede Utilitarianism

metalcrow 17 Apr 2024 17:18 UTC

1 point

4 comments 4 min read LW link

Moving on from community living

Vika 17 Apr 2024 17:02 UTC

64 points

7 comments 3 min read LW link

(vkrakovna.wordpress.com)

Staged release

Zach Stein-Perlman 17 Apr 2024 16:00 UTC

11 points

4 comments 2 min read LW link

[Question] Discomfort Stacking

Lewis O’Brien 17 Apr 2024 14:49 UTC

5 points

12 comments 1 min read LW link

FHI (Future of Humanity Institute) has shut down (2005–2024)

gwern 17 Apr 2024 13:54 UTC

176 points

22 comments 1 min read LW link

(www.futureofhumanityinstitute.org)

Childhood and Education Roundup #5

Zvi 17 Apr 2024 13:00 UTC

36 points

3 comments 25 min read LW link

(thezvi.wordpress.com)

Should we maximize the Geometric Expectation of Utility?

A.H. 17 Apr 2024 10:37 UTC

5 points

17 comments 9 min read LW link

Claude 3 Opus can operate as a Turing machine

Gunnar_Zarncke 17 Apr 2024 8:41 UTC

37 points

2 comments 1 min read LW link

(twitter.com)

When is a mind me?

Rob Bensinger 17 Apr 2024 5:56 UTC

148 points

134 comments 15 min read LW link

Mid-conditional love

KatjaGrace 17 Apr 2024 4:00 UTC

76 points

21 comments 2 min read LW link

(worldspiritsockpuppet.com)

Spending Update 2024

jefftk 17 Apr 2024 2:30 UTC

20 points

2 comments 3 min read LW link

(www.jefftk.com)

Anti MMAcevedo Protocol

Logan Zoellner 16 Apr 2024 22:32 UTC

1 point

1 comment 8 min read LW link

Transformers Represent Belief State Geometry in their Residual Stream

Adam Shai 16 Apr 2024 21:16 UTC

442 points

103 comments 12 min read LW link 1 review

Tinker

Richard_Ngo 16 Apr 2024 18:26 UTC

39 points

0 comments 1 min read LW link

(press.asimov.com)

Paul Christiano named as US AI Safety Institute Head of AI Safety

Joel Burget 16 Apr 2024 16:22 UTC

257 points

61 comments 1 min read LW link

(www.commerce.gov)

Creating unrestricted AI Agents with Command R+

Simon Lermen 16 Apr 2024 14:52 UTC

77 points

13 comments 5 min read LW link

What should the EA community learn from the FTX / SBF disaster? An in-depth discussion with Will MacAskill on the Clearer Thinking podcast

spencerg 16 Apr 2024 13:11 UTC

20 points

0 comments 1 min read LW link

(podcast.clearerthinking.org)

{Book Summary} The Art of Gathering

T_W 16 Apr 2024 10:48 UTC

28 points

0 comments 13 min read LW link

Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes

owencb and AI Impacts

16 Apr 2024 10:10 UTC

84 points

17 comments 8 min read LW link

(blog.aiimpacts.org)

Announcing SPAR Summer 2024!

laurenmarie 16 Apr 2024 8:30 UTC

30 points

2 comments 1 min read LW link

The argument for near-term human disempowerment through AI

Chris_Leong 16 Apr 2024 4:50 UTC

22 points

2 comments 1 min read LW link

(link.springer.com)

My experience using financial commitments to overcome akrasia

Will_Howard 15 Apr 2024 22:57 UTC

141 points

39 comments 18 min read LW link 1 review

An evaluation of circuit evaluation metrics

Ivan Arcuschin, Niels uit de Bos and Adrià Garriga-alonso

15 Apr 2024 19:38 UTC

18 points

0 comments 4 min read LW link

Experiments with an alternative method to promote sparsity in sparse autoencoders

Eoin Farrell 15 Apr 2024 18:21 UTC

29 points

7 comments 12 min read LW link

Effectively Handling Disagreements—Introducing a New Workshop

Camille B. 15 Apr 2024 16:33 UTC

37 points

2 comments 7 min read LW link

Four Local Gigs

jefftk 15 Apr 2024 16:00 UTC

8 points

0 comments 1 min read LW link

(www.jefftk.com)

Taking into account preferences of past selves

Jacob G-W 15 Apr 2024 13:15 UTC

14 points

9 comments 7 min read LW link

Monthly Roundup #17: April 2024

Zvi 15 Apr 2024 12:10 UTC

54 points

4 comments 76 min read LW link

(thezvi.wordpress.com)

Reconsider the anti-cavity bacteria if you are Asian

Lao Mein 15 Apr 2024 7:02 UTC

174 points

43 comments 4 min read LW link

Anthropic AI made the right call

bhauth 15 Apr 2024 0:39 UTC

22 points

20 comments 1 min read LW link