Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?

Rachel Shu

Jun 25, 2024, 1:35 AM

46 points

9 comments 3 min read LW link

Rational Animations’ intro to mechanistic interpretability

Writer

Jun 14, 2024, 4:10 PM

45 points

1 comment 11 min read LW link

(youtu.be)

AI governance needs a theory of victory

Corin Katzke and Justin Bullock

Jun 21, 2024, 4:15 PM

45 points

8 comments LW link

(www.convergenceanalysis.org)

Sci-Fi books micro-reviews

Yair Halberstadt

Jun 24, 2024, 9:49 AM

44 points

27 comments 4 min read LW link

Debate, Oracles, and Obfuscated Arguments

Jonah Brown-Cohen and Geoffrey Irving

Jun 20, 2024, 11:14 PM

44 points

4 comments 21 min read LW link

Soviet comedy film recommendations

Nina Panickssery

Jun 9, 2024, 11:40 PM

42 points

11 comments 2 min read LW link

(open.substack.com)

D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues

aphyer

Jun 7, 2024, 7:02 PM

42 points

16 comments 3 min read LW link

Case studies on social-welfare-based standards in various industries

HoldenKarnofsky

Jun 20, 2024, 1:33 PM

42 points

0 comments LW link

When fine-tuning fails to elicit GPT-3.5’s chess abilities

Theodore Chapman

Jun 14, 2024, 6:50 PM

42 points

3 comments 9 min read LW link

Jailbreak steering generalization

Sarah Ball and Nina Panickssery

Jun 20, 2024, 5:25 PM

41 points

4 comments 2 min read LW link

(arxiv.org)

Book review: The Quincunx

cousin_it

Jun 5, 2024, 9:13 PM

41 points

12 comments 2 min read LW link

Surviving Seveneves

Yair Halberstadt

Jun 19, 2024, 1:11 PM

41 points

4 comments 11 min read LW link

Applying Force to the Wrong End of a Causal Chain

silentbob

Jun 22, 2024, 6:06 PM

41 points

0 comments 9 min read LW link

Beyond the Board: Exploring AI Robustness Through Go

AdamGleave

Jun 19, 2024, 4:40 PM

41 points

2 comments 1 min read LW link

(far.ai)

Long-Term Future Fund: May 2023 to March 2024 Payout recommendations

Linch

Jun 12, 2024, 1:46 PM

40 points

0 comments LW link

Progress Conference 2024: Toward Abundant Futures

jasoncrawford

Jun 26, 2024, 3:39 PM

40 points

2 comments 1 min read LW link

(rootsofprogress.org)

The Data Wall is Important

JustisMills

Jun 9, 2024, 10:54 PM

40 points

20 comments 2 min read LW link

(justismills.substack.com)

Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.

Josh Levy

Jun 4, 2024, 3:45 PM

39 points

0 comments 18 min read LW link

AI #70: A Beautiful Sonnet

Zvi

Jun 27, 2024, 2:40 PM

38 points

0 comments 44 min read LW link

(thezvi.wordpress.com)

(Appetitive, Consummatory) ≈ (RL, reflex)

Steven Byrnes

Jun 15, 2024, 3:57 PM

38 points

1 comment 3 min read LW link

On DeepMind’s Frontier Safety Framework

Zvi

Jun 18, 2024, 1:30 PM

37 points

4 comments 8 min read LW link

(thezvi.wordpress.com)

Searching for the Root of the Tree of Evil

Ivan Vendrov

Jun 8, 2024, 5:05 PM

36 points

14 comments 5 min read LW link

(nothinghuman.substack.com)

Representation Tuning

Christopher Ackerman

Jun 27, 2024, 5:44 PM

35 points

9 comments 13 min read LW link

Empirical vs. Mathematical Joints of Nature

Elizabeth and Alex_Altair

Jun 26, 2024, 1:55 AM

35 points

1 comment 5 min read LW link

OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors

Joel Burget

Jun 13, 2024, 9:28 PM

35 points

10 comments 1 min read LW link

(openai.com)

Suffering Is Not Pain

jbkjr

Jun 18, 2024, 6:04 PM

34 points

45 comments 5 min read LW link

(jbkjr.me)

GPT2, Five Years On

Joel Burget

Jun 5, 2024, 5:44 PM

34 points

0 comments 3 min read LW link

(importai.substack.com)

AXRP Episode 33 - RLHF Problems with Scott Emmons

DanielFilan

Jun 12, 2024, 3:30 AM

34 points

0 comments 56 min read LW link

Attention Output SAEs Improve Circuit Analysis

Connor Kissane, robertzk, Arthur Conmy and Neel Nanda

Jun 21, 2024, 12:56 PM

33 points

3 comments 19 min read LW link

Book review: the Iliad

philh

Jun 18, 2024, 6:50 PM

31 points

2 comments 14 min read LW link

(reasonableapproximation.net)

Incentive Learning vs Dead Sea Salt Experiment

Steven Byrnes

Jun 25, 2024, 5:49 PM

30 points

1 comment 28 min read LW link

5. Open Corrigibility Questions

Max Harms

Jun 10, 2024, 2:09 PM

30 points

0 comments 7 min read LW link

“Full Automation” is a Slippery Metric

ozziegooen

Jun 11, 2024, 7:56 PM

30 points

1 comment LW link

[Question] What are things you’re allowed to do as a startup?

Elizabeth

Jun 20, 2024, 12:01 AM

30 points

9 comments 1 min read LW link

A Case for Superhuman Governance, using AI

ozziegooen

Jun 7, 2024, 12:10 AM

30 points

0 comments LW link

DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking

tailcalled

Jun 10, 2024, 9:20 PM

29 points

13 comments 2 min read LW link

Aggregative Principles of Social Justice

Cleo Nardo

Jun 5, 2024, 1:44 PM

29 points

10 comments 37 min read LW link

Offering Completion

jefftk

Jun 7, 2024, 1:40 AM

29 points

6 comments 1 min read LW link

(www.jefftk.com)

Evaporation of improvements

Viliam

Jun 20, 2024, 6:34 PM

29 points

27 comments 2 min read LW link

Childhood and Education Roundup #6: College Edition

Zvi

Jun 26, 2024, 11:40 AM

28 points

8 comments 23 min read LW link

(thezvi.wordpress.com)

Aggregative principles approximate utilitarian principles

Cleo Nardo

Jun 12, 2024, 4:27 PM

28 points

3 comments 23 min read LW link

Monthly Roundup #19: June 2024

Zvi

Jun 25, 2024, 12:00 PM

28 points

9 comments 54 min read LW link

(thezvi.wordpress.com)

Probably Not a Ghost Story

George Ingebretsen

Jun 12, 2024, 10:55 PM

27 points

4 comments 3 min read LW link

Appraising aggregativism and utilitarianism

Cleo Nardo

Jun 21, 2024, 11:10 PM

27 points

10 comments 19 min read LW link

An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs

Adam Karvonen

Jun 25, 2024, 3:57 PM

27 points

0 comments 9 min read LW link

(adamkarvonen.github.io)

Sticker Shortcut Fallacy — The Real Worst Argument in the World

ymeskhout

Jun 12, 2024, 2:52 PM

27 points

15 comments 4 min read LW link

(www.ymeskhout.com)

my favourite Scott Sumner blog posts

DMMF

Jun 11, 2024, 2:40 PM

26 points

0 comments 3 min read LW link

(danfrank.ca)

[Question] Thoughts on Francois Chollet’s belief that LLMs are far away from AGI?

O O

Jun 14, 2024, 6:32 AM

26 points

17 comments 1 min read LW link

Talk: AI safety fieldbuilding at MATS

Ryan Kidd

Jun 23, 2024, 11:06 PM

26 points

2 comments 10 min read LW link

3b. Formal (Faux) Corrigibility

Max Harms

Jun 9, 2024, 5:18 PM

26 points

13 comments 17 min read LW link