In favour of exploring nagging doubts about x-risk

owencb, 25 Jun 2024 23:52 UTC

107 points

2 comments, 2 min read

What is a Tool?

johnswentworth and David Lorell

25 Jun 2024 23:40 UTC

67 points

4 comments, 6 min read

Compute Governance Literature Review

sijarvis, 25 Jun 2024 22:41 UTC

11 points

0 comments, 13 min read

Computational Complexity as an Intuition Pump for LLM Generality

Ari Brill, 25 Jun 2024 20:25 UTC

18 points

6 comments, 3 min read

Failure Modes of Teaching AI Safety

Eleni Angelou, 25 Jun 2024 19:07 UTC

20 points

0 comments, 1 min read

Kingfisher Summer Tour 2024

jefftk, 25 Jun 2024 18:50 UTC

9 points

0 comments, 1 min read

(www.jefftk.com)

Incentive Learning vs Dead Sea Salt Experiment

Steven Byrnes, 25 Jun 2024 17:49 UTC

33 points

2 comments, 29 min read

An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs

Adam Karvonen, 25 Jun 2024 15:57 UTC

30 points

0 comments, 9 min read

(adamkarvonen.github.io)

Formal verification, heuristic explanations and surprise accounting

Jacob_Hilton, 25 Jun 2024 15:40 UTC

168 points

11 comments, 9 min read

(www.alignment.org)

Metastrategy get-started guide

Tahp, 25 Jun 2024 15:04 UTC

6 points

1 comment, 8 min read

Labor Participation is an Alignment Risk

alex, 25 Jun 2024 14:15 UTC

−5 points

2 comments, 17 min read

Monthly Roundup #19: June 2024

Zvi, 25 Jun 2024 12:00 UTC

28 points

9 comments, 54 min read

(thezvi.wordpress.com)

Regularly meta-optimization

Crazy philosopher, 25 Jun 2024 6:12 UTC

−4 points

6 comments, 1 min read

Memetics as an analogy and its implicit connotations

Rachel Shu, 25 Jun 2024 5:13 UTC

4 points

0 comments, 3 min read

Mistakes people make when thinking about units

Isaac King, 25 Jun 2024 3:39 UTC

74 points

14 comments, 7 min read

Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?

Rachel Shu, 25 Jun 2024 1:35 UTC

46 points

9 comments, 3 min read

I’m a bit skeptical of AlphaFold 3

Oleg Trott, 25 Jun 2024 0:04 UTC

87 points

14 comments, 2 min read

Being hella lost as rationality practice

Rachel Shu, 24 Jun 2024 23:50 UTC

14 points

0 comments, 2 min read

A Basic Economics-Style Model of AI Existential Risk

Rubi J. Hudson, 24 Jun 2024 20:26 UTC

24 points

3 comments, 7 min read

The Minority Coalition

Richard_Ngo, 24 Jun 2024 20:01 UTC

103 points

9 comments, 5 min read

(www.narrativeark.xyz)

Compact Proofs of Model Performance via Mechanistic Interpretability

LawrenceC, rajashree, Adrià Garriga-alonso and Jason Gross

24 Jun 2024 19:27 UTC

104 points

4 comments, 8 min read

(arxiv.org)

Contrapositive Natural Abstraction—Project Intro

Elliot Callender, 24 Jun 2024 18:37 UTC

4 points

5 comments, 2 min read

Sparse Features Through Time

Rogan Inglis, 24 Jun 2024 18:06 UTC

12 points

1 comment, 1 min read

(roganinglis.io)

PSA: Consider alternatives to AUROC when reporting classifier metrics for alignment

rpglover64, 24 Jun 2024 17:53 UTC

18 points

1 comment, 3 min read

Paying Russians to not invade Ukraine

djColliderBias, 24 Jun 2024 17:46 UTC

9 points

7 comments, 3 min read

SAE feature geometry is outside the superposition hypothesis

jake_mendel, 24 Jun 2024 16:07 UTC

229 points

18 comments, 11 min read, 1 review

So you want to work on technical AI safety

gw, 24 Jun 2024 14:29 UTC

53 points

3 comments, 14 min read

The Future of Work: How Can Policymakers Prepare for AI’s Impact on Labor Markets?

davidconrad, Arturs and Tillman Schenk

24 Jun 2024 14:18 UTC

5 points

0 comments, 3 min read

LLM Generality is a Timeline Crux

eggsyntax, 24 Jun 2024 12:52 UTC

219 points

121 comments, 8 min read, 1 review

On Claude 3.5 Sonnet

Zvi, 24 Jun 2024 12:00 UTC

95 points

14 comments, 13 min read

(thezvi.wordpress.com)

Book Review: Righteous Victims—A History of the Zionist-Arab Conflict

Yair Halberstadt, 24 Jun 2024 11:02 UTC

54 points

8 comments, 34 min read

The Living Planet Index: A Case Study in Statistical Pitfalls

Jan_Kulveit, 24 Jun 2024 10:05 UTC

25 points

0 comments, 4 min read

(www.nature.com)

Sci-Fi books micro-reviews

Yair Halberstadt, 24 Jun 2024 9:49 UTC

45 points

27 comments, 4 min read

A Step Against Land Value Tax

Blog Alt, 24 Jun 2024 5:13 UTC

9 points

23 comments, 6 min read

(antematters.substack.com)

Different senses in which two AIs can be “the same”

Vivek Hebbar and Buck

24 Jun 2024 3:16 UTC

83 points

3 comments, 4 min read, 1 review

Talk: AI safety fieldbuilding at MATS

Ryan Kidd, 23 Jun 2024 23:06 UTC

26 points

2 comments, 10 min read

AI Labs Wouldn’t be Convicted of Treason or Sedition

Matthew Khoriaty, 23 Jun 2024 21:34 UTC

13 points

2 comments, 3 min read

Control Vectors as Dispositional Traits

Gianluca Calcagni, 23 Jun 2024 21:34 UTC

11 points

0 comments, 12 min read

“On the Impossibility of Superintelligent Rubik’s Cube Solvers”, Claude 2024 [humor]

gwern, 23 Jun 2024 21:18 UTC

22 points

6 comments, 1 min read

(gwern.net)

[Question] How are you preparing for the possibility of an AI bust?

Nate Showell, 23 Jun 2024 19:13 UTC

26 points

16 comments, 1 min read

A simple text status can change something

nextcaller, 23 Jun 2024 18:48 UTC

5 points

0 comments, 2 min read

35 Interactive Learning Modules Relevant to EAs / Effective Altruism (that are all free)

spencerg, 23 Jun 2024 17:57 UTC

6 points

0 comments, 3 min read

Podcasts: AGI Show, Consistently Candid, London Futurists

KatjaGrace, 23 Jun 2024 13:50 UTC

16 points

0 comments, 1 min read

(worldspiritsockpuppet.com)

Text Posts from the Kids Group: 2019

jefftk, 23 Jun 2024 13:20 UTC

23 points

0 comments, 18 min read

(www.jefftk.com)

Population ethics and the value of variety

cousin_it, 23 Jun 2024 10:42 UTC

25 points

11 comments, 2 min read

[Question] Karma votes: blind to or accounting for score?

cata, 22 Jun 2024 21:40 UTC

20 points

4 comments, 1 min read

[Question] Should effective altruism be more “cool”?

jaredmantell, 22 Jun 2024 20:42 UTC

3 points

3 comments, 1 min read

AI as a computing platform: what to expect

Jonasb, 22 Jun 2024 19:55 UTC

−3 points

0 comments, 7 min read

(www.denominations.io)

Expected number of tries

adios, 22 Jun 2024 19:22 UTC

6 points

0 comments, 2 min read

Applying Force to the Wrong End of a Causal Chain

silentbob, 22 Jun 2024 18:06 UTC

41 points

0 comments, 9 min read