arch-anarchist reading list

Peter lawless 16 Feb 2025 22:47 UTC
2 points
1 comment · 1 min read · LW link

CyberEconomy. The Limits to Growth

16 Feb 2025 21:02 UTC
−3 points
0 comments · 23 min read · LW link

Cooperation for AI safety must transcend geopolitical interference

Matrice Jacobine 16 Feb 2025 18:18 UTC
7 points
6 comments · 1 min read · LW link
(www.scmp.com)

[Question] Programming Language Early Funding?

J Thomas Moros 16 Feb 2025 17:34 UTC
2 points
6 comments · 3 min read · LW link

[Closed] Gauging Interest for a Learning-Theoretic Agenda Mentorship Programme

Vanessa Kosoy 16 Feb 2025 16:24 UTC
54 points
5 comments · 2 min read · LW link

Celtic Knots on Einstein Lattice

Ben 16 Feb 2025 15:56 UTC
47 points
11 comments · 2 min read · LW link

It’s been ten years. I propose HPMOR Anniversary Parties.

Screwtape 16 Feb 2025 1:43 UTC
153 points
3 comments · 1 min read · LW link

Come join Dovetail’s agent foundations fellowship talks & discussion

Alex_Altair 15 Feb 2025 22:10 UTC
24 points
0 comments · 1 min read · LW link

Quantifying the Qualitative: Towards a Bayesian Approach to Personal Insight

Pruthvi Kumar 15 Feb 2025 19:50 UTC
1 point
0 comments · 6 min read · LW link

Knitting a Sweater in a Burning House

CrimsonChin 15 Feb 2025 19:50 UTC
27 points
2 comments · 2 min read · LW link

Microplastics: Much Less Than You Wanted To Know

15 Feb 2025 19:08 UTC
82 points
8 comments · 13 min read · LW link

Preference for uncertainty and impact overestimation bias in altruistic systems.

Luck 15 Feb 2025 12:27 UTC
1 point
0 comments · 1 min read · LW link

Artificial Static Place Intelligence: Guaranteed Alignment

ank 15 Feb 2025 11:08 UTC
2 points
2 comments · 2 min read · LW link

The current AI strategic landscape: one bear’s perspective

Matrice Jacobine 15 Feb 2025 9:49 UTC
11 points
0 comments · 2 min read · LW link
(philosophybear.substack.com)

6 (Potential) Misconceptions about AI Intellectuals

ozziegooen 14 Feb 2025 23:51 UTC
18 points
11 comments · 12 min read · LW link

[Question] Should Open Philanthropy Make an Offer to Buy OpenAI?

peterr 14 Feb 2025 23:18 UTC
25 points
1 comment · 1 min read · LW link

A computational no-coincidence principle

Eric Neyman 14 Feb 2025 21:39 UTC
148 points
39 comments · 6 min read · LW link
(www.alignment.org)

Hopeful hypothesis, the Persona Jukebox.

Donald Hobson 14 Feb 2025 19:24 UTC
11 points
4 comments · 3 min read · LW link

Introduction to Expected Value Fanaticism

Petra Kosonen 14 Feb 2025 19:05 UTC
9 points
8 comments · 1 min read · LW link
(utilitarianism.net)

Intrinsic Dimension of Prompts in LLMs

Karthik Viswanathan 14 Feb 2025 19:02 UTC
3 points
0 comments · 4 min read · LW link

Objective Realism: A Perspective Beyond Human Constructs

Apatheos 14 Feb 2025 19:02 UTC
−12 points
1 comment · 2 min read · LW link

A short course on AGI safety from the GDM Alignment team

14 Feb 2025 15:43 UTC
104 points
2 comments · 1 min read · LW link
(deepmindsafetyresearch.medium.com)

The Mask Comes Off: A Trio of Tales

Zvi 14 Feb 2025 15:30 UTC
81 points
1 comment · 13 min read · LW link
(thezvi.wordpress.com)

Celtic Knots on a hex lattice

Ben 14 Feb 2025 14:29 UTC
27 points
10 comments · 2 min read · LW link

Bimodal AI Beliefs

Adam Train 14 Feb 2025 6:45 UTC
6 points
1 comment · 4 min read · LW link

What is a circuit? [in interpretability]

Yudhister Kumar 14 Feb 2025 4:40 UTC
23 points
1 comment · 1 min read · LW link

Systematic Sandbagging Evaluations on Claude 3.5 Sonnet

farrelmahaztra 14 Feb 2025 1:22 UTC
13 points
0 comments · 1 min read · LW link
(farrelmahaztra.com)

Paranoia, Cognitive Biases, and Catastrophic Thought Patterns.

Spiritus Dei 14 Feb 2025 0:13 UTC
−4 points
1 comment · 6 min read · LW link

Notes on the Presidential Election of 1836

Arjun Panickssery 13 Feb 2025 23:40 UTC
23 points
0 comments · 7 min read · LW link
(arjunpanickssery.substack.com)

Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia

ank 13 Feb 2025 22:35 UTC
1 point
2 comments · 11 min read · LW link

I’m making a ttrpg about life in an intentional community during the last year before the Singularity

bgaesop 13 Feb 2025 21:54 UTC
11 points
2 comments · 2 min read · LW link

SWE Automation Is Coming: Consider Selling Your Crypto

A_donor 13 Feb 2025 20:17 UTC
12 points
8 comments · 1 min read · LW link

≤10-year Timelines Remain Unlikely Despite DeepSeek and o3

Rafael Harth 13 Feb 2025 19:21 UTC
52 points
67 comments · 15 min read · LW link

System 2 Alignment

Seth Herd 13 Feb 2025 19:17 UTC
35 points
0 comments · 22 min read · LW link

Murder plots are infohazards

Chris Monteiro 13 Feb 2025 19:15 UTC
311 points
44 comments · 2 min read · LW link

Sparse Autoencoder Feature Ablation for Unlearning

aludert 13 Feb 2025 19:13 UTC
3 points
0 comments · 11 min read · LW link

What is it to solve the alignment problem?

Joe Carlsmith 13 Feb 2025 18:42 UTC
31 points
6 comments · 19 min read · LW link
(joecarlsmith.substack.com)

Self-dialogue: Do behaviorist rewards make scheming AGIs?

Steven Byrnes 13 Feb 2025 18:39 UTC
43 points
1 comment · 46 min read · LW link

How do we solve the alignment problem?

Joe Carlsmith 13 Feb 2025 18:27 UTC
63 points
9 comments · 7 min read · LW link
(joecarlsmith.substack.com)

Ambiguous out-of-distribution generalization on an algorithmic task

13 Feb 2025 18:24 UTC
83 points
6 comments · 11 min read · LW link

Teaching AI to reason: this year’s most important story

Benjamin_Todd 13 Feb 2025 17:40 UTC
10 points
0 comments · 10 min read · LW link
(benjamintodd.substack.com)

AI #103: Show Me the Money

Zvi 13 Feb 2025 15:20 UTC
30 points
9 comments · 58 min read · LW link
(thezvi.wordpress.com)

OpenAI’s NSFW policy: user safety, harm reduction, and AI consent

8e9 13 Feb 2025 13:59 UTC
4 points
3 comments · 2 min read · LW link

Studies of Human Error Rate

tin482 13 Feb 2025 13:43 UTC
15 points
3 comments · 1 min read · LW link

the dumbest theory of everything

lostinwilliamsburg 13 Feb 2025 7:57 UTC
−1 points
0 comments · 7 min read · LW link

Skepticism towards claims about the views of powerful institutions

tlevin 13 Feb 2025 7:40 UTC
46 points
2 comments · 4 min read · LW link

Virtue signaling, and the “humans-are-wonderful” bias, as a trust exercise

lc 13 Feb 2025 6:59 UTC
44 points
16 comments · 4 min read · LW link

My model of what is going on with LLMs

Cole Wyeth 13 Feb 2025 3:43 UTC
110 points
49 comments · 7 min read · LW link

Not all capabilities will be created equal: focus on strategically superhuman agents

benwr 13 Feb 2025 1:24 UTC
62 points
9 comments · 3 min read · LW link

LLMs can teach themselves to better predict the future

Ben Turtel 13 Feb 2025 1:01 UTC
0 points
1 comment · 1 min read · LW link
(arxiv.org)