C’mon guys, Deliberate Practice is Real · Raemon · Feb 5, 2025, 10:33 PM · 99 points · 25 comments · 9 min read · LW link
The Risk of Gradual Disempowerment from AI · Zvi · Feb 5, 2025, 10:10 PM · 87 points · 20 comments · 20 min read · LW link · (thezvi.wordpress.com)
Wired on: “DOGE personnel with admin access to Federal Payment System” · Raemon · Feb 5, 2025, 9:32 PM · 88 points · 45 comments · 2 min read · LW link · (web.archive.org)
On AI Scaling · harsimony · Feb 5, 2025, 8:24 PM · 6 points · 3 comments · 8 min read · LW link · (splittinginfinity.substack.com)
The State of Metaculus · ChristianWilliams · Feb 5, 2025, 7:17 PM · 21 points · 0 comments · LW link · (www.metaculus.com)
Post-hoc reasoning in chain of thought · Kyle Cox · Feb 5, 2025, 6:58 PM · 17 points · 0 comments · 11 min read · LW link
DeepSeek-R1 for Beginners · Anton Razzhigaev · Feb 5, 2025, 6:58 PM · 12 points · 0 comments · 8 min read · LW link
Making the case for average-case AI Control · Nathaniel Mitrani · Feb 5, 2025, 6:56 PM · 4 points · 0 comments · 5 min read · LW link
[Question] Alignment Paradox and a Request for Harsh Criticism · Bridgett Kay · Feb 5, 2025, 6:17 PM · 6 points · 7 comments · 1 min read · LW link
Introducing International AI Governance Alliance (IAIGA) · jamesnorris · Feb 5, 2025, 4:02 PM · 1 point · 0 comments · 1 min read · LW link
Introducing Collective Action for Existential Safety: 80+ actions individuals, organizations, and nations can take to improve our existential safety · jamesnorris · Feb 5, 2025, 4:02 PM · −9 points · 2 comments · 1 min read · LW link
Language Models Use Trigonometry to Do Addition · Subhash Kantamneni · Feb 5, 2025, 1:50 PM · 76 points · 1 comment · 10 min read · LW link
Deploying the Observer will save humanity from existential threats · Aram Panasenco · Feb 5, 2025, 10:39 AM · −11 points · 8 comments · 1 min read · LW link
The Domain of Orthogonality · mgfcatherall · Feb 5, 2025, 8:14 AM · 1 point · 0 comments · 7 min read · LW link
Reviewing LessWrong: Screwtape’s Basic Answer · Screwtape · Feb 5, 2025, 4:30 AM · 96 points · 18 comments · 6 min read · LW link
[Question] Why isn’t AI containment the primary AI safety strategy? · OKlogic · Feb 5, 2025, 3:54 AM · 1 point · 3 comments · 3 min read · LW link
Nick Land: Orthogonality · lumpenspace · Feb 4, 2025, 9:07 PM · 12 points · 37 comments · 8 min read · LW link
What working on AI safety taught me about B2B SaaS sales · purple fire · Feb 4, 2025, 8:50 PM · 7 points · 12 comments · 5 min read · LW link
Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker · Daniel Herrmann, Aydin Mohseni, and ben_levinstein · Feb 4, 2025, 8:34 PM · 45 points · 22 comments · 5 min read · LW link
Anti-Slop Interventions? · abramdemski · Feb 4, 2025, 7:50 PM · 76 points · 33 comments · 6 min read · LW link
Can Persuasion Break AI Safety? Exploring the Interplay Between Fine-Tuning, Attacks, and Guardrails · Devina Jain · Feb 4, 2025, 7:10 PM · 3 points · 0 comments · 10 min read · LW link
[Question] Journalism student looking for sources · pinkerton · Feb 4, 2025, 6:58 PM · 11 points · 3 comments · 1 min read · LW link
We’re in Deep Research · Zvi · Feb 4, 2025, 5:20 PM · 45 points · 2 comments · 20 min read · LW link · (thezvi.wordpress.com)
The Capitalist Agent · henophilia · Feb 4, 2025, 3:32 PM · 1 point · 10 comments · 3 min read · LW link · (blog.hermesloom.org)
Forecasting AGI: Insights from Prediction Markets and Metaculus · Alvin Ånestrand · Feb 4, 2025, 1:03 PM · 13 points · 0 comments · 4 min read · LW link · (forecastingaifutures.substack.com)
Ruling Out Lookup Tables · Alfred Harwood · Feb 4, 2025, 10:39 AM · 22 points · 11 comments · 7 min read · LW link
Half-baked idea: a straightforward method for learning environmental goals? · Q Home · Feb 4, 2025, 6:56 AM · 16 points · 7 comments · 5 min read · LW link
Information Versus Action · Screwtape · Feb 4, 2025, 5:13 AM · 27 points · 0 comments · 6 min read · LW link
Utilitarian AI Alignment: Building a Moral Assistant with the Constitutional AI Method · Clément L · Feb 4, 2025, 4:15 AM · 6 points · 1 comment · 13 min read · LW link
Tear Down the Burren · jefftk · Feb 4, 2025, 3:40 AM · 45 points · 2 comments · 2 min read · LW link · (www.jefftk.com)
Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog) · Archimedes · Feb 4, 2025, 2:55 AM · 16 points · 1 comment · 1 min read · LW link · (www.anthropic.com)
Can someone, anyone, make superintelligence a more concrete concept? · Ori Nagel · Feb 4, 2025, 2:18 AM · 2 points · 8 comments · 5 min read · LW link
What are the “no free lunch” theorems? · Vishakha and Algon · Feb 4, 2025, 2:02 AM · 19 points · 4 comments · 1 min read · LW link · (aisafety.info)
eliminating bias through language? · KvmanThinking · Feb 4, 2025, 1:52 AM · 1 point · 12 comments · 1 min read · LW link
New Foresight Longevity Bio & Molecular Nano Grants Program · Allison Duettmann · Feb 4, 2025, 12:28 AM · 11 points · 0 comments · 1 min read · LW link
Meta: Frontier AI Framework · Zach Stein-Perlman · Feb 3, 2025, 10:00 PM · 33 points · 2 comments · 1 min read · LW link · (ai.meta.com)
$300 Fermi Model Competition · ozziegooen · Feb 3, 2025, 7:47 PM · 16 points · 18 comments · LW link
Visualizing Interpretability · Darold Davis · Feb 3, 2025, 7:36 PM · 2 points · 0 comments · 4 min read · LW link
Alignment Can Reduce Performance on Simple Ethical Questions · Daan Henselmans · Feb 3, 2025, 7:35 PM · 16 points · 7 comments · 6 min read · LW link
The Overlap Paradigm: Rethinking Data’s Role in Weak-to-Strong Generalization (W2SG) · Serhii Zamrii · Feb 3, 2025, 7:31 PM · 2 points · 0 comments · 11 min read · LW link
Sleeper agents appear resilient to activation steering · Lucy Wingard · Feb 3, 2025, 7:31 PM · 6 points · 0 comments · 7 min read · LW link
Part 1: Enhancing Inner Alignment in CLIP Vision Transformers: Mitigating Reification Bias with SAEs and Grad ECLIP · Gilber A. Corrales · Feb 3, 2025, 7:30 PM · 1 point · 0 comments · 13 min read · LW link
Superintelligence Alignment Proposal · Davey Morse · Feb 3, 2025, 6:47 PM · 5 points · 3 comments · 9 min read · LW link
Gettier Cases [repost] · Antigone · Feb 3, 2025, 6:12 PM · −4 points · 5 comments · 2 min read · LW link
The Self-Reference Trap in Mathematics · Alister Munday · Feb 3, 2025, 4:12 PM · −41 points · 23 comments · 2 min read · LW link
Stopping unaligned LLMs is easy! · Yair Halberstadt · Feb 3, 2025, 3:38 PM · −3 points · 11 comments · 2 min read · LW link
The Outer Levels · Jerdle · Feb 3, 2025, 2:30 PM · 2 points · 3 comments · 6 min read · LW link
o3-mini Early Days · Zvi · Feb 3, 2025, 2:20 PM · 45 points · 0 comments · 15 min read · LW link · (thezvi.wordpress.com)
OpenAI releases deep research agent · Seth Herd · Feb 3, 2025, 12:48 PM (UTC) · 78 points · 21 comments · 3 min read · LW link · (openai.com)
Neuron Activations to CLIP Embeddings: Geometry of Linear Combinations in Latent Space · Roman Malov · Feb 3, 2025, 10:30 AM (UTC) · 4 points · 0 comments · 2 min read · LW link