17 Feb 2025 23:13 UTC

57 points

9 comments

12 min read

LW link

AGI Safety & Alignment @ Google DeepMind is hiring

Rohin Shah

17 Feb 2025 21:11 UTC

103 points

19 comments

10 min read

LW link

The Peeperi (unfinished) - By Katja Grace

Nathan Young

17 Feb 2025 19:33 UTC

22 points

0 comments

3 min read

LW link

(docs.google.com)

Progress links and short notes, 2025-02-17

jasoncrawford

17 Feb 2025 19:18 UTC

8 points

0 comments

7 min read

LW link

(newsletter.rootsofprogress.org)

Claude 3.5 Sonnet (New)’s AGI scenario

Nathan Young

17 Feb 2025 18:47 UTC

5 points

2 comments

5 min read

LW link

Talking to laymen about AI development

David Steel

17 Feb 2025 18:42 UTC

8 points

0 comments

1 min read

LW link

On the Rebirth of Aristocracy in the American Regime

shawkisukkar

17 Feb 2025 16:18 UTC

−16 points

3 comments

9 min read

LW link

(shawkisukkar.substack.com)

Ascetic hedonism

dkl9

17 Feb 2025 15:56 UTC

15 points

9 comments

2 min read

LW link

(dkl9.net)

AIS Berlin, events, opportunities and the flipped gameboard—Fieldbuilders Newsletter, February 2025

gergogaspar and ENAIS

17 Feb 2025 14:16 UTC

6 points

0 comments

3 min read

LW link

Monthly Roundup #27: February 2025

Zvi

17 Feb 2025 14:10 UTC

27 points

3 comments

44 min read

LW link

(thezvi.wordpress.com)

What new x- or s-risk fieldbuilding organisations would you like to see? An EOI form. (FBB #3)

gergogaspar

17 Feb 2025 12:39 UTC

6 points

0 comments

2 min read

LW link

A History of the Future, 2025-2040

L Rudolf L

17 Feb 2025 12:03 UTC

253 points

42 comments

75 min read

LW link

(nosetgauge.substack.com)

Thermodynamic entropy = Kolmogorov complexity

Aram Ebtekar

17 Feb 2025 5:56 UTC

77 points

14 comments

1 min read

LW link

(doi.org)

THE ARCHIVE

Jason Reid

17 Feb 2025 1:12 UTC

7 points

0 comments

6 min read

LW link

[Question] What are the surviving worlds like?

KvmanThinking

17 Feb 2025 0:41 UTC

21 points

2 comments

1 min read

LW link

CyberEconomy. The Limits to Growth

Timur Sadekov and Aleksei Vostriakov

16 Feb 2025 21:02 UTC

−3 points

0 comments

23 min read

LW link

Cooperation for AI safety must transcend geopolitical interference

Matrice Jacobine

16 Feb 2025 18:18 UTC

7 points

6 comments

1 min read

LW link

(www.scmp.com)

[Question] Programming Language Early Funding?

J Thomas Moros

16 Feb 2025 17:34 UTC

2 points

6 comments

3 min read

LW link

[Closed] Gauging Interest for a Learning-Theoretic Agenda Mentorship Programme

Vanessa Kosoy

16 Feb 2025 16:24 UTC

54 points

5 comments

2 min read

LW link

Celtic Knots on Einstein Lattice

Ben

16 Feb 2025 15:56 UTC

47 points

11 comments

2 min read

LW link

It’s been ten years. I propose HPMOR Anniversary Parties.

Screwtape

16 Feb 2025 1:43 UTC

154 points

3 comments

1 min read

LW link

Come join Dovetail’s agent foundations fellowship talks & discussion

Alex_Altair

15 Feb 2025 22:10 UTC

25 points

0 comments

1 min read

LW link

Quantifying the Qualitative: Towards a Bayesian Approach to Personal Insight

Pruthvi Kumar

15 Feb 2025 19:50 UTC

1 point

0 comments

6 min read

LW link

Knitting a Sweater in a Burning House

CrimsonChin

15 Feb 2025 19:50 UTC

27 points

2 comments

2 min read

LW link

Microplastics: Much Less Than You Wanted To Know

jenn, kaleb and Brent

15 Feb 2025 19:08 UTC

94 points

11 comments

13 min read

LW link

Preference for uncertainty and impact overestimation bias in altruistic systems.

Luck

15 Feb 2025 12:27 UTC

1 point

0 comments

1 min read

LW link

Artificial Static Place Intelligence: Guaranteed Alignment

ank

15 Feb 2025 11:08 UTC

2 points

2 comments

2 min read

LW link

The current AI strategic landscape: one bear’s perspective

Matrice Jacobine

15 Feb 2025 9:49 UTC

11 points

0 comments

2 min read

LW link

(philosophybear.substack.com)

6 (Potential) Misconceptions about AI Intellectuals

ozziegooen

14 Feb 2025 23:51 UTC

18 points

11 comments

12 min read

LW link

[Question] Should Open Philanthropy Make an Offer to Buy OpenAI?

peterr

14 Feb 2025 23:18 UTC

25 points

1 comment

1 min read

LW link

A computational no-coincidence principle

Eric Neyman

14 Feb 2025 21:39 UTC

152 points

40 comments

6 min read

LW link

(www.alignment.org)

Hopeful hypothesis, the Persona Jukebox.

Donald Hobson

14 Feb 2025 19:24 UTC

11 points

4 comments

3 min read

LW link

Introduction to Expected Value Fanaticism

Petra Kosonen

14 Feb 2025 19:05 UTC

9 points

8 comments

1 min read

LW link

(utilitarianism.net)

Intrinsic Dimension of Prompts in LLMs

Karthik Viswanathan

14 Feb 2025 19:02 UTC

3 points

0 comments

4 min read

LW link

Objective Realism: A Perspective Beyond Human Constructs

Apatheos

14 Feb 2025 19:02 UTC

−12 points

1 comment

2 min read

LW link

A short course on AGI safety from the GDM Alignment team

Vika and Rohin Shah

14 Feb 2025 15:43 UTC

105 points

2 comments

1 min read

LW link

(deepmindsafetyresearch.medium.com)

The Mask Comes Off: A Trio of Tales

Zvi

14 Feb 2025 15:30 UTC

81 points

1 comment

13 min read

LW link

(thezvi.wordpress.com)

Celtic Knots on a hex lattice

Ben

14 Feb 2025 14:29 UTC

27 points

10 comments

2 min read

LW link

Bimodal AI Beliefs

Adam Train

14 Feb 2025 6:45 UTC

6 points

1 comment

4 min read

LW link

Systematic Sandbagging Evaluations on Claude 3.5 Sonnet

farrelmahaztra

14 Feb 2025 1:22 UTC

13 points

0 comments

1 min read

LW link

(farrelmahaztra.com)

Paranoia, Cognitive Biases, and Catastrophic Thought Patterns.

Spiritus Dei

14 Feb 2025 0:13 UTC

−4 points

1 comment

6 min read

LW link

Notes on the Presidential Election of 1836

Arjun Panickssery

13 Feb 2025 23:40 UTC

23 points

0 comments

7 min read

LW link

(arjunpanickssery.substack.com)

Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia

ank

13 Feb 2025 22:35 UTC

1 point

2 comments

11 min read

LW link

I’m making a ttrpg about life in an intentional community during the last year before the Singularity

bgaesop

13 Feb 2025 21:54 UTC

11 points

2 comments

2 min read

LW link

SWE Automation Is Coming: Consider Selling Your Crypto

A_donor

13 Feb 2025 20:17 UTC

12 points

8 comments

1 min read

LW link

≤10-year Timelines Remain Unlikely Despite DeepSeek and o3

Rafael Harth

13 Feb 2025 19:21 UTC

55 points

67 comments

15 min read

LW link

System 2 Alignment: Deliberation, Review, and Thought Management

Seth Herd

13 Feb 2025 19:17 UTC

40 points

0 comments

19 min read

LW link

Murder plots are infohazards

Chris Monteiro

13 Feb 2025 19:15 UTC

305 points

46 comments

2 min read

LW link

Sparse Autoencoder Feature Ablation for Unlearning

aludert

13 Feb 2025 19:13 UTC

3 points

0 comments

11 min read

LW link

What is it to solve the alignment problem?

Joe Carlsmith

13 Feb 2025 18:42 UTC

31 points

6 comments

19 min read

LW link

(joecarlsmith.substack.com)