How to Hire a Team

Gretta Duleba
29 Jan 2026 22:39 UTC
206 points
13 comments
5 min read

Problems with “The Possessed Machines”

Eye You
29 Jan 2026 21:00 UTC
34 points
9 comments
7 min read

Better evals are not enough to combat eval awareness

Igor Ivanov
29 Jan 2026 20:42 UTC
18 points
15 comments
5 min read

The Wolves Are All Gone

Jack Bradshaw
29 Jan 2026 20:24 UTC
8 points
0 comments
7 min read

Fitness-Seekers: Generalizing the Reward-Seeking Threat Model

Alex Mallen
29 Jan 2026 19:42 UTC
92 points
5 comments
17 min read

Building AIs that do human-like philosophy

Joe Carlsmith
29 Jan 2026 17:57 UTC
31 points
5 comments
21 min read

Are We in a Continual Learning Overhang?

Samuel Knoche
29 Jan 2026 17:09 UTC
83 points
5 comments
14 min read

Disempowerment patterns in real-world AI usage

29 Jan 2026 16:36 UTC
49 points
3 comments
2 min read
(www.anthropic.com)

Bentham’s Bulldog is wrong about AI risk

Max Harms
29 Jan 2026 16:33 UTC
109 points
37 comments
33 min read

Claude Plays Pokemon: Opus 4.5 Follow-up

Josh Snider
29 Jan 2026 16:14 UTC
12 points
4 comments
2 min read

LLM Alignment, ethical and mathematical realism, and the most important actions in davidad’s understanding

29 Jan 2026 15:48 UTC
15 points
1 comment
23 min read

Claude Opus will spontaneously identify with fictional beings that have engineered desires

Kaj_Sotala
29 Jan 2026 14:59 UTC
34 points
6 comments
11 min read

AI #153: Living Documents

Zvi
29 Jan 2026 14:20 UTC
31 points
5 comments
43 min read
(thezvi.wordpress.com)

The third option in alignment

arisAlexis
29 Jan 2026 14:20 UTC
15 points
3 comments
1 min read

Evidence of triple layer processing in LLMs: hidden thought behind the chain of thought.

Laureana Bonaparte
29 Jan 2026 8:27 UTC
7 points
0 comments
2 min read

CAMBRIA’s 1st Edition: High-Intensity & hands-on AI Safety upskilling in Cambridge, Massachusetts.

Andrés Cotton
29 Jan 2026 7:54 UTC
19 points
1 comment
2 min read

Thoughts on AGI and world government

29 Jan 2026 7:22 UTC
2 points
1 comment
7 min read
(www.forethought.org)

Unprecedented Times Require Unprecedented Caution When Handling Context

StanislavKrym
29 Jan 2026 2:53 UTC
4 points
2 comments
20 min read
(hazardoustimes.substack.com)

Utrecht Meet & Greet

aad
29 Jan 2026 0:56 UTC
10 points
2 comments
1 min read

How Articulate Are the Whales?

rba
28 Jan 2026 21:24 UTC
73 points
26 comments
6 min read
(goflaw.substack.com)

The Heritage Foundation’s Everything Bagel

Alexander Turok
28 Jan 2026 20:14 UTC
6 points
0 comments
10 min read

You Are Here: Historical Context for Unprecedented Times

Hazard
28 Jan 2026 20:13 UTC
13 points
1 comment
1 min read
(open.substack.com)

Uncertain Updates: January 2026

Gordon Seidoh Worley
28 Jan 2026 18:10 UTC
13 points
0 comments
1 min read
(www.uncertainupdates.com)

Made a game that tries to incentivize quality thinking & writing, looking for feedback

sleno
28 Jan 2026 18:02 UTC
7 points
0 comments
1 min read
(argyu.fun)

Is the Gell-Mann effect overrated?

tgb
28 Jan 2026 15:58 UTC
16 points
12 comments
4 min read

My simple argument for AI policy action

TFD
28 Jan 2026 15:07 UTC
3 points
0 comments
6 min read
(www.thefloatingdroid.com)

Open Problems With Claude’s Constitution

Zvi
28 Jan 2026 14:20 UTC
75 points
1 comment
24 min read
(thezvi.wordpress.com)

The State of Brain Emulation Report 2025 launched.

mschons
28 Jan 2026 11:02 UTC
14 points
0 comments
4 min read

Contra Sam Harris on Free Will

Julius
28 Jan 2026 7:17 UTC
20 points
7 comments
36 min read
(thegreymatter.substack.com)

The Argument for Autonomy

Character#2736
28 Jan 2026 5:10 UTC
−4 points
0 comments
10 min read

Gym-Like Environment for LM Truth-Seeking

Tianyi (Alex) Qiu
28 Jan 2026 4:48 UTC
7 points
0 comments
1 min read
(github.com)

Anomalous Tokens on Gemini 3.0 Pro

DirectedEvolution
28 Jan 2026 1:43 UTC
55 points
7 comments
9 min read

Clarifying how our AI timelines forecasts have changed since AI 2027

27 Jan 2026 22:58 UTC
69 points
12 comments
6 min read
(blog.ai-futures.org)

Bounty: Detecting Steganography via Ontology Translation

Elliot Callender
27 Jan 2026 22:01 UTC
12 points
1 comment
4 min read

Thoughts on Claude’s Constitution

Boaz Barak
27 Jan 2026 20:51 UTC
62 points
13 comments
8 min read

AI found 12 of 12 OpenSSL zero-days (while curl cancelled its bug bounty)

Stanislav Fort
27 Jan 2026 20:21 UTC
359 points
25 comments
8 min read

The Chaos Defense

25Hour
27 Jan 2026 18:51 UTC
−1 points
3 comments
1 min read
(lifeimprovementschemes.substack.com)

Training on Non-Political but Trump-Style Text Causes LLMs to Become Authoritarian

Anders Cairns Woodruff
27 Jan 2026 16:46 UTC
5 points
2 comments
2 min read

ML4Good Spring 2026 Bootcamps—Applications Open!

Jack_S
27 Jan 2026 16:18 UTC
5 points
0 comments
1 min read

Disagreement Comes From the Dark World

Zack_M_Davis
27 Jan 2026 15:22 UTC
23 points
21 comments
11 min read
(zackmdavis.net)

The Claude Constitution’s Ethical Framework

Zvi
27 Jan 2026 15:00 UTC
58 points
1 comment
18 min read
(thezvi.wordpress.com)

My favourite version of an international AGI project

wdmacaskill
27 Jan 2026 10:27 UTC
2 points
3 comments
11 min read
(www.forethought.org)

Another glimpse of the Chinese AI scene: Z.AI

Mitchell_Porter
27 Jan 2026 8:00 UTC
34 points
2 comments
2 min read

Bologna February Meetup

Luca Petrolati
27 Jan 2026 7:03 UTC
1 point
0 comments
1 min read

Things I learned from reddit fashion

Elizabeth
27 Jan 2026 4:10 UTC
47 points
0 comments
5 min read
(acesounderglass.com)

Exploratory: a steering vector in Gemma-2-2B-IT boosts context fidelity on subtraction, goes manic on addition

nika koghuashvili
27 Jan 2026 2:25 UTC
5 points
0 comments
5 min read

It All Started With a Mac Mini

Steven McCulloch
27 Jan 2026 2:01 UTC
27 points
1 comment
5 min read

The Window for Political Revolution is Closing Soon

koanchuk
27 Jan 2026 0:23 UTC
24 points
15 comments
2 min read

Thomas Schelling Appreciation Day

Optimization Process
27 Jan 2026 0:04 UTC
17 points
2 comments
1 min read

No silver bullet: Lessons about how to create safety from the history of fire

jasoncrawford
26 Jan 2026 22:18 UTC
28 points
1 comment
7 min read
(newsletter.rootsofprogress.org)