Swiss financial regulator resigns after blog post from MITx DEDP online learner (FINMA, JuristGate, Parreaux, Thiébaud & Partners)

pocock
30 Jan 2026 23:53 UTC
2 points
0 comments
1 min read

Forecast: Recursively Self-improving AI for 2033

CuoreDiVetro
30 Jan 2026 23:53 UTC
0 points
0 comments
3 min read

Senior Researcher—MIT AI Risk Initiative

peterslattery
30 Jan 2026 23:06 UTC
8 points
0 comments
5 min read

36,000 AI Agents Are Now Speedrunning Civilization

Michaël Trazzi
30 Jan 2026 21:21 UTC
86 points
27 comments
1 min read

Moltbook Data Repository

30 Jan 2026 21:18 UTC
25 points
11 comments
1 min read

The Matchless Match

Linch
30 Jan 2026 21:18 UTC
11 points
3 comments
11 min read

Monitoring benchmark for AI control

30 Jan 2026 21:13 UTC
51 points
10 comments
19 min read

Background to Claude’s uncertainty about phenomenal consciousness

eggsyntax
30 Jan 2026 20:40 UTC
19 points
0 comments
3 min read

Attempting base model inference scaling with filler tokens

Niki Dupuis
30 Jan 2026 20:25 UTC
10 points
1 comment
3 min read

how whales click

bhauth
30 Jan 2026 19:51 UTC
42 points
1 comment
3 min read

Austin LessWrong Cafe Meetup: Applied Rationality Techniques

SilasBarta
30 Jan 2026 18:51 UTC
8 points
0 comments
1 min read

Published Safety Prompts May Create Evaluation Blind Spots

30 Jan 2026 18:27 UTC
2 points
0 comments
4 min read

Addressing Objections to the Intelligence Explosion

Bentham's Bulldog
30 Jan 2026 18:21 UTC
23 points
0 comments
16 min read

Is research into recursive self-improvement becoming a safety hazard?

Mordechai Rorvig
30 Jan 2026 17:58 UTC
5 points
0 comments
2 min read
(www.foommagazine.org)

Transhumanist Grief

MarkelKori
30 Jan 2026 16:21 UTC
18 points
2 comments
3 min read

Measuring Non-Verbalised Eval Awareness by Implanting Eval-Aware Behaviours

Jordan Taylor
30 Jan 2026 15:50 UTC
31 points
0 comments
8 min read

Everything is Gambling

goldfine
30 Jan 2026 14:10 UTC
−13 points
11 comments
2 min read
(itsnotgambling.substack.com)

Bordeaux (Gironde, France) ACX midterm Meetup Winter 2025–2026

vi21maobk9vp
30 Jan 2026 13:01 UTC
5 points
0 comments
1 min read

On The Adolescence of Technology

Zvi
30 Jan 2026 12:50 UTC
38 points
8 comments
30 min read
(thezvi.wordpress.com)

Linear steerability in continuous chain-of-thought reasoning

Jan Bauer
30 Jan 2026 10:34 UTC
10 points
0 comments
14 min read

Refusals that could become catastrophic

Fabien Roger
30 Jan 2026 4:12 UTC
84 points
12 comments
7 min read

Rolling Commercial Jetliners

jefftk
30 Jan 2026 3:30 UTC
22 points
5 comments
1 min read
(www.jefftk.com)

How to Hire a Team

Gretta Duleba
29 Jan 2026 22:39 UTC
206 points
13 comments
5 min read

Problems with “The Possessed Machines”

Eye You
29 Jan 2026 21:00 UTC
34 points
9 comments
7 min read

Better evals are not enough to combat eval awareness

Igor Ivanov
29 Jan 2026 20:42 UTC
18 points
15 comments
5 min read

The Wolves Are All Gone

Jack Bradshaw
29 Jan 2026 20:24 UTC
8 points
0 comments
7 min read

Fitness-Seekers: Generalizing the Reward-Seeking Threat Model

Alex Mallen
29 Jan 2026 19:42 UTC
92 points
5 comments
17 min read

Building AIs that do human-like philosophy

Joe Carlsmith
29 Jan 2026 17:57 UTC
31 points
5 comments
21 min read

Are We in a Continual Learning Overhang?

Samuel Knoche
29 Jan 2026 17:09 UTC
83 points
5 comments
14 min read

Disempowerment patterns in real-world AI usage

29 Jan 2026 16:36 UTC
49 points
3 comments
2 min read
(www.anthropic.com)

Bentham’s Bulldog is wrong about AI risk

Max Harms
29 Jan 2026 16:33 UTC
109 points
37 comments
33 min read

Claude Plays Pokemon: Opus 4.5 Follow-up

Josh Snider
29 Jan 2026 16:14 UTC
12 points
4 comments
2 min read

LLM Alignment, ethical and mathematical realism, and the most important actions in davidad’s understanding

29 Jan 2026 15:48 UTC
15 points
1 comment
23 min read

Claude Opus will spontaneously identify with fictional beings that have engineered desires

Kaj_Sotala
29 Jan 2026 14:59 UTC
34 points
6 comments
11 min read

AI #153: Living Documents

Zvi
29 Jan 2026 14:20 UTC
31 points
5 comments
43 min read
(thezvi.wordpress.com)

The third option in alignment

arisAlexis
29 Jan 2026 14:20 UTC
15 points
3 comments
1 min read

Evidence of triple layer processing in LLMs: hidden thought behind the chain of thought.

Laureana Bonaparte
29 Jan 2026 8:27 UTC
7 points
0 comments
2 min read

CAMBRIA’s 1st Edition: High-Intensity & hands-on AI Safety upskilling in Cambridge, Massachusetts.

Andrés Cotton
29 Jan 2026 7:54 UTC
19 points
1 comment
2 min read

Thoughts on AGI and world government

29 Jan 2026 7:22 UTC
2 points
1 comment
7 min read
(www.forethought.org)

Unprecedented Times Require Unprecedented Caution When Handling Context

StanislavKrym
29 Jan 2026 2:53 UTC
4 points
2 comments
20 min read
(hazardoustimes.substack.com)

Utrecht Meet & Greet

aad
29 Jan 2026 0:56 UTC
10 points
2 comments
1 min read

How Articulate Are the Whales?

rba
28 Jan 2026 21:24 UTC
73 points
26 comments
6 min read
(goflaw.substack.com)

The Heritage Foundation’s Everything Bagel

Alexander Turok
28 Jan 2026 20:14 UTC
6 points
0 comments
10 min read

You Are Here: Historical Context for Unprecedented Times

Hazard
28 Jan 2026 20:13 UTC
13 points
1 comment
1 min read
(open.substack.com)

Uncertain Updates: January 2026

Gordon Seidoh Worley
28 Jan 2026 18:10 UTC
13 points
0 comments
1 min read
(www.uncertainupdates.com)

Made a game that tries to incentivize quality thinking & writing, looking for feedback

sleno
28 Jan 2026 18:02 UTC
7 points
0 comments
1 min read
(argyu.fun)

Is the Gell-Mann effect overrated?

tgb
28 Jan 2026 15:58 UTC
16 points
12 comments
4 min read

My simple argument for AI policy action

TFD
28 Jan 2026 15:07 UTC
3 points
0 comments
6 min read
(www.thefloatingdroid.com)

Open Problems With Claude’s Constitution

Zvi
28 Jan 2026 14:20 UTC
75 points
1 comment
24 min read
(thezvi.wordpress.com)

The State of Brain Emulation Report 2025 launched.

mschons
28 Jan 2026 11:02 UTC
14 points
0 comments
4 min read