AI Must Learn to Police Itself

savant · 23 Jan 2026 22:39 UTC
1 point
0 comments · 2 min read · LW link

Condensation & Relevance

abramdemski · 23 Jan 2026 22:21 UTC
38 points
0 comments · 5 min read · LW link

Paying attention to Attention Sinks

Mitali M · 23 Jan 2026 21:40 UTC
11 points
5 comments · 1 min read · LW link

Dating Roundup #11: Going Too Meta

Zvi · 23 Jan 2026 20:50 UTC
40 points
4 comments · 14 min read · LW link
(thezvi.wordpress.com)

The Artificial Man

Jack Bradshaw · 23 Jan 2026 19:55 UTC
1 point
0 comments · 2 min read · LW link

The Long View Of History

sonicrocketman · 23 Jan 2026 19:30 UTC
10 points
2 comments · 2 min read · LW link
(brianschrader.com)

Emergency Response Measures for Catastrophic AI Risk

MKodama · 23 Jan 2026 18:18 UTC
27 points
2 comments · 3 min read · LW link

Eliciting base models with simple unsupervised techniques

23 Jan 2026 18:06 UTC
34 points
2 comments · 8 min read · LW link

New version of “Intro to Brain-Like-AGI Safety”

Steven Byrnes · 23 Jan 2026 16:21 UTC
63 points
1 comment · 19 min read · LW link

Automated Alignment Research, Abductively

future_detective · 23 Jan 2026 16:14 UTC
2 points
0 comments · 2 min read · LW link

Digital Consciousness Model Results and Key Takeaways

23 Jan 2026 15:58 UTC
15 points
0 comments · 6 min read · LW link

From Neurons to Newtons: What can the brain teach us about physics?

Carly Turini · 23 Jan 2026 15:20 UTC
1 point
0 comments · 1 min read · LW link

A Framework for Eval Awareness

LAThomson · 23 Jan 2026 10:16 UTC
38 points
5 comments · 8 min read · LW link

All Of The Good Things, None Of The Bad Things

omegastick · 23 Jan 2026 9:50 UTC
8 points
1 comment · 1 min read · LW link
(dumbideas.xyz)

Are Short AI Timelines Really Higher-Leverage?

23 Jan 2026 7:28 UTC
25 points
1 comment · 15 min read · LW link
(www.forethought.org)

Principles for Meta-Science and AI Safety Replications

Zephaniah Roe · 23 Jan 2026 6:59 UTC
47 points
7 comments · 4 min read · LW link

Value Learning Needs a Low-Dimensional Bottleneck

Gunnar_Zarncke · 23 Jan 2026 2:12 UTC
24 points
7 comments · 1 min read · LW link

A quick, elegant derivation of Bayes’ Theorem

RohanS · 23 Jan 2026 1:40 UTC
37 points
7 comments · 1 min read · LW link

The World Hasn’t Gone Mad

goldfine · 23 Jan 2026 0:01 UTC
19 points
3 comments · 2 min read · LW link
(itsnotgambling.substack.com)

Like night and day: Light glasses and dark therapy can treat non-24 (and SAD)

JennaS · 22 Jan 2026 23:23 UTC
30 points
1 comment · 9 min read · LW link

Does Pentagon Pizza Theory Work?

rba · 22 Jan 2026 19:24 UTC
140 points
11 comments · 5 min read · LW link
(goflaw.substack.com)

The phases of an AI takeover

sjadler · 22 Jan 2026 19:09 UTC
12 points
1 comment · 9 min read · LW link
(stevenadler.substack.com)

Will we get automated alignment research before an AI Takeoff?

Jan Wehner · 22 Jan 2026 17:46 UTC
33 points
2 comments · 11 min read · LW link

[Question] How Could I Have Learned That Faster?

Dom Polsinelli · 22 Jan 2026 17:35 UTC
9 points
4 comments · 2 min read · LW link

AI can suddenly become dangerous despite gradual progress

Simon Lermen · 22 Jan 2026 16:47 UTC
15 points
0 comments · 4 min read · LW link
(simonlermen.substack.com)

Releasing TakeOverBench.com: a benchmark, for AI takeover

otto.barten · 22 Jan 2026 16:34 UTC
16 points
5 comments · 1 min read · LW link

AI #152: Brought To You By The Torment Nexus

Zvi · 22 Jan 2026 14:40 UTC
35 points
5 comments · 56 min read · LW link
(thezvi.wordpress.com)

Resisting Reality

robertzk · 22 Jan 2026 13:50 UTC
26 points
3 comments · 6 min read · LW link

Experiments on Reward Hacking Monitorability in Language Models

Monketo · 22 Jan 2026 2:42 UTC
9 points
0 comments · 8 min read · LW link

Neural chameleons can(’t) hide from activation oracles

ceselder · 22 Jan 2026 1:47 UTC
55 points
5 comments · 3 min read · LW link

Dedicated continuous supervision of AI companies

Michael Bennett · 22 Jan 2026 1:47 UTC
8 points
0 comments · 15 min read · LW link

Uncovering Unfaithful CoT in Deceptive Models

Agastya Agrawal · 22 Jan 2026 1:46 UTC
12 points
2 comments · 3 min read · LW link

Claude’s Constitution is an excellent guide for humans, too

Eye You · 22 Jan 2026 1:26 UTC
27 points
0 comments · 5 min read · LW link

The first type of transformative AI?

Lizka · 21 Jan 2026 23:47 UTC
19 points
0 comments · 1 min read · LW link
(www.forethought.org)

How (and why) to read Drexler on AI

owencb · 21 Jan 2026 23:25 UTC
55 points
12 comments · 6 min read · LW link
(strangecities.substack.com)

Finding Yourself in Others

1a3orn · 21 Jan 2026 23:22 UTC
51 points
1 comment · 4 min read · LW link

AI Risks Slip Out of Mind

MarkelKori · 21 Jan 2026 22:30 UTC
5 points
1 comment · 1 min read · LW link

When should we train against a scheming monitor?

Mary Phuong · 21 Jan 2026 20:48 UTC
24 points
4 comments · 5 min read · LW link

Claude Codes #3

Zvi · 21 Jan 2026 19:50 UTC
47 points
5 comments · 15 min read · LW link
(thezvi.wordpress.com)

Claude’s new constitution

21 Jan 2026 19:37 UTC
176 points
47 comments · 6 min read · LW link
(www.anthropic.com)

Crimes of the Future, Solutions of the Past

evrim · 21 Jan 2026 19:20 UTC
18 points
1 comment · 4 min read · LW link

On visions of a “good future” for humanity in a world with artificial superintelligence

Jakub Growiec · 21 Jan 2026 18:27 UTC
1 point
0 comments · 30 min read · LW link

The case for AGI safety products

Marius Hobbhahn · 21 Jan 2026 17:23 UTC
68 points
7 comments · 12 min read · LW link

Updating in the Opposite Direction from Evidence

Dom Polsinelli · 21 Jan 2026 16:08 UTC
1 point
0 comments · 3 min read · LW link
(dompols.substack.com)

Vibing with Claude, January 2026 Edition

Gordon Seidoh Worley · 21 Jan 2026 16:00 UTC
26 points
2 comments · 4 min read · LW link
(www.uncertainupdates.com)

AI Needs People (So, It Won’t Be Like Terminator Movie)

Victor Porton · 21 Jan 2026 14:42 UTC
−23 points
0 comments · 2 min read · LW link

Kredit Grant

kian · 21 Jan 2026 0:56 UTC
5 points
5 comments · 1 min read · LW link

Money Can’t Buy the Smile on a Child’s Face As They Look at A Beautiful Sunset… but it also can’t buy a malaria free world: my current understanding of how Effective Altruism has failed

Hazard · 20 Jan 2026 23:28 UTC
70 points
17 comments · 6 min read · LW link
(naturalhazard.xyz)

ACX Atlanta February Meetup

Steve French · 20 Jan 2026 22:30 UTC
2 points
0 comments · 1 min read · LW link

So Long Sucker: AI Deception, “Alliance Banks,” and Institutional Lying

fernando yt · 20 Jan 2026 22:29 UTC
47 points
5 comments · 2 min read · LW link