Skill: cognitive black box flight recorder

TsviBT · 24 Jan 2026 22:54 UTC
27 points
2 comments · 5 min read · LW link

In Defense of Memorization

David Goodman · 24 Jan 2026 22:49 UTC
24 points
7 comments · 13 min read · LW link

Thinking from the Other Side: Should I Wash My Hair with Shampoo?

R0sberg · 24 Jan 2026 22:47 UTC
6 points
1 comment · 2 min read · LW link

Small language models hallucinate knowing something’s off.

Toheed · 24 Jan 2026 22:46 UTC
12 points
0 comments · 5 min read · LW link

IABIED Book Review: Core Arguments and Counterarguments

Stephen McAleese · 24 Jan 2026 14:25 UTC
90 points
39 comments · 25 min read · LW link

The Global AI Dataset (GAID) Project: From Closing Research Gaps to Building Responsible and Trustworthy AI

Jason Hung · 24 Jan 2026 3:23 UTC
7 points
0 comments · 15 min read · LW link

A Black Box Made Less Opaque (part 1)

Matthew McDonnell · 24 Jan 2026 3:20 UTC
6 points
0 comments · 12 min read · LW link

A Simple Method for Accelerating Grokking

josh :) · 24 Jan 2026 3:19 UTC
14 points
1 comment · 3 min read · LW link

Who is choosing your preferences- You or your Mind?

shanzson · 24 Jan 2026 3:17 UTC
0 points
4 comments · 1 min read · LW link

How I Used Methodable to Have a Nice Tuesday

dnsosebee · 24 Jan 2026 2:57 UTC
4 points
0 comments · 10 min read · LW link

AI X-Risk Bottleneck = Advocacy?

fortytwo · 24 Jan 2026 2:52 UTC
10 points
0 comments · 1 min read · LW link

Every Benchmark is Broken

Jonathan Gabor · 24 Jan 2026 2:42 UTC
95 points
0 comments · 4 min read · LW link
(jonathanpgabor.substack.com)

Thousand Year Old Advice on Relinquishing Control to AI

Dom Polsinelli · 24 Jan 2026 2:20 UTC
−3 points
2 comments · 3 min read · LW link
(dompols.substack.com)

AI Must Learn to Police Itself

savant · 23 Jan 2026 22:39 UTC
1 point
0 comments · 2 min read · LW link

Condensation & Relevance

abramdemski · 23 Jan 2026 22:21 UTC
38 points
0 comments · 5 min read · LW link

Paying attention to Attention Sinks

Mitali M · 23 Jan 2026 21:40 UTC
11 points
5 comments · 1 min read · LW link

Dating Roundup #11: Going Too Meta

Zvi · 23 Jan 2026 20:50 UTC
40 points
4 comments · 14 min read · LW link
(thezvi.wordpress.com)

The Artificial Man

Jack Bradshaw · 23 Jan 2026 19:55 UTC
1 point
0 comments · 2 min read · LW link

The Long View Of History

sonicrocketman · 23 Jan 2026 19:30 UTC
10 points
2 comments · 2 min read · LW link
(brianschrader.com)

Emergency Response Measures for Catastrophic AI Risk

MKodama · 23 Jan 2026 18:18 UTC
27 points
2 comments · 3 min read · LW link

Eliciting base models with simple unsupervised techniques

23 Jan 2026 18:06 UTC
34 points
2 comments · 8 min read · LW link

New version of “Intro to Brain-Like-AGI Safety”

Steven Byrnes · 23 Jan 2026 16:21 UTC
63 points
1 comment · 19 min read · LW link

Automated Alignment Research, Abductively

future_detective · 23 Jan 2026 16:14 UTC
2 points
0 comments · 2 min read · LW link

Digital Consciousness Model Results and Key Takeaways

23 Jan 2026 15:58 UTC
15 points
0 comments · 6 min read · LW link

From Neurons to Newtons: What can the brain teach us about physics?

Carly Turini · 23 Jan 2026 15:20 UTC
1 point
0 comments · 1 min read · LW link

A Framework for Eval Awareness

LAThomson · 23 Jan 2026 10:16 UTC
38 points
5 comments · 8 min read · LW link

All Of The Good Things, None Of The Bad Things

omegastick · 23 Jan 2026 9:50 UTC
8 points
1 comment · 1 min read · LW link
(dumbideas.xyz)

Are Short AI Timelines Really Higher-Leverage?

23 Jan 2026 7:28 UTC
25 points
1 comment · 15 min read · LW link
(www.forethought.org)

Principles for Meta-Science and AI Safety Replications

Zephaniah Roe · 23 Jan 2026 6:59 UTC
47 points
7 comments · 4 min read · LW link

Value Learning Needs a Low-Dimensional Bottleneck

Gunnar_Zarncke · 23 Jan 2026 2:12 UTC
24 points
7 comments · 1 min read · LW link

A quick, elegant derivation of Bayes’ Theorem

RohanS · 23 Jan 2026 1:40 UTC
37 points
7 comments · 1 min read · LW link

The World Hasn’t Gone Mad

goldfine · 23 Jan 2026 0:01 UTC
19 points
3 comments · 2 min read · LW link
(itsnotgambling.substack.com)

Like night and day: Light glasses and dark therapy can treat non-24 (and SAD)

JennaS · 22 Jan 2026 23:23 UTC
30 points
1 comment · 9 min read · LW link

Does Pentagon Pizza Theory Work?

rba · 22 Jan 2026 19:24 UTC
140 points
11 comments · 5 min read · LW link
(goflaw.substack.com)

The phases of an AI takeover

sjadler · 22 Jan 2026 19:09 UTC
12 points
1 comment · 9 min read · LW link
(stevenadler.substack.com)

Will we get automated alignment research before an AI Takeoff?

Jan Wehner · 22 Jan 2026 17:46 UTC
33 points
2 comments · 11 min read · LW link

[Question] How Could I Have Learned That Faster?

Dom Polsinelli · 22 Jan 2026 17:35 UTC
9 points
4 comments · 2 min read · LW link

AI can suddenly become dangerous despite gradual progress

Simon Lermen · 22 Jan 2026 16:47 UTC
15 points
0 comments · 4 min read · LW link
(simonlermen.substack.com)

Releasing TakeOverBench.com: a benchmark, for AI takeover

otto.barten · 22 Jan 2026 16:34 UTC
16 points
5 comments · 1 min read · LW link

AI #152: Brought To You By The Torment Nexus

Zvi · 22 Jan 2026 14:40 UTC
35 points
5 comments · 56 min read · LW link
(thezvi.wordpress.com)

Resisting Reality

robertzk · 22 Jan 2026 13:50 UTC
26 points
3 comments · 6 min read · LW link

Experiments on Reward Hacking Monitorability in Language Models

Monketo · 22 Jan 2026 2:42 UTC
9 points
0 comments · 8 min read · LW link

Neural chameleons can(’t) hide from activation oracles

ceselder · 22 Jan 2026 1:47 UTC
55 points
5 comments · 3 min read · LW link

Dedicated continuous supervision of AI companies

Michael Bennett · 22 Jan 2026 1:47 UTC
8 points
0 comments · 15 min read · LW link

Uncovering Unfaithful CoT in Deceptive Models

Agastya Agrawal · 22 Jan 2026 1:46 UTC
12 points
2 comments · 3 min read · LW link

Claude’s Constitution is an excellent guide for humans, too

Eye You · 22 Jan 2026 1:26 UTC
27 points
0 comments · 5 min read · LW link

The first type of transformative AI?

Lizka · 21 Jan 2026 23:47 UTC
19 points
0 comments · 1 min read · LW link
(www.forethought.org)

How (and why) to read Drexler on AI

owencb · 21 Jan 2026 23:25 UTC
55 points
12 comments · 6 min read · LW link
(strangecities.substack.com)

Finding Yourself in Others

1a3orn · 21 Jan 2026 23:22 UTC
51 points
1 comment · 4 min read · LW link

AI Risks Slip Out of Mind

MarkelKori · 21 Jan 2026 22:30 UTC
5 points
1 comment · 1 min read · LW link