Irrationality as a Defense Mechanism for Reward-hacking

Ashe Vazquez Nunez · 18 Jan 2026 3:57 UTC
2 points · 0 comments · 4 min read · LW link

Blogging, Writing, Musing, And Thinking

sonicrocketman · 18 Jan 2026 3:28 UTC
2 points · 0 comments · 4 min read · LW link
(brianschrader.com)

Is METR Underestimating LLM Time Horizons?

andreasrobinson · 18 Jan 2026 1:19 UTC
4 points · 1 comment · 17 min read · LW link

Focusing on Flourishing Even When Survival is Unlikely (I)

Cleo Nardo · 17 Jan 2026 18:47 UTC
9 points · 3 comments · 4 min read · LW link

The truth behind the 2026 J.P. Morgan Healthcare Conference

Abhishaike Mahajan · 17 Jan 2026 17:28 UTC
67 points · 1 comment · 9 min read · LW link
(www.owlposting.com)

Japan is a bank

bhauth · 17 Jan 2026 16:33 UTC
16 points · 2 comments · 1 min read · LW link
(www.bhauth.com)

Forfeiting Ill-Gotten Gains

jefftk · 17 Jan 2026 0:20 UTC
25 points · 1 comment · 1 min read · LW link
(www.jefftk.com)

Is It Reasoning or Just a Fixed Bias?

Sriram Kiron · 16 Jan 2026 21:43 UTC
14 points · 0 comments · 1 min read · LW link
(ramaway.com)

Future-as-Label: Scalable Supervision from Real-World Outcomes

Ben Turtel · 16 Jan 2026 21:21 UTC
−1 points · 0 comments · 1 min read · LW link

Comparing yourself to other people

dominicq · 16 Jan 2026 20:31 UTC
10 points · 2 comments · 2 min read · LW link
(sundaystopwatch.eu)

Precedents for the Unprecedented: Historical Analogies for Thirteen Artificial Superintelligence Risks

James_Miller · 16 Jan 2026 18:43 UTC
96 points · 3 comments · 63 min read · LW link

Why falling labor share ≠ falling employment

Lydia Nottingham · 16 Jan 2026 17:27 UTC
−5 points · 3 comments · 2 min read · LW link
(lydianottingham.substack.com)

Digital Minds: A Quickstart Guide

16 Jan 2026 17:15 UTC
10 points · 1 comment · 22 min read · LW link
(aviparrack.substack.com)

The culture and design of human-AI interactions

zef · 16 Jan 2026 17:11 UTC
2 points · 0 comments · 4 min read · LW link
(bloodsteel.substack.com)

Scaling Laws for Economic Impacts: Experimental Evidence from 500 Professionals and 13 LLMs

Ali Merali · 16 Jan 2026 13:40 UTC
19 points · 4 comments · 4 min read · LW link

[Pre-print] Building safe AGI as an ergonomics problem

16 Jan 2026 13:18 UTC
1 point · 0 comments · 1 min read · LW link
(doi.org)

Powerful misaligned AIs may be extremely persuasive, especially absent mitigations

Cody Rushing · 16 Jan 2026 8:08 UTC
55 points · 2 comments · 14 min read · LW link

Should control down-weight negative net-sabotage-value threats?

Fabien Roger · 16 Jan 2026 4:18 UTC
33 points · 0 comments · 10 min read · LW link

Total utilitarianism is fine

Abhimanyu Pallavi Sudhir · 16 Jan 2026 0:32 UTC
3 points · 2 comments · 3 min read · LW link

Test your interpretability techniques by de-censoring Chinese models

15 Jan 2026 16:33 UTC
69 points · 5 comments · 20 min read · LW link