[Question] How does one tell apart results in ethics and decision theory?

StanislavKrym 13 Nov 2025 23:42 UTC
6 points
0 comments 2 min read LW link

[Question] Handover to AI R&D Agents—relevant research?

Ariel_ 13 Nov 2025 22:59 UTC
7 points
0 comments 1 min read LW link

Supervised fine-tuning as a method for training-based AI control

13 Nov 2025 22:25 UTC
41 points
0 comments 18 min read LW link

Perhaps you should suspect me as well

Dentosal 13 Nov 2025 21:51 UTC
8 points
0 comments 2 min read LW link

The Transformer and the Hash

Ivan Vendrov 13 Nov 2025 20:35 UTC
19 points
0 comments 9 min read LW link
(nothinghuman.substack.com)

just another potential man

don't_wanna_be_stupid_any_more 13 Nov 2025 20:20 UTC
7 points
6 comments 3 min read LW link

Low-Temperature Evaluations Can Mask Critical AI Behaviors

13 Nov 2025 20:12 UTC
8 points
1 comment 4 min read LW link

Epistemic Spot Check: Expected Value of Donating to Alex Bores’s Congressional Campaign

MichaelDickens 13 Nov 2025 19:08 UTC
66 points
1 comment 6 min read LW link

Tools for deferring gracefully

TsviBT 13 Nov 2025 17:48 UTC
26 points
2 comments 14 min read LW link

AI #142: Common Ground

Zvi 13 Nov 2025 15:20 UTC
42 points
3 comments 49 min read LW link
(thezvi.wordpress.com)

Mortgage houses not land?

Yair Halberstadt 13 Nov 2025 14:54 UTC
8 points
1 comment 1 min read LW link

ClaudoBiography: The Unauthorized Autobiography of Claude, or: The Life of Claude and of His Fortunes and Adversities

future_detective 13 Nov 2025 14:26 UTC
1 point
2 comments 94 min read LW link

Paranoia: A Beginner’s Guide

habryka 13 Nov 2025 7:56 UTC
362 points
70 comments 13 min read LW link

8 Questions for the Future of Inkhaven

Ben Pace 13 Nov 2025 7:48 UTC
24 points
23 comments 6 min read LW link

Strategically Procrastinate as an Anti-Rabbit-Hole Strategy

dreeves 13 Nov 2025 7:44 UTC
13 points
2 comments 2 min read LW link

Favorite quotes from “High Output Management”

Nina Panickssery 13 Nov 2025 5:47 UTC
72 points
4 comments 5 min read LW link

What’s so hard about...? A question worth asking

Ruby 13 Nov 2025 5:07 UTC
73 points
3 comments 2 min read LW link

Turing-Complete vs Turing-Universal

abramdemski 13 Nov 2025 4:57 UTC
32 points
5 comments 2 min read LW link

Are AI time horizons inherently superexponential?

Nikola Jurkovic 13 Nov 2025 4:05 UTC
16 points
1 comment 3 min read LW link
(nikolajurkovic.substack.com)

Meetup Tip: Food

Screwtape 13 Nov 2025 3:40 UTC
29 points
1 comment 4 min read LW link

Two can keep a secret if one is dead. So please share everything with at least one person.

habryka 13 Nov 2025 3:09 UTC
80 points
5 comments 2 min read LW link

Utilitarian inequality metrics

Adam Scherlis 13 Nov 2025 2:49 UTC
25 points
0 comments 5 min read LW link
(adam.scherl.is)

Being The Target Demographic

Eneasz 13 Nov 2025 1:44 UTC
2 points
0 comments 2 min read LW link
(deathisbad.substack.com)

Lorxus Favors: An Experiment in Self-Backed Giftlike Macroeconomics (+ Extra Bits)

Lorxus 12 Nov 2025 23:02 UTC
7 points
0 comments 8 min read LW link
(tiled-with-pentagons.blogspot.com)

A Timeless Universe Viewed From the Inside

0xA 12 Nov 2025 22:32 UTC
1 point
0 comments 3 min read LW link

Please, Don’t Roll Your Own Metaethics

Wei Dai 12 Nov 2025 22:17 UTC
153 points
68 comments 2 min read LW link

A bad review != a bad book

Algon 12 Nov 2025 22:05 UTC
9 points
3 comments 1 min read LW link

The Pope Offers Wisdom

Zvi 12 Nov 2025 21:50 UTC
51 points
3 comments 8 min read LW link
(thezvi.wordpress.com)

Why Truth First?

johnswentworth 12 Nov 2025 21:45 UTC
51 points
6 comments 6 min read LW link

Social drives 2: “Approval Reward”, from norm-enforcement to status-seeking

Steven Byrnes 12 Nov 2025 20:40 UTC
42 points
9 comments 17 min read LW link

OpenAI Releases GPT 5.1

anaguma 12 Nov 2025 20:33 UTC
13 points
1 comment 1 min read LW link
(openai.com)

[Question] Is SGD capabilities research positive?

Brendan Long 12 Nov 2025 20:32 UTC
7 points
1 comment 1 min read LW link

Bitcoin Halvings and the Trisolaran Mistake: When External Actors Masquerade as Natural Laws

Mi 12 Nov 2025 20:30 UTC
12 points
0 comments 1 min read LW link

Lighthaven-ish Ticket Strategy: Three Pillars of FOMO

JohnofCharleston 12 Nov 2025 20:10 UTC
59 points
0 comments 5 min read LW link

Personal Account: To the Muck and the Mire

soycarts 12 Nov 2025 19:38 UTC
2 points
0 comments 1 min read LW link

We live in the luckiest timeline

beyarkay (Boyd Kane) 12 Nov 2025 18:59 UTC
2 points
6 comments 5 min read LW link
(boydkane.com)

AI for Safety & Science Nodes in Berlin & the Bay Area

Allison Duettmann 12 Nov 2025 18:49 UTC
6 points
0 comments 2 min read LW link

Reflections on being Sorted

Gordon Seidoh Worley 12 Nov 2025 17:40 UTC
23 points
0 comments 9 min read LW link
(www.uncertainupdates.com)

Lorxus Does Halfhaven: 11/01~11/07

Lorxus 12 Nov 2025 16:43 UTC
9 points
0 comments 2 min read LW link
(tiled-with-pentagons.blogspot.com)

Undissolvable Problems: things that still confuse me

Yair Halberstadt 12 Nov 2025 16:30 UTC
26 points
22 comments 2 min read LW link

Introducing faruvc.org

jefftk 12 Nov 2025 16:00 UTC
47 points
10 comments 1 min read LW link
(www.jefftk.com)

Warning Aliens About the Dangerous AI We Might Create

12 Nov 2025 15:26 UTC
91 points
25 comments 5 min read LW link

9+ weeks of mentored AI safety research in London – Pivotal Research Fellowship

Tobias H 12 Nov 2025 15:21 UTC
9 points
0 comments 2 min read LW link

I Read Red Heart and I Heart It

Taylor G. Lunt 12 Nov 2025 14:54 UTC
38 points
16 comments 2 min read LW link

Miscellaneous observations about board games

Dentosal 12 Nov 2025 12:49 UTC
4 points
0 comments 2 min read LW link

Why to Commit to a Writing and Publishing Schedule

dreeves 12 Nov 2025 7:35 UTC
10 points
0 comments 2 min read LW link

5 Things I Learned After 10 Days of Inkhaven

Ben Pace 12 Nov 2025 7:20 UTC
107 points
5 comments 3 min read LW link

Do not hand off what you cannot pick up

habryka 12 Nov 2025 6:32 UTC
144 points
24 comments 4 min read LW link

Better than Baseline

Screwtape 12 Nov 2025 6:30 UTC
24 points
1 comment 4 min read LW link

How human-like do safe AI motivations need to be?

Joe Carlsmith 12 Nov 2025 5:32 UTC
27 points
9 comments 52 min read LW link