Hidden Reasoning in LLMs: A Taxonomy

25 Aug 2025 22:43 UTC
65 points
10 comments
12 min read
LW link

The NAO is Hiring for Partnerships, Response, Virology, and Wet Lab Management

jefftk
25 Aug 2025 22:37 UTC
16 points
0 comments
2 min read
LW link
(naobservatory.org)

ACX/SSC Meetup

teegs
25 Aug 2025 20:21 UTC
1 point
0 comments
1 min read
LW link

Proactive AI Control: A Case for Battery-Dependent Systems

Jesper L.
25 Aug 2025 20:04 UTC
4 points
0 comments
13 min read
LW link

Solving irrational fear as deciding: A worked example

jimmy
25 Aug 2025 19:44 UTC
24 points
4 comments
7 min read
LW link

Breastfeeding and IQ: Effects shrink as you control for more confounders

Nina Panickssery
25 Aug 2025 18:43 UTC
44 points
3 comments
1 min read
LW link
(blog.ninapanickssery.com)

Quality Precision

Ben
25 Aug 2025 17:58 UTC
24 points
13 comments
3 min read
LW link

Neuroscience of human sexual attraction triggers (3 hypotheses)

Steven Byrnes
25 Aug 2025 17:51 UTC
54 points
6 comments
12 min read
LW link

Before LLM Psychosis, There Was Yes-Man Psychosis

johnswentworth
25 Aug 2025 17:47 UTC
186 points
20 comments
3 min read
LW link

Steelmanning Conscious AI Default Friendliness

J Bostock
25 Aug 2025 16:56 UTC
8 points
0 comments
2 min read
LW link

Meditations on Margarine

The Dao of Bayes
25 Aug 2025 16:35 UTC
13 points
0 comments
8 min read
LW link

LLMs: psych subjects & quantitative BS

Weekend Editor
25 Aug 2025 15:52 UTC
2 points
0 comments
1 min read
LW link

Great responsibility requires great power

dr_s
25 Aug 2025 15:18 UTC
16 points
0 comments
6 min read
LW link

A Comprehensive Guide to Running

Declan Molony
25 Aug 2025 15:12 UTC
85 points
24 comments
16 min read
LW link

Arguments About AI Consciousness Seem Highly Motivated And At Best Overconfident

Zvi
25 Aug 2025 13:20 UTC
81 points
5 comments
25 min read
LW link
(thezvi.wordpress.com)

Overview of (Some) Biotech-Based Adult Intelligence Amplification Approaches

Ihor Kendiukhov
25 Aug 2025 8:00 UTC
21 points
0 comments
22 min read
LW link

Diffusion Primer

Sneha Bangalore
24 Aug 2025 23:35 UTC
3 points
0 comments
7 min read
LW link

The Best Resources To Build Any Intuition

Algon
24 Aug 2025 19:12 UTC
66 points
9 comments
4 min read
LW link

Evolution favors the ability to change subjective probabilities in MWI + Experimental test

avturchin
24 Aug 2025 14:58 UTC
−3 points
6 comments
11 min read
LW link

Legal Personhood—Intellectual Property

Stephen Martin
24 Aug 2025 6:05 UTC
6 points
5 comments
6 min read
LW link

Notes on cooperating with unaligned AIs

Lukas Finnveden
24 Aug 2025 4:19 UTC
53 points
8 comments
21 min read
LW link
(blog.redwoodresearch.org)

Kids and Cleaning

jefftk
24 Aug 2025 3:30 UTC
39 points
0 comments
3 min read
LW link
(www.jefftk.com)

Shorter Tokens Are More Likely

Brendan Long
24 Aug 2025 0:22 UTC
81 points
19 comments
5 min read
LW link
(www.brendanlong.com)

Analysis of Variational Sparse Autoencoders

Zach Baker
23 Aug 2025 23:58 UTC
11 points
0 comments
10 min read
LW link

Thoughts About how RLHF and Related “Prosaic” Approaches Could be Used to Create Robustly Aligned AIs.

williawa
23 Aug 2025 21:05 UTC
10 points
14 comments
4 min read
LW link

On the Function of Faith in A Probably-Simulated Universe

testingthewaters
23 Aug 2025 20:28 UTC
−8 points
12 comments
7 min read
LW link
(aclevername.substack.com)

The Data Scaling Hypothesis

harsimony
23 Aug 2025 18:18 UTC
5 points
0 comments
1 min read
LW link

How a Non-Dual Language Could Redefine AI Safety

Marcio Díaz
23 Aug 2025 16:40 UTC
1 point
6 comments
3 min read
LW link

The Great Game: Game Theory for Collective Intelligence

Rome Viharo
23 Aug 2025 15:04 UTC
−2 points
0 comments
2 min read
LW link

The Startup Jungle

Logan Kieller
23 Aug 2025 14:59 UTC
7 points
0 comments
8 min read
LW link
(agenticconjectures.substack.com)

The most common mistakes people make starting EA orgs

KatWoods
23 Aug 2025 14:18 UTC
2 points
0 comments
4 min read
LW link

Futility Illusions

silentbob
23 Aug 2025 10:54 UTC
31 points
10 comments
5 min read
LW link

Legal Personhood—Corporate Ownership & Formation

Stephen Martin
23 Aug 2025 5:45 UTC
4 points
0 comments
3 min read
LW link

AI 2027 Response Followup

SE Gyges
23 Aug 2025 4:41 UTC
3 points
3 comments
9 min read
LW link
(www.lesswrong.com)

Pasta Cooking Time

jefftk
23 Aug 2025 3:00 UTC
22 points
1 comment
1 min read
LW link
(www.jefftk.com)

Pet Ownership

incident-recipient
23 Aug 2025 1:54 UTC
11 points
0 comments
3 min read
LW link

Reflections on writing 15 daily blog posts

CstineSublime
23 Aug 2025 1:50 UTC
12 points
0 comments
4 min read
LW link

How Econ 101 makes us blinder on trade, morals, jobs with AI – and on marginal costs

FlorianH
23 Aug 2025 0:59 UTC
17 points
5 comments
8 min read
LW link
(nearlyfar.org)

Memory Decoding Journal Club: Behavioral time scale synaptic plasticity underlies CA1 place fields

Devin Ward
23 Aug 2025 0:53 UTC
1 point
0 comments
1 min read
LW link

Yudkowsky on “Don’t use p(doom)”

Raemon
22 Aug 2025 23:44 UTC
98 points
39 comments
4 min read
LW link

Banning Said Achmiz (and broader thoughts on moderation)

habryka
22 Aug 2025 23:02 UTC
244 points
395 comments
30 min read
LW link

(∃ Stochastic Natural Latent) Implies (∃ Deterministic Natural Latent)

22 Aug 2025 21:46 UTC
126 points
8 comments
9 min read
LW link

One more reason for AI capable of independent moral reasoning: alignment itself and cause prioritisation

Michele Campolo
22 Aug 2025 15:53 UTC
−3 points
0 comments
3 min read
LW link

The Buddhism & AI Initiative

Chris Scammell
22 Aug 2025 15:50 UTC
29 points
2 comments
2 min read
LW link

DeepSeek v3.1 Is Not Having a Moment

Zvi
22 Aug 2025 15:50 UTC
40 points
2 comments
3 min read
LW link
(thezvi.wordpress.com)

Doing good… best?

Michele Campolo
22 Aug 2025 15:48 UTC
−1 points
6 comments
2 min read
LW link

With enough knowledge, any conscious agent acts morally

Michele Campolo
22 Aug 2025 15:44 UTC
−2 points
9 comments
36 min read
LW link

If we can educate AIs, why not apply that education to people?

P. João
22 Aug 2025 14:04 UTC
5 points
0 comments
2 min read
LW link

CEO of Microsoft AI’s “Seemingly Conscious AI” Post

Stephen Martin
22 Aug 2025 13:58 UTC
64 points
8 comments
8 min read
LW link

Could we have predicted emergent misalignment a priori using unsupervised behaviour elicitation?

Daniel Tan
22 Aug 2025 13:42 UTC
6 points
0 comments
1 min read
LW link