Analysis of Variational Sparse Autoencoders

Zach Baker 23 Aug 2025 23:58 UTC
11 points
0 comments 10 min read LW link

Thoughts About how RLHF and Related “Prosaic” Approaches Could be Used to Create Robustly Aligned AIs.

williawa 23 Aug 2025 21:05 UTC
10 points
14 comments 4 min read LW link

On the Function of Faith in A Probably-Simulated Universe

testingthewaters 23 Aug 2025 20:28 UTC
−8 points
12 comments 7 min read LW link
(aclevername.substack.com)

The Data Scaling Hypothesis

harsimony 23 Aug 2025 18:18 UTC
5 points
0 comments 1 min read LW link

How a Non-Dual Language Could Redefine AI Safety

Marcio Díaz 23 Aug 2025 16:40 UTC
1 point
6 comments 3 min read LW link

The Great Game: Game Theory for Collective Intelligence

Rome Viharo 23 Aug 2025 15:04 UTC
−2 points
0 comments 2 min read LW link

The Startup Jungle

Logan Kieller 23 Aug 2025 14:59 UTC
7 points
0 comments 8 min read LW link
(agenticconjectures.substack.com)

The most common mistakes people make starting EA orgs

KatWoods 23 Aug 2025 14:18 UTC
2 points
0 comments 4 min read LW link

Futility Illusions

silentbob 23 Aug 2025 10:54 UTC
31 points
10 comments 5 min read LW link

Legal Personhood—Corporate Ownership & Formation

Stephen Martin 23 Aug 2025 5:45 UTC
4 points
0 comments 3 min read LW link

AI 2027 Response Followup

SE Gyges 23 Aug 2025 4:41 UTC
3 points
3 comments 9 min read LW link
(www.lesswrong.com)

Pasta Cooking Time

jefftk 23 Aug 2025 3:00 UTC
22 points
1 comment 1 min read LW link
(www.jefftk.com)

Pet Ownership

incident-recipient 23 Aug 2025 1:54 UTC
11 points
0 comments 3 min read LW link

Reflections on writing 15 daily blog posts

CstineSublime 23 Aug 2025 1:50 UTC
12 points
0 comments 4 min read LW link

How Econ 101 makes us blinder on trade, morals, jobs with AI – and on marginal costs

FlorianH 23 Aug 2025 0:59 UTC
17 points
5 comments 8 min read LW link
(nearlyfar.org)

Memory Decoding Journal Club: Behavioral time scale synaptic plasticity underlies CA1 place fields

Devin Ward 23 Aug 2025 0:53 UTC
1 point
0 comments 1 min read LW link

Yudkowsky on “Don’t use p(doom)”

Raemon 22 Aug 2025 23:44 UTC
98 points
39 comments 4 min read LW link

Banning Said Achmiz (and broader thoughts on moderation)

habryka 22 Aug 2025 23:02 UTC
244 points
395 comments 30 min read LW link

(∃ Stochastic Natural Latent) Implies (∃ Deterministic Natural Latent)

22 Aug 2025 21:46 UTC
126 points
8 comments 9 min read LW link

One more reason for AI capable of independent moral reasoning: alignment itself and cause prioritisation

Michele Campolo 22 Aug 2025 15:53 UTC
−3 points
0 comments 3 min read LW link

The Buddhism & AI Initiative

Chris Scammell 22 Aug 2025 15:50 UTC
29 points
2 comments 2 min read LW link

DeepSeek v3.1 Is Not Having a Moment

Zvi 22 Aug 2025 15:50 UTC
40 points
2 comments 3 min read LW link
(thezvi.wordpress.com)

Doing good… best?

Michele Campolo 22 Aug 2025 15:48 UTC
−1 points
6 comments 2 min read LW link

With enough knowledge, any conscious agent acts morally

Michele Campolo 22 Aug 2025 15:44 UTC
−2 points
9 comments 36 min read LW link

If we can educate AIs, why not apply that education to people?

P. João 22 Aug 2025 14:04 UTC
5 points
0 comments 2 min read LW link

CEO of Microsoft AI’s “Seemingly Conscious AI” Post

Stephen Martin 22 Aug 2025 13:58 UTC
64 points
8 comments 8 min read LW link

Could we have predicted emergent misalignment a priori using unsupervised behaviour elicitation?

Daniel Tan 22 Aug 2025 13:42 UTC
6 points
0 comments 1 min read LW link

An Introduction to Credal Sets and Infra-Bayes Learnability

Brittany Gelb 22 Aug 2025 13:03 UTC
33 points
2 comments 13 min read LW link

Legal Personhood—Contracts (Part 2)

Stephen Martin 22 Aug 2025 4:53 UTC
5 points
0 comments 2 min read LW link

When Money Becomes Power

Gabriel Alfour 22 Aug 2025 4:14 UTC
61 points
16 comments 7 min read LW link
(cognition.cafe)

Proof Section to an Introduction to Credal Sets and Infra-Bayes Learnability

Brittany Gelb 21 Aug 2025 23:11 UTC
13 points
0 comments 10 min read LW link

Resampling Conserves Redundancy (Approximately)

21 Aug 2025 22:43 UTC
68 points
2 comments 6 min read LW link

The anti-fragile culture

lincolnquirk 21 Aug 2025 21:41 UTC
30 points
1 comment 10 min read LW link

A Conservative Vision For AI Alignment

21 Aug 2025 18:14 UTC
25 points
34 comments 12 min read LW link

Emergent morality in AI weakens the Orthogonality Thesis

dawnstrata 21 Aug 2025 17:57 UTC
−1 points
3 comments 11 min read LW link

Four ways learning Econ makes people dumber re: future AI

Steven Byrnes 21 Aug 2025 17:52 UTC
360 points
49 comments 6 min read LW link
(x.com)

Memory Decoding Journal Club: Behavioral time scale synaptic plasticity underlies CA1 place fields

Devin Ward 21 Aug 2025 16:13 UTC
1 point
0 comments 1 min read LW link

Could one country outgrow the rest of the world?

Tom Davidson 21 Aug 2025 15:32 UTC
45 points
23 comments 17 min read LW link
(newsletter.forethought.org)

What is “Meaningness”

21 Aug 2025 14:57 UTC
11 points
0 comments 15 min read LW link

AI #130: Talking Past The Sale

Zvi 21 Aug 2025 13:50 UTC
37 points
4 comments 60 min read LW link
(thezvi.wordpress.com)

Critiques of FDT Often Stem From Confusion About Newcomblike Problems

Heighn 21 Aug 2025 13:19 UTC
7 points
19 comments 5 min read LW link

Legal Personhood—Contracts (Part 1)

Stephen Martin 21 Aug 2025 5:23 UTC
10 points
0 comments 7 min read LW link

Being honest with AIs

Lukas Finnveden 21 Aug 2025 3:57 UTC
63 points
6 comments 17 min read LW link
(blog.redwoodresearch.org)

ACX Fall Meetup 2025 @ Klang Valley, Malaysia

Yi-Yang 21 Aug 2025 3:34 UTC
2 points
0 comments 1 min read LW link

French Non-Profit Law: Associations are as cool as American churches

Lucie Philippon 20 Aug 2025 22:02 UTC
40 points
6 comments 3 min read LW link

AI Safety Comms Retreat

Vishakha 20 Aug 2025 20:54 UTC
3 points
0 comments 1 min read LW link

The trouble with “enlightenment”

Gordon Seidoh Worley 20 Aug 2025 19:00 UTC
15 points
4 comments 4 min read LW link
(uncertainupdates.substack.com)

An epistemic advantage of working as a moderate

Buck 20 Aug 2025 17:47 UTC
217 points
96 comments 4 min read LW link

My AGI timeline updates from GPT-5 (and 2025 so far)

ryan_greenblatt 20 Aug 2025 16:11 UTC
163 points
14 comments 4 min read LW link

come work on dangerous capability mitigations at Anthropic

Dave Orr 20 Aug 2025 15:11 UTC
31 points
7 comments 1 min read LW link