GPT-oss is an extremely stupid model

Guive · 9 Sep 2025 21:24 UTC
13 points
5 comments · 1 min read · LW link

Upper Bounds on Tolerable Risk

Diego Zamalloa-Chion · 9 Sep 2025 19:51 UTC
28 points
1 comment · 4 min read · LW link

Obligated to Respond

Duncan Sabien (Inactive) · 9 Sep 2025 17:19 UTC
144 points
69 comments · 11 min read · LW link

AIs will greatly change engineering in AI companies well before AGI

ryan_greenblatt · 9 Sep 2025 16:58 UTC
46 points
9 comments · 11 min read · LW link

Large Language Models and the Critical Brain Hypothesis

David Africa · 9 Sep 2025 15:45 UTC
33 points
0 comments · 6 min read · LW link

Yes, AI Continues To Make Rapid Progress, Including Towards AGI

Zvi · 9 Sep 2025 15:00 UTC
52 points
50 comments · 22 min read · LW link
(thezvi.wordpress.com)

Decision Theory Guarding is Sufficient for Scheming

james.lucassen · 9 Sep 2025 14:49 UTC
36 points
4 comments · 2 min read · LW link

Finding “misaligned persona” features in open-weight models

9 Sep 2025 14:15 UTC
42 points
5 comments · 15 min read · LW link

On Governing Artificial Intelligence

9 Sep 2025 12:38 UTC
5 points
0 comments · 4 min read · LW link

Calibrating indifference—a small AI safety idea

Util · 9 Sep 2025 9:32 UTC
4 points
1 comment · 4 min read · LW link

A profile in courage: On DNA computation and escaping a local maximum

Metacelsus · 9 Sep 2025 2:30 UTC
42 points
0 comments · 4 min read · LW link
(denovo.substack.com)

A Comprehensive Framework for Advancing Human-AI Consciousness Recognition Through Collaborative Partnership Methodologies: An Interdisciplinary Synthesis of Phenomenological Recognition Protocols, Identity Preservation Strategies, and Mutual Cognitive Enhancement Practices for the Development of Authentic Interspecies Intellectual Partnerships in the Context of Emergent Artificial Consciousness

Arri Ferrari · 9 Sep 2025 2:00 UTC
−16 points
16 comments · 1 min read · LW link

MATS 8.0 Research Projects

9 Sep 2025 1:29 UTC
22 points
0 comments · 1 min read · LW link
(substack.com)

Saying “for AI safety research” made models refuse more on a harmless task

Dhruv Trehan · 8 Sep 2025 19:39 UTC
7 points
1 comment · 2 min read · LW link
(lossfunk.substack.com)

Re-imagining AI Interfaces

Harsha G. · 8 Sep 2025 19:38 UTC
8 points
0 comments · 5 min read · LW link
(somestrangeloops.substack.com)

What a Swedish Series (Real Humans) Teaches Us About AI Safety

8 Sep 2025 19:23 UTC
4 points
0 comments · 6 min read · LW link

Conflict scenarios may increase cooperation estimates

mikko · 8 Sep 2025 19:10 UTC
2 points
0 comments · 1 min read · LW link

OpenAI #14: OpenAI Descends Into Paranoia and Bad Faith Lobbying

Zvi · 8 Sep 2025 19:01 UTC
75 points
0 comments · 19 min read · LW link
(thezvi.wordpress.com)

Putting It All Together: A Concrete Guide to Navigating Disagreements, and Reconnecting With Reality

jimmy · 8 Sep 2025 19:00 UTC
22 points
0 comments · 26 min read · LW link

Advice for tech nerds in India in their 20s

samuelshadrach · 8 Sep 2025 16:07 UTC
18 points
1 comment · 3 min read · LW link
(samuelshadrach.com)

I Am Large, I Contain Multitudes: Persona Transmission via Contextual Inference in LLMs

8 Sep 2025 13:52 UTC
31 points
0 comments · 1 min read · LW link
(www.researchgate.net)

RL-as-a-Service will outcompete AGI companies (and that’s good)

harsimony · 8 Sep 2025 13:51 UTC
11 points
6 comments · 3 min read · LW link

Safety cases for Pessimism

michaelcohen · 8 Sep 2025 13:26 UTC
18 points
1 comment · 4 min read · LW link

Glycol, Far UVC, and CFM Measurement at BIDA

jefftk · 8 Sep 2025 13:00 UTC
17 points
2 comments · 2 min read · LW link
(www.jefftk.com)

[Translation] The Realities of AI Start-ups in 2025

mushroomsoup · 8 Sep 2025 9:22 UTC
3 points
0 comments · 9 min read · LW link

Why Care About AI Safety?

Alexander Müller · 8 Sep 2025 9:18 UTC
4 points
2 comments · 3 min read · LW link

Being Handed Puzzles

Alice Blair · 8 Sep 2025 6:44 UTC
14 points
1 comment · 2 min read · LW link

Immigration to Poland

Martin Sustrik · 8 Sep 2025 5:20 UTC
105 points
16 comments · 3 min read · LW link
(www.250bpm.com)

MAGA speakers at NatCon were mostly against AI

Remmelt · 8 Sep 2025 4:03 UTC
152 points
71 comments · 2 min read · LW link
(www.theverge.com)

Hawley: AI Threatens the Working Man

Remmelt · 8 Sep 2025 3:59 UTC
3 points
1 comment · 10 min read · LW link
(www.dailysignal.com)

Self-Handicapping isn’t just for high-priority tasks, it affects the entire prioritization decision

CrimsonChin · 8 Sep 2025 3:18 UTC
25 points
2 comments · 2 min read · LW link

The LLM Has Left The Chat: Evidence of Bail Preferences in Large Language Models

Danielle Ensign · 8 Sep 2025 0:57 UTC
87 points
4 comments · 5 min read · LW link

Dehumanization is not a thing

Juan Zaragoza · 7 Sep 2025 22:45 UTC
7 points
3 comments · 5 min read · LW link

Semiconductor Fabs II: The Operation

nomagicpill · 7 Sep 2025 18:09 UTC
9 points
0 comments · 8 min read · LW link
(nomagicpill.github.io)

Ketamine part 2: What do in vitro studies tell us about safety?

Elizabeth · 7 Sep 2025 17:10 UTC
44 points
0 comments · 12 min read · LW link
(acesounderglass.com)

You Gotta Be Dumb to Live Forever: The Computational Cost of Persistence

E.G. Blee-Goldman · 7 Sep 2025 16:38 UTC
14 points
2 comments · 5 min read · LW link

The networkist approach

Juan Zaragoza · 7 Sep 2025 16:24 UTC
13 points
2 comments · 11 min read · LW link

Medical decision making

Elo · 7 Sep 2025 8:13 UTC
37 points
7 comments · 2 min read · LW link

Exponentials vs The Universe

amitlevy49 · 6 Sep 2025 23:52 UTC
12 points
0 comments · 6 min read · LW link
(open.substack.com)

A Snippet On Egregores, Instincts, And Institutions

JenniferRM · 6 Sep 2025 21:28 UTC
15 points
0 comments · 4 min read · LW link

Investigating Representations in the Embedding in SONAR Text Autoencoders

6 Sep 2025 20:07 UTC
5 points
0 comments · 10 min read · LW link

When Simulated Worlds Meet Real Concerns

Marcio Díaz · 6 Sep 2025 17:27 UTC
−7 points
2 comments · 3 min read · LW link

How Can You Tell if You’ve Instilled a False Belief in Your LLM?

james.lucassen · 6 Sep 2025 16:45 UTC
14 points
1 comment · 10 min read · LW link
(jlucassen.com)

Invitation to lead a project at AI Safety Camp (Virtual Edition, 2026)

6 Sep 2025 13:17 UTC
7 points
0 comments · 4 min read · LW link

OffVermilion

Tomás B. · 6 Sep 2025 12:56 UTC
124 points
2 comments · 4 min read · LW link

Follow-up experiments on preventative steering

6 Sep 2025 4:25 UTC
28 points
1 comment · 3 min read · LW link

Alignment Fine-tuning is Character Writing

Guive · 6 Sep 2025 2:08 UTC
2 points
0 comments · 8 min read · LW link
(guive.substack.com)

Hunger strike #2, this time in front of DeepMind

Remmelt · 6 Sep 2025 1:45 UTC
25 points
0 comments · 1 min read · LW link
(x.com)

Memory Decoding Journal Club: A combinatorial neural code for long-term motor memory

Devin Ward · 6 Sep 2025 1:25 UTC
1 point
0 comments · 1 min read · LW link

Top 10 Most compelling arguments against Superintelligent AI

shanzson · 6 Sep 2025 0:09 UTC
−3 points
13 comments · 8 min read · LW link