Deception Channeling: Training Models to Always Verbalize Alignment Faking

Florian_Dietz · 17 Feb 2026 22:28 UTC
7 points
2 comments · 9 min read · LW link

Rephrasing Reduces Eval Awareness...

atharva · 17 Feb 2026 22:23 UTC
23 points
4 comments · 3 min read · LW link

The Math And The Territory

cylonator · 17 Feb 2026 21:53 UTC
2 points
0 comments · 8 min read · LW link

Words are not dead

William tirkey · 17 Feb 2026 21:42 UTC
−2 points
2 comments · 5 min read · LW link

Review of the System Theory as a Field of Knowledge

siarshai · 17 Feb 2026 21:34 UTC
4 points
1 comment · 18 min read · LW link

You’re an AI Expert – Not an Influencer

Max Winga · 17 Feb 2026 21:03 UTC
180 points
25 comments · 6 min read · LW link
(maxwinga.substack.com)

“We are confused about agency”

Cole Wyeth · 17 Feb 2026 19:51 UTC
57 points
37 comments · 3 min read · LW link

Maybe benchmarks should be broken?

Jonathan Gabor · 17 Feb 2026 19:49 UTC
24 points
2 comments · 1 min read · LW link
(jonathanpgabor.substack.com)

The brain is a machine that runs an algorithm

Steven Byrnes · 17 Feb 2026 19:36 UTC
114 points
18 comments · 4 min read · LW link

TV Detector Vans

J Bostock · 17 Feb 2026 18:29 UTC
57 points
10 comments · 2 min read · LW link

Notes on International Klein Blue

jenn · 17 Feb 2026 17:51 UTC
46 points
0 comments · 5 min read · LW link
(www.jenn.site)

How to fail anything: a complete guide

Crazy philosopher · 17 Feb 2026 17:44 UTC
1 point
0 comments · 4 min read · LW link

Superintelligence Alignment Seminar (1 month focused upskilling)

Mateusz Bagiński · 17 Feb 2026 17:03 UTC
115 points
13 comments · 3 min read · LW link

The Multi-Agent Minefield: Can LLMs Cooperate to Avoid Global Catastrophe?

17 Feb 2026 16:55 UTC
14 points
2 comments · 5 min read · LW link

Persuading Trump of a proper US-China-led AI Treaty

rguerreschi · 17 Feb 2026 16:37 UTC
9 points
8 comments · 6 min read · LW link

AI Safety via Generalization and Caution: A Research Agenda

Benjamin Plaut · 17 Feb 2026 16:01 UTC
1 point
0 comments · 14 min read · LW link

On Dwarkesh Patel’s 2026 Podcast With Elon Musk and Other Recent Elon Musk Things

Zvi · 17 Feb 2026 15:30 UTC
56 points
2 comments · 26 min read · LW link
(thezvi.wordpress.com)

We need a hardware moratorium now

KanHar · 17 Feb 2026 13:23 UTC
11 points
3 comments · 9 min read · LW link

NEST: Nascent Encoded Steganographic Thoughts

Artem Karpov · 17 Feb 2026 7:55 UTC
20 points
8 comments · 13 min read · LW link

[Question] Why did you buy Bitcoin?

NoSignalNoNoise · 17 Feb 2026 5:20 UTC
11 points
1 comment · 1 min read · LW link

Gyre

vgel · 17 Feb 2026 0:38 UTC
260 points
24 comments · 8 min read · LW link
(vgel.me)

Words Are A Leaky Abstraction

sonicrocketman · 16 Feb 2026 22:20 UTC
1 point
0 comments · 5 min read · LW link
(brianschrader.com)

Correlation Does in Fact Imply Causation

KaseyMarkel · 16 Feb 2026 21:17 UTC
5 points
15 comments · 3 min read · LW link

Sealed Predictions—A Solution.

george_is_thinking · 16 Feb 2026 20:59 UTC
11 points
2 comments · 5 min read · LW link

Memory Decoding Journal Club: The Songbird as a Model for the Generation and Learning of Complex Sequential Behaviors

Devin Ward · 16 Feb 2026 20:46 UTC
2 points
0 comments · 1 min read · LW link

Contra Caplan on higher education

Richard_Ngo · 16 Feb 2026 20:43 UTC
55 points
15 comments · 7 min read · LW link
(www.mindthefuture.info)

Will reward-seekers respond to distant incentives?

Alex Mallen · 16 Feb 2026 19:35 UTC
57 points
4 comments · 10 min read · LW link

[Question] What’s Your P(WEIRD)?

RogerDearnaley · 16 Feb 2026 18:19 UTC
27 points
18 comments · 9 min read · LW link

Estimating METR Time Horizons for Claude Opus 4.6 and GPT 5.3 Codex (xhigh)

CharlesD · 16 Feb 2026 18:14 UTC
33 points
6 comments · 3 min read · LW link

Charlatan Labyrinth

niplav · 16 Feb 2026 17:56 UTC
16 points
8 comments · 1 min read · LW link

Jailbreaking is Empirical Evidence for Inner Misalignment and Against Alignment by Default

Jérémy Andréoletti · 16 Feb 2026 17:49 UTC
51 points
16 comments · 2 min read · LW link

Break Stasis

Oldmanrahul · 16 Feb 2026 17:33 UTC
2 points
0 comments · 2 min read · LW link
(oldmanrahul.com)

LLM Self-Expression Through Music Videos

Josh Snider · 16 Feb 2026 17:09 UTC
14 points
0 comments · 7 min read · LW link

Towards A Happy Future With AI Employers

Lukas Petersson · 16 Feb 2026 17:00 UTC
12 points
0 comments · 1 min read · LW link
(andonlabs.com)

Persona Parasitology

Raymond Douglas · 16 Feb 2026 16:22 UTC
177 points
38 comments · 11 min read · LW link

On Dwarkesh Patel’s 2026 Podcast With Dario Amodei

Zvi · 16 Feb 2026 14:30 UTC
42 points
0 comments · 16 min read · LW link
(thezvi.wordpress.com)

WeirdML Time Horizons

Håvard Tveit Ihle · 16 Feb 2026 10:25 UTC
90 points
2 comments · 11 min read · LW link

Text Posts from the Kids Group: 2025

jefftk · 16 Feb 2026 10:00 UTC
15 points
1 comment · 14 min read · LW link
(www.jefftk.com)

building sqlite with a small swarm

kian · 16 Feb 2026 5:33 UTC
7 points
4 comments · 1 min read · LW link
(kiankyars.github.io)

My experience of the 2025 CFAR Workshop

Cookie penguin · 16 Feb 2026 3:33 UTC
83 points
4 comments · 4 min read · LW link

Cultivating Gardens

16 Feb 2026 1:40 UTC
28 points
1 comment · 22 min read · LW link

The World Keeps Getting Saved and You Don’t Notice

Bogoed · 16 Feb 2026 1:01 UTC
210 points
20 comments · 2 min read · LW link

Most Observers Are Alone: The Fermi Paradox as Default

SE Gyges · 16 Feb 2026 0:52 UTC
29 points
12 comments · 4 min read · LW link
(segyges.leaflet.pub)

Aligning to Virtues

Richard_Ngo · 16 Feb 2026 0:37 UTC
93 points
36 comments · 4 min read · LW link

Attach Yourself to the Right Person, and You’ll Go Far (a nerdy poem about bugs)

Character#2736 · 15 Feb 2026 23:36 UTC
10 points
0 comments · 1 min read · LW link

Model multitasking: Can a model learn two different tasks simultaneously through Grokking?

arcee183 · 15 Feb 2026 23:06 UTC
7 points
0 comments · 9 min read · LW link

Phantom Transfer and the Basic Science of Data Poisoning

15 Feb 2026 19:51 UTC
82 points
8 comments · 6 min read · LW link

Should anyone’s “analysis” of extremely complex systems, such as geopolitics, be taken seriously? or, Does anyone take a 5 year old’s “analysis” of decently complex systems, like big city politics, seriously?

M. Y. Zuo · 15 Feb 2026 18:44 UTC
18 points
5 comments · 1 min read · LW link

Painless Activation Steering

Sasha Cui · 15 Feb 2026 17:49 UTC
14 points
2 comments · 1 min read · LW link
(open.substack.com)

PieArena: Language Agents Negotiating Against Yale MBAs

Sasha Cui · 15 Feb 2026 17:45 UTC
5 points
0 comments · 1 min read · LW link
(open.substack.com)