Coverage-driven alignment—What ‘Teaching Claude Why’ can borrow from AV verification

Yoav Hollander
8 Jun 2026 11:42 UTC
2 points
0 comments
14 min read
LW link
(blog.foretellix.com)

Bun’s Migration from Zig to Rust as a Potential Case Study for Gradual Disempowerment

Sayhan Yalvaçer
8 Jun 2026 7:06 UTC
−1 points
0 comments
3 min read
LW link

Mental causation is not load-bearing

jessicata
7 Jun 2026 20:43 UTC
29 points
2 comments
10 min read
LW link

How Far Apart Does a Model Think Its Tokens Are?

Brendan Long
7 Jun 2026 20:20 UTC
36 points
0 comments
9 min read
LW link

Autopilot Thinking

XelaP
7 Jun 2026 20:20 UTC
10 points
4 comments
6 min read
LW link

Secret Loyalties Likely Raise Remote-Influenceability

Kaustubh Kislay
7 Jun 2026 17:51 UTC
11 points
0 comments
6 min read
LW link

From One Piece to One Pace - Vision and mission in temporary coordination of agents

a unemployed pastor- de S Brito
7 Jun 2026 17:07 UTC
4 points
0 comments
3 min read
LW link

Neglected Basics of AI Alignment

Quirinus_Quirrell
7 Jun 2026 9:02 UTC
26 points
0 comments
6 min read
LW link

Can activation verbalizers surface an internal chain of thought?

7 Jun 2026 4:24 UTC
92 points
0 comments
16 min read
LW link

Against Corrigibility

peralice
6 Jun 2026 20:28 UTC
58 points
16 comments
12 min read
LW link

Freud heard a rumor that Science existed, and had a wonderful dream

Bruce Middleton
6 Jun 2026 14:47 UTC
8 points
8 comments
6 min read
LW link

Coalitional Darwinism and the Instrumental Utility of Individuality

CarolusRenniusVitellius
6 Jun 2026 12:53 UTC
24 points
5 comments
17 min read
LW link
(charlesr-w.github.io)

Why Software Automation Is Hard

silentbob
6 Jun 2026 8:56 UTC
104 points
19 comments
12 min read
LW link

What if Anthropic unilaterally paused capabilities development right now?

Karl von Wendt
6 Jun 2026 7:39 UTC
51 points
12 comments
3 min read
LW link

Optimisation over non-stationary distributions creates weirder minds

6 Jun 2026 0:05 UTC
33 points
1 comment
4 min read
LW link

[Question] Does robotics capabilities research accelerate AGI timelines?

Master Chief
5 Jun 2026 23:32 UTC
4 points
3 comments
1 min read
LW link

Two More Methods for Consistency Training and Some New Ways to Apply It

5 Jun 2026 21:06 UTC
18 points
0 comments
7 min read
LW link

Revisiting GSM-Symbolic: models seem to reason okay, actually

Sturb
5 Jun 2026 20:54 UTC
12 points
0 comments
5 min read
LW link

Accepting Death & Adult Responsibility

Unreal
5 Jun 2026 19:23 UTC
−19 points
10 comments
4 min read
LW link

The Masochistic Prior

Dev.Roland
5 Jun 2026 19:05 UTC
12 points
1 comment
2 min read
LW link
(substack.com)